Page MenuHomePhabricator

Add option to also delete associated talk pages in Special:Nuke
Closed, ResolvedPublic5 Estimated Story Points

Description

User story: As a Wikipedia administrator, I want to delete all talk pages associated with pages I'm deleting in Nuke, so that I don't need to delete them manually and individually afterwards.

When deleting pages with Special:Nuke, only the pages created by the targeted user are deleted. This extends to not deleting talk pages for deleted pages. In almost all cases, when a page is deleted, its associated talk page should also be deleted. This generates additional manual work for administrators. We could instead provide an option to delete associated talk pages when running Nuke, regardless of who actually created the talk page.

To do this we can add a simple checkbox to the page listing form with the text "Delete associated talk pages". The checkbox should be unselected by default. This should be the final checkbox in the filter box.

If the checkbox is selected, Nuke should additionally list discussion pages for each page retrieved, if they exist. We should opt to present a page and its discussion page over additional pages. As an example, say a user created articles A, B, and C, each has a talk page created by a different user, and the limit of pages Nuke can retrieve is 4. We should retrieve A, Talk:A, B, and Talk:B, rather than A, B, C, and Talk:A.

The edit summary for the talk page deletions should prepend Delete-talk-summary-prefix to the reason provided by the user to clarify the action being taken, in the same way as for individual page deletions.

If one of the pages to be deleted is a discussion page already, we should not attempt to delete the page related to it.

Proposed design

Listing pages OOUI.png (1×1 px, 151 KB)

Original description
I would like to request an addition to Special:Nuke which will add a checkbox to permit deletion of associated talk pages for any pages being nuked. There was recently a user on en.wiki which required a Nuke per policy, but their 100+ pages also had talk pages which were rendered irrelevant after the Nuke and had to be removed manually.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@jsn.sherman

A simple approach would be to check for/fetch a talk page for each page in doDelete() and also check to make sure it's not already in the list. However, a naive implementation would not respect $limit.

Does $limit here refer to the field where users decide how many pages to retrieve? In which sense the concern, as I understand it, would be that a user enters '100' pages to retrieve, and then we attempt to delete 200 because we've ~doubled our list?

Screenshot 2024-10-08 at 12.47.08.png (138×660 px, 17 KB)

Is there a technical concern here, or is this about user expectations?

Going with the second approach (or something like it) is a better design because it means that all pages to be deleted will be in the $pages array and will be handled identically by doDelete(). If we simply respect $limit, we don't have to worry about the implications of breaking it ;)

This implies we would need to have the checkbox for this feature on the first view, rather than the second, since it affects how many pages we list, is that right?

A second thought - we are considering two other features that extend the number of pages deleted from the naive list - T46318: Add an option to Nuke to also delete subpages of deleted pages and T364222: Add an option to Nuke to also delete redirects pointing to deleted pages. Imagining that we implement all three of these and a user selects all three (likely) - respecting $limit may result in a very low number of actual pages that can be deleted. e.g. imagining that a page has 1 talk page, 2 subpages, and 7 redirects, at a rate of 10:1 a limit of 500 means we can only delete 50 articles and their associated pages.

I understand the concern and don't want to assume there's an easy answer so is it worth perhaps at this first step worrying about the implications of breaking $limit, given that we potentially have these other features in the future?

@jsn.sherman

A simple approach would be to check for/fetch a talk page for each page in doDelete() and also check to make sure it's not already in the list. However, a naive implementation would not respect $limit.

Does $limit here refer to the field where users decide how many pages to retrieve? In which sense the concern, as I understand it, would be that a user enters '100' pages to retrieve, and then we attempt to delete 200 because we've ~doubled our list?

Screenshot 2024-10-08 at 12.47.08.png (138×660 px, 17 KB)

Is there a technical concern here, or is this about user expectations?

Going with the second approach (or something like it) is a better design because it means that all pages to be deleted will be in the $pages array and will be handled identically by doDelete(). If we simply respect $limit, we don't have to worry about the implications of breaking it ;)

This implies we would need to have the checkbox for this feature on the first view, rather than the second, since it affects how many pages we list, is that right?

correct

A second thought - we are considering two other features that extend the number of pages deleted from the naive list - T46318: Add an option to Nuke to also delete subpages of deleted pages and T364222: Add an option to Nuke to also delete redirects pointing to deleted pages. Imagining that we implement all three of these and a user selects all three (likely) - respecting $limit may result in a very low number of actual pages that can be deleted. e.g. imagining that a page has 1 talk page, 2 subpages, and 7 redirects, at a rate of 10:1 a limit of 500 means we can only delete 50 articles and their associated pages.

The answer here would be to increase the limit after running tests and determining that it is fine to do so.

I understand the concern and don't want to assume there's an easy answer so is it worth perhaps at this first step worrying about the implications of breaking $limit, given that we potentially have these other features in the future?

Yes, we should respect $limit.

A couple of things we discussed about this today:

  • For talk pages the above issue only maximally halves the number of pages it's possible to delete (500->250) if every page has a talk page.
  • We should make a mockup of the new flow with this feature, because it should be on the first form, so that admins can see the full list of pages they're going to delete before doing so. The text on the button should also make clear that this is about discussion pages associated with pages that aren't already discussion pages. This might be obvious to admins already, so if there's no clear way to achieve this without writing a lot of words, I think it's OK to keep it shorter.

I just wrote the following example, which I'll get feedback on, but feels like the better of the two options to me. In this way, the admin can simply re-run Nuke later and still delete all the pages they want to. If we went for the second option, talk pages would be left hanging and wouldn't be picked up on a 2nd run.

We should opt to present a page and its discussion page over additional pages. As an example, say a user created articles A, B, and C, each has a talk page created by a different user, and the limit of pages Nuke can retrieve is 4. We should retrieve A, Talk:A, B, and Talk:B, rather than A, B, C, and Talk:A.

But what if the page limit is 5 in this case? Do we present A, Talk:A, B, Talk:B, and C, or do we truncate the list early and leave C, so that a re-run of Nuke could delete it and its talk page, rather than (as implied above) leaving it hanging and not deleted?

Hi! Here is a proposed design of how to display the talk, sub, and redirect pages.

  • In the filters, there is an option to delete associated pages that might not be listed in the list. Label "In addition to the selected pages, include their associated pages for deletion: " with 3 checkboxes:
    • Talk pages
    • Redirects
    • Subpages

In the list:

  • Sub, talk, and redirect pages are in italics.
  • The sub, talk redirect pages are presented underneath the associated page and indented.
  • Redirects have “redirect:[pagename]” added at the end of the page.
Design
Listing pages OOUI.png (1×1 px, 151 KB)
DMburugu triaged this task as Medium priority.Jan 21 2025, 5:14 PM
Chlod changed the task status from Open to In Progress.Jan 23 2025, 8:45 AM
Chlod moved this task from Ready to In Progress on the Moderator-Tools-Team (Kanban) board.

question that came up: are talk subpages a thing we need to handle?

Noting here some resolutions from the check-in:

  • We're leaving subpages for later, since that seems like its own beast to deal with. The probable approach right now involves searching titles, which we've had issues with before. This probably includes subpages of talk pages. So the actionable tasks for me in the time remaining would be to get talk and redirect done.
  • We'll allow the admin to delete talk/redirect/subpages even if the main page was not selected for deletion.
  • When dealing with issues of limits: if one page happen to has too many dependent pages (talk or redirect), just remove it entirely from the list and show the user a warning, much like we do when there's no results returned. This would be a prompt for the deleting admin to re-run the query and see what pages were "too big" to include. This means we always respect the limit, and in some cases, go way under it with the goal of avoiding that limit.
    • (not discussed in the check-in, but) This behavior breaks down if a page has over 499 redirects (or 498 redirects and a talk page). This feels like an edge case though, so that's probably something for later too.

With that behavior established, engineering-wise, the approach I'm thinking of would probably would be to:

  • (1 query) Get the initial list of pages, much like we do before, limiting up to 500 rows.
  • (1 query) Check if those pages have talk pages. This is inherently limited to 500 rows, since pages only have one talk page (if we completely ignore talk subpages for now).
  • (??? queries 1 query) Check if each of those pages have redirects. This one, I'm not so sure about, since the redirect table only includes the target namespace and page name. My mind tells me to run a query that just has all those pages OR'd together in the condition ((rd_namespace = 0 AND rd_title = "Foo") OR (rd_namespace = 0 AND rd_title = "Bar") OR ...) so that it's just one query, but it would be one big query. The alternative is making up to 500 queries, which is less optimal from my point of understanding.

Change #1114326 had a related patch set uploaded (by Chlod Alejandro; author: Chlod Alejandro):

[mediawiki/extensions/Nuke@master] Allow deletion of associated pages

https://gerrit.wikimedia.org/r/1114326

I have been reviewing this patch and I have a functionality question: if a page was created more than $wgNukeMaxAge ago (it won't show in the list), but the talk page was created later (within the range of $wgNukeMaxAge), should the talk page show up? I don't have a strong opinion, but I think it shouldn't. I'd love to hear what everybody has to say about it!

I think as long as the talk page was created by the targeted user and it's within $wgNukeMaxAge, it should still probably be added in. Otherwise, all of our talk-based namespace filters would be limited by their subject pages, and administrators also won't be able to clean up on a user mass-creating bad pages in the talk namespaces.

Hello, just reviewing this patch and ran into a couple of UI issues.

The first one is that the nested talk page is presented with the correct indent, but no italics.
The second one is I can't seem to get redirects to show up underneath the associated page, though it is italicizing the redirects.

See screenshot below (Boxes 2 is a redirect ot Boxes page):

Screenshot 2025-01-28 at 10.17.55 AM.png (117×484 px, 27 KB)

I'm also seeing the same as Katy, plus one more observation.

When there are multiple redirects, the original page does not appear indented.

Screenshot 2025-01-28 at 11.27.44.png (81×440 px, 24 KB)

In this case, Redirecting was the original page name, Redirector was the first redirect, and Redirectception was the final page name. The original page name only appears when you filter for the user who created it; it doesn't appear when you search without filters.

I also see that the talk pages added by the same user that created the mainspace page don't appear indented. When a different user creates a talk page, it does appear indented.

Screenshot 2025-01-28 at 11.28.00.png (125×390 px, 35 KB)

Screenshot 2025-01-28 at 11.27.50.png (52×355 px, 14 KB)

The first one is that the nested talk page is presented with the correct indent, but no italics.

Only redirects show up with italics. Italicization of redirects predates this patch (T278625), so I did not change its current behavior. Italicizing all associated pages was not mentioned in this task's description; should I change the patch to do that anyway?

The second one is I can't seem to get redirects to show up underneath the associated page, though it is italicizing the redirects.

Pages created by the same user which also happen to be an associated page show up as actual items (not indented). Based on the feedback, this should probably not be the case. I'll update the patch accordingly.

I'm also seeing the same as Katy, plus one more observation.

When there are multiple redirects, the original page does not appear indented.

Screenshot 2025-01-28 at 11.27.44.png (81×440 px, 24 KB)

In this case, Redirecting was the original page name, Redirector was the first redirect, and Redirectception was the final page name. The original page name only appears when you filter for the user who created it; it doesn't appear when you search without filters.

I'm assuming this is a double-redirect case where RedirectingRedirectorRedirectception. Supporting this behavior is quite complex; we'd have to make $pageGroups an n-dimensional array instead of its current two-dimensional structure, then recursively request redirects of redirects, which takes more queries. If this is something we want to support, this has to have limits and, ideally, $pageGroups should use a different data structure instead of an array tuple, but unfortunately this would break the behavior of onNukeGetNewPages which would then require going through the deprecation process.

I also see that the talk pages added by the same user that created the mainspace page don't appear indented. When a different user creates a talk page, it does appear indented.

This one is a consequence of pages created by the same user being actual items instead of an associated page, so this should be fixed by not doing that.

The first one is that the nested talk page is presented with the correct indent, but no italics.

Only redirects show up with italics. Italicization of redirects predates this patch (T278625), so I did not change its current behavior. Italicizing all associated pages was not mentioned in this task's description; should I change the patch to do that anyway?

It looks like Olga added a comment with designs, @OTichonova is this the correct design we should be implementing?

Hi! Here is a proposed design of how to display the talk, sub, and redirect pages.

  • In the filters, there is an option to delete associated pages that might not be listed in the list. Label "In addition to the selected pages, include their associated pages for deletion: " with 3 checkboxes:
    • Talk pages
    • Redirects
    • Subpages

In the list:

  • Sub, talk, and redirect pages are in italics.
  • The sub, talk redirect pages are presented underneath the associated page and indented.
  • Redirects have “redirect:[pagename]” added at the end of the page.
Design
Listing pages OOUI.png (1×1 px, 151 KB)

Ah... looks like I completely missed that... 🙃 Sorry about that! Uploading a new patchset soon.

Change #1114326 had a related patch set uploaded (by Chlod Alejandro; author: Chlod Alejandro):

[mediawiki/extensions/Nuke@master] Allow deletion of associated pages

https://gerrit.wikimedia.org/r/1114326

suggest building this behind a feature flag

Was re-reviewing this and I noticed while testing that the redirects aren't appearing nested under the the page they are redirecting to.

See screenshot below:

Screenshot 2025-01-30 at 9.38.43 AM.png (136×614 px, 42 KB)

Was re-reviewing this and I noticed while testing that the redirects aren't appearing nested under the the page they are redirecting to.

See screenshot below:

Screenshot 2025-01-30 at 9.38.43 AM.png (136×614 px, 42 KB)

Can you include the query/filter settings for this? This doesn't seem to be happening on my end.

Ah, I see this happens when "redirects" is not checked. Should redirect pages be grouped anyway even though the user never asked for them?

Also, I've introduced the feature flag mentioned by Jason; the checkboxes need the ?nukeAP=true query parameter to work.

If it helps, this is what I see when I don't filter by user and check both talk page and redirect checkboxes:

Screenshot 2025-01-30 at 11.08.03.png (100×592 px, 39 KB)

This is what I see when I filter by user and check both checkboxes:

If it helps, this is what I see when I don't filter by user and check both talk page and redirect checkboxes:

Screenshot 2025-01-30 at 11.08.03.png (100×592 px, 39 KB)

This is what I see when I filter by user and check both checkboxes:

Screenshot 2025-01-30 at 11.08.03.png (100×592 px, 39 KB)

This appears to be the same screenshot; can you please confirm?

Either way, double redirects still currently are unsupported since the page group array still has maximum depth of 1. Going further would mean more friction with the onNukeGetNewPages hook (and the current patch is already rather rough on it; it now calls the hook for every page group to ensure pages get grouped correctly), which I'd prefer to avoid.

If it helps, this is what I see when I don't filter by user and check both talk page and redirect checkboxes:

Screenshot 2025-01-30 at 11.08.03.png (100×592 px, 39 KB)

This is what I see when I filter by user and check both checkboxes:

Screenshot 2025-01-30 at 11.08.03.png (100×592 px, 39 KB)

This appears to be the same screenshot; can you please confirm?

Either way, double redirects still currently are unsupported since the page group array still has maximum depth of 1. Going further would mean more friction with the onNukeGetNewPages hook (and the current patch is already rather rough on it; it now calls the hook for every page group to ensure pages get grouped correctly), which I'd prefer to avoid.

Oops, here's the correct screenshot from case 1:

Screenshot 2025-01-30 at 18.19.20.png (76×597 px, 28 KB)

I am getting an error when I try to get results with the following data:

Screenshot 2025-02-03 at 19.07.30.png (580×966 px, 65 KB)

The error:

Screenshot 2025-02-03 at 19.07.35.png (504×991 px, 169 KB)

Once that is fixed, I think we can move on and merge this patch

Hello, just posting what I'm seeing locally when I run the following flow:

  • Create normal pages called "Page 1", "Page 2", "Page 3"
  • Create at least 1 redirect page pointing to Page 3. (Page 4)
  • Run a query with the pattern Page%, redirects included, and the limit set to 3.
  • Page 1 and Page 2 should be included; Page 3 should be included and the error should show up because including it (+2 pages, the main page and its redirect) would run the list over the user-defined limit (3).

    (I'm seeing pages 2 & 3 show up with the redirect(Page4) to page 3, but no page 1 and no error message)

Screenshot 2025-02-05 at 8.11.38 AM.png (1×1 px, 154 KB)

@Kgraessle On second thought, my steps provided were flawed. That's on me.

Page 3 will always show its redirects since it's the most recently-made page (the ordering scheme used by the main query), which means it gets processed and added into the page groups array first. If you change the target of Page 4 to Page 1, then this should trigger the notice, as Page 1 gets added into the array last (since it's the oldest page).

@Kgraessle On second thought, my steps provided were flawed. That's on me.

Page 3 will always show its redirects since it's the most recently-made page (the ordering scheme used by the main query), which means it gets processed and added into the page groups array first. If you change the target of Page 4 to Page 1, then this should trigger the notice, as Page 1 gets added into the array last (since it's the oldest page).

Ok, I modified Page 4 to redirect to Page 1 and I'm still not seeing the notice.
I'm also not seeing Page 1 showing up, see screenshot here:

Screenshot 2025-02-05 at 10.33.19 AM.png (911×994 px, 119 KB)

Let me know if there's another flow you want me to test out.

I wonder if that's because Page4 is directly matching the filter above - what happens if you rename it to Redirect1?

Test wiki created on Patch demo by SCardenas (WMF) using patch(es) linked to this task:
http://patchdemo.wmcloud.org/wikis/cf6aefe877/w/

@Samwalton9-WMF
We tested this on patch demo and it is not happening there, so we merged it. We'll test on testwiki next week.

Change #1114326 merged by jenkins-bot:

[mediawiki/extensions/Nuke@master] Allow deletion of associated pages

https://gerrit.wikimedia.org/r/1114326

Test wiki on Patch demo by SCardenas (WMF) using patch(es) linked to this task was deleted:

http://patchdemo.wmcloud.org/wikis/cf6aefe877/w/

I wonder if that's because Page4 is directly matching the filter above - what happens if you rename it to Redirect1?

That's also a good catch. Page4 matching the filter will make it first to be processed. There's some deduplication that happens while associated pages get added, which will cause it to indent under Page1 if it's on the list. But if Page1 is not in the list at all, it will just show up as a normal page.

In any case, this is covered by the assertions on this line. Perhaps I should have just recommended a flow similar to what's on that test to begin with. 😅

Why is it "redirect:[[$1]]" and not something more naturally readable, such as "redirect to [[$1]]"?

Why is it "redirect:[[$1]]" and not something more naturally readable, such as "redirect to [[$1]]"?

This seems like a reasonable change, what do you think @OTichonova?

This seems like a reasonable change, what do you think @OTichonova?

@Samwalton9-WMF Sounds good to me.

we verified the behavior on testwiki with the feature flag:
https://test.wikipedia.org/wiki/Special:Nuke?nukeAP=true

@Samwalton9-WMF could you check this on enwiki once it rolls out to all groups? We just want to be totally sure about it before we remove the feature flag.

This looks to be working well on enwiki. The only issue I ran into is presumably related to the feature flag - if I didn't select either redirects or talk pages as a filter, they disappeared after clicking List Pages, I assume because the query parameter disappeared.

Kgraessle changed the task status from In Progress to Open.Feb 25 2025, 3:38 PM

Moving back to ready to remove feature flag

Change #1122648 had a related patch set uploaded (by Scardenasmolinar; author: Scardenasmolinar):

[mediawiki/extensions/Nuke@master] Remove feature flag that hid talk pages and redirect options

https://gerrit.wikimedia.org/r/1122648

Scardenasmolinar changed the task status from Open to In Progress.Feb 25 2025, 7:49 PM

A patch for the feature flag removal has been uploaded!

Change #1122648 merged by jenkins-bot:

[mediawiki/extensions/Nuke@master] Remove feature flag that hid talk pages and redirect options

https://gerrit.wikimedia.org/r/1122648

Draft for Tech News:

When listing pages for deletion with the Nuke tool, administrators can now also list associated talk pages and redirects for deletion, alongside pages created by the target, rather than needing to manually delete these pages afterwards.

verified on testwiki; leaving open until after train rollout is complete