DiscussionTools can overload the database if DiscussionToolsEnablePermalinksBackend is enabled without running the persistRevisionThreadItems maintenance script
Closed, Resolved (Public)

Description

DiscussionTools can overload the database if DiscussionToolsEnablePermalinksBackend is enabled without running the persistRevisionThreadItems maintenance script.

Incident documentation: https://docs.google.com/document/d/1_8qo5HISfCez497dDwGiQc6LJaFYiahZvnL4hjLlceA/edit (non-public)

Context:
The DiscussionToolsEnablePermalinksBackend config setting enables permanent links to comments (e.g. https://www.mediawiki.org/wiki/Special:GoToComment/c-PPelberg_(WMF)-20230215001000). This is implemented using database tables that track where a comment has been posted and allow finding it if it was archived.

To support this, DiscussionTools updates this information as part of refreshlinks jobs. Under normal circumstances these updates are small (just recording the comments that have been added or removed on a page since the last edit or template refresh). However, right after the feature is enabled, the relevant database tables are empty, so any refreshlinks job causes the information about all comments on the page to be generated. The initial information is instead meant to be generated using the persistRevisionThreadItems maintenance script.
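
To illustrate why empty tables turn a small update into a full rebuild, here is a minimal sketch in Python. It is not DiscussionTools code: the function `diff_thread_items`, the made-up comment IDs, and the model of storage as a set of IDs are all assumptions for illustration only.

```python
def diff_thread_items(stored: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_insert, to_delete) needed to make `stored` match `current`.

    Hypothetical helper: models the per-page update as a plain set difference.
    """
    return current - stored, stored - current


# A talk page with 50 existing comments (illustrative IDs, not real ones).
page_comments = {f"c-Example-202302150010{i:02d}" for i in range(50)}
after_edit = page_comments | {"c-Example-20230321120000"}  # one new comment

# Normal case: the tables already know about the 50 older comments.
to_insert, to_delete = diff_thread_items(set(page_comments), after_edit)
print(len(to_insert), len(to_delete))  # 1 0

# Right after the feature is enabled: the tables are empty,
# so the same job has to record every comment on the page.
to_insert, to_delete = diff_thread_items(set(), after_edit)
print(len(to_insert), len(to_delete))  # 51 0
```

Under this model the cost of a job depends on how much of the page is already recorded, which is why a template change touching many never-processed archive pages is so much more expensive than the same change on already-processed pages.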

This feature was successfully released to group0 and group1 wikis earlier, and the maintenance script completed there. On 21 March we released it to group2 wikis (T315353#8716239), but the maintenance script was only started for some of those wikis, because other maintenance scripts were already in progress (T315510#8716277). It would have taken several weeks to run anyway.

Incident:
As it happens, this week we also released a completely unrelated DiscussionTools feature that allows hiding parts of the interface in archived discussions (T249293). To take advantage of it, several large wikis added the new markup to templates used on huge numbers of archived talk pages.

This effectively caused a big chunk of the planned maintenance to be executed, without any of the normal safeguards. The cause of the database problems was only investigated after the third occurrence. As a result, the feature has been disabled on group2 wikis again (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/906593, https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/906600).

Event Timeline

Thanks. I think one rather hard way to do it is to enable it on a small set of wikis, run the script, and once it's finished, enable writes on another set of wikis, and so on.

@Ladsgroup and I discussed this today.

The new plan to re-enable the feature without repeating this incident is: run the maintenance script first, then enable the config option, then run the maintenance script again on just the changes that will have occurred between these two steps and thus wouldn't have been processed.

Some small code changes will be needed to allow this. I'll keep the task open until they are implemented.
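
The three-step plan above can be sketched as a small simulation. This is only a model under stated assumptions, not the actual script: `Wiki`, `run_script`, `touch`, the integer clock, and the `touched_after` / `touched_before` parameter names are all hypothetical and do not claim to match the real script's options.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Wiki:
    touched: dict[str, int]  # page -> last-touched time (arbitrary clock ticks)
    backend_enabled: bool = False
    persisted: dict[str, int] = field(default_factory=dict)  # page -> touched time when persisted


def run_script(wiki: Wiki, touched_after: Optional[int] = None,
               touched_before: Optional[int] = None) -> int:
    """Hypothetical stand-in for the maintenance script.

    Persists every page whose touched time falls in the optional window
    and returns how many pages were processed.
    """
    count = 0
    for page, t in wiki.touched.items():
        if touched_after is not None and t < touched_after:
            continue
        if touched_before is not None and t > touched_before:
            continue
        wiki.persisted[page] = t
        count += 1
    return count


def touch(wiki: Wiki, page: str, now: int) -> None:
    """An edit or template refresh; only persisted if the backend is enabled."""
    wiki.touched[page] = now
    if wiki.backend_enabled:
        wiki.persisted[page] = now


wiki = Wiki(touched={"Talk:A": 1, "Talk:B": 2, "Talk:C": 3})

backfill_start = 10
run_script(wiki)                 # step 1: process everything while the option is off
touch(wiki, "Talk:B", 12)        # edited before the option is on: stored data goes stale

enable_time = 15
wiki.backend_enabled = True      # step 2: enable the config option
touch(wiki, "Talk:C", 16)        # now kept up to date by the normal job

# Step 3: re-run only for pages touched between the two earlier steps.
catch_up = run_script(wiki, touched_after=backfill_start, touched_before=enable_time)
print(catch_up, wiki.persisted == wiki.touched)
```

In this model the catch-up run only has to look at the pages touched in the gap, so it stays small compared with the initial full run.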

Change 908906 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Allow maintenance script to work even when DiscussionToolsEnablePermalinksBackend is off

https://gerrit.wikimedia.org/r/908906

Change 908907 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Allow maintenance script to only process pages touched in a time period

https://gerrit.wikimedia.org/r/908907

Change 908906 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Allow maintenance script to work even when DiscussionToolsEnablePermalinksBackend is off

https://gerrit.wikimedia.org/r/908906

Change 908907 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Allow maintenance script to only process pages touched in a time period

https://gerrit.wikimedia.org/r/908907