DiscussionTools can overload the database if DiscussionToolsEnablePermalinksBackend is enabled without first running the persistRevisionThreadItems maintenance script to completion.
Incident documentation: https://docs.google.com/document/d/1_8qo5HISfCez497dDwGiQc6LJaFYiahZvnL4hjLlceA/edit (non-public)
Context:
The DiscussionToolsEnablePermalinksBackend config setting enables permanent links to comments (e.g. https://www.mediawiki.org/wiki/Special:GoToComment/c-PPelberg_(WMF)-20230215001000). This is implemented using database tables that track where a comment has been posted and allow finding it if it was archived.
To support this, DiscussionTools updates this information as part of refreshlinks jobs. Under normal circumstances these updates are small: they only record the comments that have been added or removed on a page since the last edit or template refresh. However, right after the feature is enabled, the relevant database tables are empty, so the first refreshlinks job for each page generates and stores the information for every comment on that page. The intended way to populate the tables is to generate this initial information in advance using the persistRevisionThreadItems maintenance script.
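To illustrate why the first update for a page is so much larger than later ones, here is a minimal Python sketch. It is not the actual DiscussionTools code (which is PHP and tracks more than comment IDs); it only assumes that an update amounts to a set difference between the comment IDs already stored for a page and those found when the page is re-parsed. The function name `plan_update`, the in-memory sets standing in for the database tables, and the comment IDs are all hypothetical.

```python
def plan_update(stored_ids: set[str], current_ids: set[str]) -> tuple[set[str], set[str]]:
    """Return (rows_to_insert, rows_to_delete) for one page.

    Hypothetical model: stored_ids stands in for what the permalink tables
    already record for the page; current_ids for the comments found when a
    refreshlinks job re-parses the page.
    """
    to_insert = current_ids - stored_ids  # comments not yet recorded
    to_delete = stored_ids - current_ids  # comments no longer on the page
    return to_insert, to_delete


# Normal case: one comment added since the last refresh, so one row is written.
stored = {"c-Alice-20230301", "c-Bob-20230302"}
current = stored | {"c-Carol-20230303"}
print(plan_update(stored, current))  # ({'c-Carol-20230303'}, set())

# Right after the feature is enabled: nothing is stored yet,
# so every comment on a (hypothetical) large archive page must be written.
archive_page = {f"c-User{i}-2015" for i in range(5000)}
ins, dele = plan_update(set(), archive_page)
print(len(ins), len(dele))  # 5000 0
```

In this model the cost of an update is proportional to what changed, except when the stored side is empty, in which case it is proportional to the whole page. Pre-populating the tables with the maintenance script avoids that worst case happening inside ordinary jobs.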
This feature was released successfully to group0 and group1 wikis earlier, and the maintenance script completed there. On 21 March we released it to group2 wikis (T315353#8716239), but the maintenance script was only started for some of those wikis, because other maintenance scripts were already in progress (T315510#8716277). Even without that delay, the script would have taken several weeks to complete.
Incident:
As it happens, this week we also released a completely unrelated DiscussionTools feature that allows hiding parts of the interface in archived discussions (T249293). To take advantage of it, several large wikis added the new markup to templates used on huge numbers of archived talk pages:
- ruwiki on 4 April (T334023)
- dewiki on 6 April (morning) (T334195)
- enwiki on 6 April (afternoon) (https://docs.google.com/document/d/1_8qo5HISfCez497dDwGiQc6LJaFYiahZvnL4hjLlceA/edit)
Editing these templates queued refreshlinks jobs for every page that transcludes them, which effectively ran a big chunk of the planned maintenance at once, without any of the normal safeguards. The cause of the database problems was only identified after the third of these edits. As a result, the feature has been disabled on group2 wikis again (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/906593, https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/906600).