Page MenuHomePhabricator

Slow query in ApiQueryBlocks with new schema
Closed, ResolvedPublicPRODUCTION ERROR

Description

After I switched all wikis to block_target read-new mode, I saw this error. So I immediately reverted the change.

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBQueryDisconnectedError: A connection error occurred during a query. 
Query: SELECT  bt_auto,bl_id,bl_timestamp,bt_address,bt_user,COALESCE(bt_address, bt_user_text) AS `bt_address_or_user_name`,actor_user,acto
exception.trace
from /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/Database.php(1201)
#0 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/Database.php(1187): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/Database.php(1161): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/Database.php(652): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/Database.php(1350): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#4 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#5 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#6 /srv/mediawiki/php-1.42.0-wmf.21/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(723): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#7 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiQueryBase.php(415): Wikimedia\Rdbms\SelectQueryBuilder->fetchResultSet()
#8 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiQueryBlocks.php(284): ApiQueryBase->select(string)
#9 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiQuery.php(704): ApiQueryBlocks->execute()
#10 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiMain.php(1946): ApiQuery->execute()
#11 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiMain.php(922): ApiMain->executeAction()
#12 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiMain.php(893): ApiMain->executeActionWithErrorHandling()
#13 /srv/mediawiki/php-1.42.0-wmf.21/includes/api/ApiEntryPoint.php(158): ApiMain->execute()
#14 /srv/mediawiki/php-1.42.0-wmf.21/includes/MediaWikiEntryPoint.php(199): MediaWiki\Api\ApiEntryPoint->execute()
#15 /srv/mediawiki/php-1.42.0-wmf.21/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#16 /srv/mediawiki/w/api.php(3): require(string)
#17 {main}
Impact
  • 12 exceptions before I finished reverting it
Notes
MariaDB [enwiki]> explain SELECT  bt_auto,bl_id,bl_timestamp,bt_address,bt_user,COALESCE(bt_address, bt_user_text) AS `bt_address_or_user_name`,actor_user,actor_name,bl_expiry,bl_sitewide,comment_bl_reason.comment_text AS `bl_reason_text`,comment_bl_reason.comment_data AS `bl_reason_data`,comment_bl_reason.comment_id AS `bl_reason_cid`  FROM `block` JOIN `block_target` ON ((bt_id=bl_target)) LEFT JOIN `user` `block_target_user` ON ((user_id=bt_user)) JOIN `actor` ON ((actor_id=bl_by_actor)) JOIN `comment` `comment_bl_reason` ON ((comment_bl_reason.comment_id = bl_reason_id))   WHERE (bt_address = '2601:(omitted for privacy)' OR block_target_user.user_name IN ('EvergreenFir','Yapperbot','Dekimasu','GreenMeansGo','だ*ぜ','AGK','Ameisenigel','Bennylin','Daniuu','Doǵu','Emufarmers','Faendalimas','MdsShakil','Minorax','Nehaoua','Renvoy','RoySmith','Vermont','MediaWiki message delivery','Fyrael','Firefangledfeathers','Naypta','Lau737','HandThatFeeds') ) AND bt_auto = 0 AND (NOT EXISTS (SELECT  1  FROM `block_target` `hu_block_target` JOIN `block` ON ((bl_target=hu_block_target.bt_id))   WHERE (hu_block_target.bt_user=block_target.bt_user) AND bl_deleted = 1  )) AND (bl_expiry > '20240314040425')  ORDER BY bl_timestamp DESC,bl_id DESC LIMIT 101;
+------+--------------+-------------------+--------+---------------------+--------------+---------+------------------------------+--------+--------------------------+
| id   | select_type  | table             | type   | possible_keys       | key          | key_len | ref                          | rows   | Extra                    |
+------+--------------+-------------------+--------+---------------------+--------------+---------+------------------------------+--------+--------------------------+
|    1 | PRIMARY      | block             | index  | bl_target,bl_expiry | bl_timestamp | 14      | NULL                         | 202    | Using where              |
|    1 | PRIMARY      | block_target      | eq_ref | PRIMARY,bt_address  | PRIMARY      | 4       | enwiki.block.bl_target       | 1      | Using where              |
|    1 | PRIMARY      | block_target_user | eq_ref | PRIMARY             | PRIMARY      | 4       | enwiki.block_target.bt_user  | 1      | Using where              |
|    1 | PRIMARY      | actor             | eq_ref | PRIMARY             | PRIMARY      | 8       | enwiki.block.bl_by_actor     | 1      |                          |
|    1 | PRIMARY      | comment_bl_reason | eq_ref | PRIMARY             | PRIMARY      | 8       | enwiki.block.bl_reason_id    | 1      |                          |
|    2 | MATERIALIZED | hu_block_target   | range  | PRIMARY,bt_user     | bt_user      | 5       | NULL                         | 842380 | Using where; Using index |
|    2 | MATERIALIZED | block             | ref    | bl_target           | bl_target    | 4       | enwiki.hu_block_target.bt_id | 1      | Using where              |
+------+--------------+-------------------+--------+---------------------+--------------+---------+------------------------------+--------+--------------------------+

Related Objects

Event Timeline

At first glance, looking at the row counts, I thought it was the subquery. But in fact the problem is the OR. You can take the subquery out, and it's still slow, and note the choice of bl_timestamp when it's just trying to find a few targets. If you take out the OR, then it's fast. I tried forcing bl_target instead of bl_timestamp but it was still slow.

I reviewed all the patches linked from T346293, and I think the only other potentially affected query is the one in DatabaseBlockStore::newLoad().

I think the options are:

  • A union.
  • A union for the block_target query, then a separate query for block with bl_target IN(...). This is easier given the structure of ApiQueryBlocks.
  • Denormalize the username into the bt_address field so that it is like ipb_address, reverting what we thought was an improvement in the new schema.

I tried bl_target IN(... a union subquery...) but it was slow.

I tried using the combined index bt_ip_user_text (bt_ip_hex(35),bt_user_text) as follows:

SELECT bt_id FROM block_target WHERE (bt_user_text IS NULL AND bt_ip_hex='v6-...') OR (bt_ip_hex IS NULL AND bt_user_text IN('EvergreenFir','Yapperbot','Dekimasu','GreenMeansGo','だ*ぜ','AGK','Ameisenigel','Bennylin','Daniuu','Doǵu','Emufarmers','Faendalimas','MdsShakil','Minorax','Nehaoua','Renvoy','RoySmith','Vermont','MediaWiki message delivery','Fyrael','Firefangledfeathers','Naypta','Lau737','HandThatFeeds'));

but it was slow.

Change 1011207 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] block: In ApiQueryBlocks use a union when querying for both users and IPs

https://gerrit.wikimedia.org/r/1011207

I think the only other potentially affected query is the one in DatabaseBlockStore::newLoad().

I confirmed that DatabaseBlockStore::newLoad() does not suffer from this problem.

MariaDB [enwiki]> explain SELECT  bl_id,bt_address,bt_user,bt_user_text,bl_timestamp,bt_auto,bl_anon_only,bl_create_account,bl_enable_autoblock,bl_expiry,bl_deleted,bl_block_email,bl_allow_usertalk,bl_parent_block_id,bl_sitewide,bl_by_actor,block_by_actor.actor_user AS `bl_by`,block_by_actor.actor_name AS `bl_by_text`,comment_bl_reason.comment_text AS `bl_reason_text`,comment_bl_reason.comment_data AS `bl_reason_data`,comment_bl_reason.comment_id AS `bl_reason_cid`  FROM `block` JOIN `block_target` ON ((bt_id=bl_target)) JOIN `actor` `block_by_actor` ON ((actor_id=bl_by_actor)) JOIN `comment` `comment_bl_reason` ON ((comment_bl_reason.comment_id = bl_reason_id))   WHERE (bt_user = 1) OR (bt_ip_hex = '7F000001' OR ((bt_range_start  LIKE '7F00%' ESCAPE '`') AND (bt_range_start <= '7F000001') AND (bt_range_end >= '7F000001')));
+------+-------------+-------------------+-------------+------------------------------------------+----------------------------------+---------+---------------------------+------+-----------------------------------------------------------------+
| id   | select_type | table             | type        | possible_keys                            | key                              | key_len | ref                       | rows | Extra                                                           |
+------+-------------+-------------------+-------------+------------------------------------------+----------------------------------+---------+---------------------------+------+-----------------------------------------------------------------+
|    1 | SIMPLE      | block_target      | index_merge | PRIMARY,bt_ip_user_text,bt_range,bt_user | bt_user,bt_ip_user_text,bt_range | 5,38,76 | NULL                      | 3    | Using sort_union(bt_user,bt_ip_user_text,bt_range); Using where |
|    1 | SIMPLE      | block             | ref         | bl_target                                | bl_target                        | 4       | enwiki.block_target.bt_id | 1    |                                                                 |
|    1 | SIMPLE      | block_by_actor    | eq_ref      | PRIMARY                                  | PRIMARY                          | 8       | enwiki.block.bl_by_actor  | 1    |                                                                 |
|    1 | SIMPLE      | comment_bl_reason | eq_ref      | PRIMARY                                  | PRIMARY                          | 8       | enwiki.block.bl_reason_id | 1    |                                                                 |
+------+-------------+-------------------+-------------+------------------------------------------+----------------------------------+---------+---------------------------+------+-----------------------------------------------------------------+
4 rows in set (0.001 sec)

For whatever reason, the optimizer is able to convert OR to UNION for the newLoad() query. But in ApiQueryBlocks, it doesn't.

Change 1011207 merged by jenkins-bot:

[mediawiki/core@master] block: In ApiQueryBlocks use a union when querying for both users and IPs

https://gerrit.wikimedia.org/r/1011207

@tstarling If I make a request with a bkusers parameter which includes both usernames and IPs/ranges and the username, IPs and ranges are not blocked (or the username does not exist), I see this exception:

{
    "error": {
        "code": "internal_api_error_InvalidArgumentException",
        "info": "[a6349b72d293f0ea1b55adac] Exception caught: Wikimedia\\Rdbms\\Platform\\SQLPlatform::makeList: empty input for field bt_id",
        "errorclass": "InvalidArgumentException",
        "trace": "InvalidArgumentException at /var/www/html/w/includes/libs/rdbms/platform/SQLPlatform.php(245)\nfrom /var/www/html/w/includes/libs/rdbms/platform/SQLPlatform.php(245)\n#0 /var/www/html/w/includes/libs/rdbms/platform/SQLPlatform.php(678): Wikimedia\\Rdbms\\Platform\\SQLPlatform->makeList()\n#1 /var/www/html/w/includes/libs/rdbms/database/Database.php(3334): Wikimedia\\Rdbms\\Platform\\SQLPlatform->selectSQLText()\n#2 /var/www/html/w/includes/libs/rdbms/database/DatabaseMySQL.php(730): Wikimedia\\Rdbms\\Database->selectSQLText()\n#3 /var/www/html/w/includes/libs/rdbms/database/Database.php(1342): Wikimedia\\Rdbms\\DatabaseMySQL->selectSQLText()\n#4 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\\Rdbms\\Database->select()\n#5 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\\Rdbms\\DBConnRef->__call()\n#6 /var/www/html/w/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(724): Wikimedia\\Rdbms\\DBConnRef->select()\n#7 /var/www/html/w/includes/api/ApiQueryBase.php(415): Wikimedia\\Rdbms\\SelectQueryBuilder->fetchResultSet()\n#8 /var/www/html/w/includes/api/ApiQueryBlocks.php(300): ApiQueryBase->select()\n#9 /var/www/html/w/includes/api/ApiQuery.php(705): ApiQueryBlocks->execute()\n#10 /var/www/html/w/includes/api/ApiMain.php(1946): ApiQuery->execute()\n#11 /var/www/html/w/includes/api/ApiMain.php(922): ApiMain->executeAction()\n#12 /var/www/html/w/includes/api/ApiMain.php(893): ApiMain->executeActionWithErrorHandling()\n#13 /var/www/html/w/includes/api/ApiEntryPoint.php(158): ApiMain->execute()\n#14 /var/www/html/w/includes/MediaWikiEntryPoint.php(199): MediaWiki\\Api\\ApiEntryPoint->execute()\n#15 /var/www/html/w/api.php(44): MediaWiki\\MediaWikiEntryPoint->run()\n#16 {main}"
    },
    "servedby": "b6a42289990f"
}

I do not see this exception for commit bc4517566ac9dd7d339e929f7c6371b0d2a05314.

Example request:

{
	"action": "query",
	"format": "json",
	"list": "blocks",
	"formatversion": "2",
	"bkusers": "Admin|2.3.4.5"
}

Change 1012609 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] block: Fix exception in ApiQueryBlocks when specified users are not blocked

https://gerrit.wikimedia.org/r/1012609

Change 1012609 merged by jenkins-bot:

[mediawiki/core@master] block: Fix exception in ApiQueryBlocks when specified users are not blocked

https://gerrit.wikimedia.org/r/1012609

I used this script to call the ApiQueryBlocks API with a combination of different parameters (including bkusers, which I understand triggered this problem). I then extracted from the MW logs the line starting [rdbms] ApiQueryBlocks::execute which includes the query and the time the query took to run.

I repeated this with read-old, read-new before this patch and read-new after this patch. Below are the results comparing these three cases, grouped by the value of bkusers. Time is in milliseconds.

Read-old (each combination of parameters was repeated 10 times):

                     time                         
                      min  median      mean    max
bkusers                                           
1.2.3.4             0.239  0.3625  0.363588  0.616
1.2.3.4/28          0.225  0.3270  0.331680  0.581
1.2.3.4|1.2.3.4/28  0.395  0.4160  0.425100  0.468
1.2.3.4|temp        0.385  0.4150  0.416300  0.463
name                0.226  0.3490  0.342884  0.861
name|1.2.3.4        0.384  0.4510  0.457700  0.512
name|1.2.3.4/28     0.355  0.4160  0.438000  0.575
name|temp           0.349  0.3595  0.376700  0.439
temp                0.235  0.3425  0.343320  1.044
temp|1.2.3.4/28     0.348  0.4105  0.397900  0.438

Read-new before this patch (each combination of parameters was repeated 5 times):

                       time                              
                        min   median        mean      max
bkusers                                                  
1.2.3.4               0.286    1.096    1.039128    2.322
1.2.3.4/28            0.272    0.951    0.825968    1.703
1.2.3.4|1.2.3.4/28    1.098    1.441    1.442400    1.770
1.2.3.4|temp        128.273  166.251  161.605400  176.260
name                  0.209    0.378    0.393224    0.747
name|1.2.3.4        125.940  175.801  165.207200  201.233
name|1.2.3.4/28     128.481  174.424  165.379000  204.293
name|temp             0.995    1.048    1.170200    1.484
temp                  0.218    0.378    0.395472    0.832
temp|1.2.3.4/28     129.039  180.023  167.048200  192.944

Read-new after this patch (each combination of parameters was repeated 10 times):

                     time                         
                      min  median      mean    max
bkusers                                           
1.2.3.4             0.268  1.0120  0.860604  1.473
1.2.3.4/28          0.280  0.9725  0.826328  1.619
1.2.3.4|1.2.3.4/28  1.059  1.0905  1.160900  1.552
1.2.3.4|temp        0.716  0.7980  0.799400  0.904
name                0.204  0.3170  0.309724  0.550
name|1.2.3.4        0.721  0.7885  0.806300  0.950
name|1.2.3.4/28     0.535  0.6020  0.640100  0.826
name|temp           1.029  1.0725  1.082800  1.174
temp                0.210  0.3185  0.314908  0.592
temp|1.2.3.4/28     0.546  0.6270  0.632800  0.738

You can see where you have a combination of users and IPs/ranges the "read-new before this patch" is significantly slower.

"Read-new after this patch" seems to be a bit slower than read-old in several cases. Perhaps not enough to be significant.

I also ran EXPLAIN on all the queries I extracted from the logs (attached below). I don't really know how to interpret the output of EXPLAIN, but I notice that when passing the attached files to mariadb and grepping for "Using index":

  • read-old 117 matches (out of 401 queries)
  • read-new 41 matches (out of 405 queries)

read-old:

read-new:


(Only one query per parameter combination. Parameter combinations appear as a comment before the query they generated. There may be missing queries, I think because some parameter combinations were invalid.)

This was all performed on a local docker wiki with mariadb.

On a local docker using sqlite I ran some of the above tests and noticed:

read-old:

                     time                        
                      min median      mean    max
bkusers                                          
1.2.3.4             0.088  0.171  0.172768  0.343
1.2.3.4/28          0.089  0.174  0.178976  0.428
1.2.3.4|1.2.3.4/28  0.186  0.291  0.287000  0.360
1.2.3.4|temp        0.186  0.190  0.210200  0.278
name                0.088  0.169  0.170240  0.434
name|1.2.3.4        0.187  0.212  0.219800  0.268
name|1.2.3.4/28     0.189  0.204  0.225200  0.297
name|temp           0.186  0.196  0.201600  0.227
temp                0.087  0.172  0.174752  0.378
temp|1.2.3.4/28     0.186  0.189  0.202800  0.237

read-new:

                     time                        
                      min median      mean    max
bkusers                                          
1.2.3.4             0.151  0.246  0.264208  0.733
1.2.3.4/28          0.150  0.242  0.250464  0.639
1.2.3.4|1.2.3.4/28  0.317  0.374  0.368000  0.418
1.2.3.4|temp        0.082  0.101  0.106000  0.128
name                0.155  0.273  0.292504  0.651
name|1.2.3.4        0.082  0.083  0.086600  0.102
name|1.2.3.4/28     0.081  0.084  0.084400  0.091
name|temp           1.195  1.495  1.506000  2.040
temp                0.156  0.262  0.271408  0.479
temp|1.2.3.4/28     0.083  0.085  0.098000  0.128

Notice that bkusers=name|temp is slower in read-new

Change #1013634 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@wmf/1.42.0-wmf.23] block: Fix exception in ApiQueryBlocks when specified users are not blocked

https://gerrit.wikimedia.org/r/1013634

Thanks @dom_walden. Looks like performance should be acceptable. Let's deploy it again and see how it goes.

Change #1013634 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.23] block: Fix exception in ApiQueryBlocks when specified users are not blocked

https://gerrit.wikimedia.org/r/1013634

Mentioned in SAL (#wikimedia-operations) [2024-03-25T01:16:33Z] <tstarling@deploy1002> Started scap: Backport for [[gerrit:1013634|block: Fix exception in ApiQueryBlocks when specified users are not blocked (T360088)]]

Mentioned in SAL (#wikimedia-operations) [2024-03-25T01:33:06Z] <tstarling@deploy1002> tstarling: Backport for [[gerrit:1013634|block: Fix exception in ApiQueryBlocks when specified users are not blocked (T360088)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-03-25T01:45:25Z] <tstarling@deploy1002> Finished scap: Backport for [[gerrit:1013634|block: Fix exception in ApiQueryBlocks when specified users are not blocked (T360088)]] (duration: 28m 51s)