User Details
- User Since
- Jun 9 2015, 9:03 AM (527 w, 2 d)
- Availability
- Away until Aug 3.
- IRC Nick
- dcausse
- LDAP User
- DCausse
- MediaWiki User
- DCausse (WMF)
May 30 2025
Seems to be caused by m1run and perhaps not actually related to a bug upstream. Can be reproduced with:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            { "match": { "query": "American pickle" } },
            { "match": { "wiki": "enwiki" } },
            { "terms": { "method": [ "m1run" ] } }
          ]
        }
      },
      "functions": [
        { "field_value_factor": { "field": "suggestion_score", "missing": 0 } }
      ],
      "boost_mode": "replace"
    }
  },
  "from": 0,
  "size": 1,
  "_source": false,
  "stored_fields": [ "method", "dym" ]
}
Sent to https://search.svc.eqiad.wmnet:9243/glent_production/_search
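For anyone wanting to replay this with different inputs, the request body can be assembled programmatically. A minimal Python sketch, assuming only the query shown above; the helper name build_glent_query is hypothetical and nothing is actually sent to the cluster:

```python
import json


def build_glent_query(query, wiki, methods):
    """Build the function_score request body shown above (hypothetical helper)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "filter": [
                            {"match": {"query": query}},
                            {"match": {"wiki": wiki}},
                            {"terms": {"method": list(methods)}},
                        ]
                    }
                },
                # Score each hit by its suggestion_score, treating a missing value as 0.
                "functions": [
                    {"field_value_factor": {"field": "suggestion_score", "missing": 0}}
                ],
                # "replace" discards the query score and keeps only the function score.
                "boost_mode": "replace",
            }
        },
        "from": 0,
        "size": 1,
        "_source": False,
        "stored_fields": ["method", "dym"],
    }


body = build_glent_query("American pickle", "enwiki", ["m1run"])
print(json.dumps(body, indent=2))
```

Changing the methods list makes it easy to compare what m1run returns against other methods for the same query.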
The EventBus approach is not supposed to work on beta since we don't have the infrastructure there to run the same system.
This problem will be mitigated by:
May 29 2025
Thanks for the report, I see two stale revisions for these items indeed:
select * { VALUES ?stale_revision {1962416023 1969436471} ?item schema:version ?stale_revision }
Restored indices in psi:
green open bowikibooks_content_first zQVg0w7kSfm7Pu4AszzEIg 1 2 1 0 61.4kb 20.4kb
green open xhwikibooks_general_first B9NG7ZH4S26RSAdMZB2VRg 1 2 119 0 1.3mb 474.7kb
green open wikimania2014wiki_archive_first wi7DZXTvTfuzntdyL6wa7Q 1 2 213 0 72.9kb 24.3kb
Restored indices in omega:
green open pnbwiktionary_general_first 6xo89dkTRAKJ4ac6sXMoSg 1 2 504 0 4.9mb 1.6mb
green open avkwiki_general_first BnyNtR69QjSwxV4O3h_9Zg 1 2 5873 0 53.7mb 17.6mb
green open ladwiki_general_first TSWg8ToqS0SoBxUhOSnCbQ 1 2 6908 0 92mb 30mb
green open ptwikiquote_general_first YP_8JxZIQUmYxH4ZAU1SPA 1 2 19919 0 249.6mb 82.6mb
green open cswikiversity_general_first nyonNChhRVOemUiSBtZ9OA 1 2 7229 33 225.4mb 74.9mb
green open azwikibooks_general_first speqCfzwRd-Quut-4CqM8g 1 2 5500 0 72.1mb 24.1mb
green open wikimania2018wiki_general_first quhWLKCeRv6Yfri3M3dSDQ 1 2 6097 0 35.6mb 11.8mb
green open liwiktionary_general_first 4OJuhxtCRh2NXDVJ9Rna4A 1 2 16618 0 176.6mb 58.5mb
green open nnwiktionary_general_first H47nSodOT4S2Qtsm9B6FVw 1 2 1640 0 46.6mb 15.5mb
green open vowikibooks_general_first eBvVR8CuT8yr4vtnG6ZB4g 1 2 262 0 2.5mb 856kb
green open bmwikiquote_general_first EppydKhuS6KixBWfQ-eNRQ 1 2 52 0 599.3kb 199.7kb
green open nowikibooks_general_first USM0x_KQQ0qCNFnjQESsJA 1 2 2552 0 40.6mb 13.5mb
green open zhwikiversity_general_first G9KoP0fRRrakZOtWdhLHGQ 1 2 7891 0 400.5mb 134.2mb
green open akwikibooks_general_first iTLc_9W5SX6Ka7baEPvCtw 1 2 84 0 1.5mb 528.2kb
green open collabwiki_content_first eG9JgGIPRWayT8YSNKPVhQ 1 2 3169 471 546.8mb 183.3mb
green open iewikibooks_content_first KQkwajbfS32wavqqhy1Ewg 1 2 65 0 2.6mb 906.5kb
green open cywikiquote_content_first GI4h_fiMT1isSQep4OdqOw 1 2 457 0 6.9mb 2.3mb
green open fawikibooks_content_first UWCtUCLiQS63bFITlr-Wxg 1 2 4271 0 397.6mb 132.6mb
green open svwikiquote_content_first 1rb46QjtSauQ9rVr9C9uYg 1 2 1906 0 60.6mb 20.9mb
green open biwiktionary_content_first 6k9oFADRST2zvTnbnEJgtQ 1 2 0 0 624b 208b
green open afwiktionary_content_first SlU_uOrcT_en1isCPWO2_Q 1 2 23575 0 227.9mb 75.7mb
green open kywiktionary_content_first 3Q1hENhDQjuAxmHHMYuIcA 1 2 21290 0 120.9mb 40.1mb
green open trwikisource_content_first 5PhxU9u6QGuKzf8OS6nRjQ 1 2 11848 0 881.8mb 294.3mb
green open mnwiktionary_content_first HoeC_efhRuqaso7w65EMnw 1 2 8932 0 106.4mb 35.4mb
green open wikimania2007wiki_content_first xHjtKAeDQ-qiahIpN5wV_w 1 2 708 0 34.3mb 11mb
green open sgwiki_content_first yXk6MI_USNyxa8Biv4z6Lg 1 2 587 0 7mb 2.3mb
green open bat_smgwiki_archive_first rstpepYLR4eyiG0m7jFxdQ 1 2 1937 0 438kb 150.1kb
green open bmwiktionary_archive_first ln6VdSm3Q2KIVcOpwdUcPQ 1 2 4 0 14.1kb 4.7kb
green open crwiktionary_archive_first kgKWX502RSO0uFugkXQmdQ 1 2 0 0 624b 208b
green open cswikinews_archive_first M4uk4lGQTXW15yxjiMOs2g 1 2 552 0 212.9kb 70.9kb
green open gorwiktionary_archive_first O96K3pDHRvCOHoNMYkSP0g 1 2 185 0 45.4kb 15.1kb
green open hewiktionary_archive_first 3p8ZGniaRa2z2ZXMUn2xOA 1 2 4703 0 1mb 372.9kb
green open iewikibooks_archive_first CTE27X02S66NgcCSUGShkQ 1 2 197 0 49.8kb 16.6kb
green open kgwiki_archive_first 7wt9BfeVSGODn9R0-WbHpg 1 2 800 0 154kb 54.7kb
green open kowikibooks_archive_first UKrmgC0FQTSyq5_wKXHrEw 1 2 1938 0 411.9kb 137.3kb
green open lldwiki_archive_first GwL6ROKtT2eTWTDoXG4V2w 1 2 1319 0 194.9kb 64.9kb
green open niawiki_archive_first 0rGaElfRT32vOSajEhgHzQ 1 2 255 0 74kb 24.6kb
green open nowikimedia_archive_first I0XjPY2fQm2h3Rg_iE-Exg 1 2 683 0 138.2kb 49.5kb
green open plwikibooks_archive_first ypf49ohZQDCvFcOZJ5GOWw 1 2 7178 4 1.5mb 537.4kb
green open quwikibooks_archive_first MLmimEdPRoqXVISq5HNRLA 1 2 6 0 14.7kb 4.9kb
green open sahwikisource_archive_first Bk9EUZVFQNuMsDsqnp2YdA 1 2 156 0 44.9kb 14.9kb
green open sswiki_archive_first 1116brBAQAy-BkUIXn5NXQ 1 2 1015 0 201.8kb 67.2kb
green open vecwikisource_archive_first vMazl--5Q-2x18YHbbuJlQ 1 2 356 0 97.7kb 32.5kb
Wikis with a broken archive, content, or general index:
afwiktionary akwikibooks avkwiki azwikibooks bat_smgwiki biwiktionary bmwikiquote bmwiktionary bowikibooks collabwiki crwiktionary cswikinews cswikiversity cywikiquote fawikibooks gorwiktionary hewiktionary iewikibooks kgwiki kowikibooks kywiktionary ladwiki liwiktionary lldwiki mnwiktionary niawiki nnwiktionary nowikibooks nowikimedia plwikibooks pnbwiktionary ptwikiquote quwikibooks sahwikisource sgwiki sswiki svwikiquote trwikisource vecwikisource vowikibooks wikimania2007wiki wikimania2014wiki wikimania2018wiki xhwikibooks zhwikiversity
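A list like this can be derived from the index names. A minimal Python sketch, assuming the index names always end in _content_first, _general_first or _archive_first as in the listings above (the function names are hypothetical); stripping the suffix rather than splitting on the first underscore keeps names such as bat_smgwiki intact:

```python
import re

# Assumption: the affected indices all end with one of these three suffixes,
# as seen in the restored index listings.
INDEX_SUFFIX = re.compile(r"_(content|general|archive)_first$")


def wiki_from_index(index_name):
    """Return the wiki name for an index name, e.g. sgwiki_content_first -> sgwiki."""
    return INDEX_SUFFIX.sub("", index_name)


def broken_wikis(index_names):
    """Return the sorted, de-duplicated wiki names for a list of index names."""
    return sorted({wiki_from_index(name) for name in index_names})


sample = [
    "bat_smgwiki_archive_first",
    "iewikibooks_content_first",
    "iewikibooks_archive_first",
    "sgwiki_content_first",
]
print(broken_wikis(sample))
```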
I just deleted all these red indices because I'm not sure I could have recovered them; this should hopefully unblock the update pipeline.
The next step is to recover these indices. My current plan is as follows:
- build the indices using UpdateSearchIndexConfig
- run a copy from codfw -> eqiad using a script
May 28 2025
@Urbanecm_WMF thanks for catching & fixing this!
May 27 2025
The PR above got merged. Moving this back to the backlog; we'll continue working on it once we get closer to an OpenSearch version that has this feature.
From the code:
// We use this to pick up redirects so we can update their targets.
// Can't re-use PageDeleteComplete because the page info's
// already gone
// If we abort or fail deletion it's no big deal because this will
// end up being a no-op when it executes.
There's the same hack in EventBus, but I believe the new event system solves this issue by keeping track of the redirect target with \MediaWiki\Page\Event\PageDeletedEvent::wasRedirect() and \MediaWiki\Page\Event\PageDeletedEvent::getRedirectTargetBefore(), so this will no longer be necessary.
Here's my understanding of the possible solutions:
- use the keyed state: will probably have a huge impact on throughput & latency; a new batch will be created per key, leading to most batches being rather small (1 event) and always fired by the timer
- use the operator state: probably the most natural solution to keep the current logic; in-flight events will be replayed on restarts. The issue is that CheckpointedFunction does not appear to be available with pyflink
- use the AsyncIO operator: should be the preferred approach, as this solution provides delivery guarantees with no extra duplicates; unfortunately it is not available with pyflink
- use the flink parallelism: instead of batching events we could achieve higher concurrency by simply setting the parallelism of the fetch operator (12 to match the default process_max_workers_default); not the best use of resources but probably acceptable for the expected throughput of the page_change stream
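To illustrate the concern with the keyed state option, here is a toy simulation in plain Python (not Flink code). It assumes events are keyed by page_id and that any batch not filled up to batch_size is eventually flushed by a timer; the function names and the sample data are made up for the illustration:

```python
from collections import defaultdict


def global_batches(events, batch_size):
    """Group events into fixed-size batches regardless of their key."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


def keyed_batches(events, batch_size, key):
    """Group events per key first, then batch within each key.

    Batches smaller than batch_size stand in for timer-fired batches.
    """
    per_key = defaultdict(list)
    for event in events:
        per_key[key(event)].append(event)
    batches = []
    for items in per_key.values():
        batches.extend(global_batches(items, batch_size))
    return batches


# Ten events spread over nine distinct page ids (only page 0 is edited twice).
events = [{"page_id": i % 9, "rev": i} for i in range(10)]

unkeyed = global_batches(events, batch_size=5)
keyed = keyed_batches(events, batch_size=5, key=lambda e: e["page_id"])

print(len(unkeyed), [len(b) for b in unkeyed])
print(len(keyed), [len(b) for b in keyed])
```

With mostly distinct keys, the keyed variant produces many batches that never fill up, which is why they would end up being fired by the timer rather than by reaching the batch size.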
May 26 2025
May 23 2025
Sure no need to run on s8 indeed.
@Clement_Goubert thanks!
May 22 2025
May 19 2025
Seems to be working: searching for Яблочный пирог shows Рецепт:Яблочный_пирог in the sidebar.
May 14 2025
might be related to T392058
Looks great!
I would find it a bit more natural to have serp positioned before the target page
May 13 2025
@Keewanlew some clarifications: intitle is searching for words in the titles:
- intitle:s does not find the page named Some title
- intitle:some can find a page named Some title
- intitle:s can find a page named The letter S
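The three cases above boil down to matching whole words rather than substrings. A deliberately simplified Python sketch of that behavior; the real keyword goes through the search engine's language analysis, which this does not attempt to reproduce, and the function names are hypothetical:

```python
import re


def title_words(title):
    """Split a title into lowercase words (a crude stand-in for real text analysis)."""
    return {word.lower() for word in re.findall(r"\w+", title)}


def intitle_matches(term, title):
    """Return True when the term is one of the words of the title."""
    return term.lower() in title_words(title)


print(intitle_matches("s", "Some title"))
print(intitle_matches("some", "Some title"))
print(intitle_matches("s", "The letter S"))
```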
May 12 2025
PR uploaded to add support for matched_fields: https://github.com/opensearch-project/OpenSearch/pull/18166
May 9 2025
Batch id of the enwiki_titlewiki index in eqiad is 1746625724 (Wed May 07 2025 13:48:44) so this means the failure is possibly related to the incident or could just be a coincidence.
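The date above comes from reading the batch id as a Unix timestamp in seconds. A small Python sketch of the conversion, assuming that interpretation; reading it as UTC reproduces the time quoted above (the function name is hypothetical):

```python
from datetime import datetime, timezone


def batch_id_to_utc(batch_id):
    """Interpret a batch id as Unix epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(batch_id, tz=timezone.utc)


ts = batch_id_to_utc(1746625724)
print(ts.isoformat())  # 2025-05-07T13:48:44+00:00
```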
May 8 2025
I've added a quick cronjob, stat1009:/home/dcausse/wdqs_reconcile/reconcile.sh, that runs daily at 10:00 UTC and reconciles all items edited the previous day that have a change in a SomeValue node.
Moving to waiting so I don't forget to stop that job once the reload is done.
Ran the script from Erik and found:
hewikisource 3014
fiwiktionary 1151
trwiktionary 2914
zhwiktionary 1754
mgwiktionary 1859
enwiktionary 1556
enwiki 5335305
Re-opening, we seem to have promoted a bad build recently causing T393663. Unfortunately we disabled completion index rebuilds as part of the opensearch migration and the bad index kept serving stale results for quite some time.
The reason for the bad promotion is quite unclear. The sole trace I could find is https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.05.07?id=OJUeq5YBfOjk-Vo1yy77 but this error suggests that the build failed and should not have promoted the index. It's possible the bad index was promoted on the previous run on May 6, but I have not found anything about this yet.
Quick heads up that wdqs users are starting to get impacted by this.
This should now be resolved, please see T393663#10803425
Apologies about this. As part of T388610: Migrate production Elastic clusters to Opensearch (CirrusSearch backend infrastructure) we disabled some updates, and some transitions took longer than we expected. I routed the search traffic to codfw, which should have fresh indices. The example query mentioned in the description now returns results.
Very likely due to the opensearch migration
May 7 2025
I think it's a reasonable expectation that when you search for the default namespaces you want the default namespaces to be searched on the sister wikis as well.
May 6 2025
Just a quick heads-up: in case you planned to re-index commons/wikidata as part of this task, please skip these 2 indices until T392058 is fixed.
Apr 24 2025
Apr 23 2025
Does some kind of similar logging/tracking already exist in Query Service? What information does it contain?
Apr 18 2025
Also happens on action=parse
Apr 17 2025
@gmodena thanks for working on this!
@Nikki (or anyone else interested in filtering on lemma spelling variants) while working on this we realized that some clarifications might be needed.
The new search keyword we will add is currently named lemmaspellingvariant. It's not ideal because it is quite long, but I found that haslemma was too ambiguous (please let us know if you have objections/suggestions).
This keyword will be used like other keywords and is quite independent from the rest of the search query. For instance, aluminium lemmaspellingvariant:en-us will find https://www.wikidata.org/wiki/Lexeme:L18179. From the ticket description I think this is what is expected, but if not please let us know. Allowing a particular lemma string to be matched against its specific language variant will require some thinking on our side and is not entirely trivial.
I think this task should be done as part of T372912 which will involve some refactoring of the way the tags are shipped.
I suspect that the delta you generate could easily be grouped by page_id after they're computed.
Cirrus should now gracefully handle this exception, I took a quick look at EditPage but I'm not quite clear how to fail gracefully there.
Apr 16 2025
Tagging Advanced-Search because we introduced subpageof in CirrusSearch to work around confusing behaviors of existing keywords like prefix:, see T159321 and T180495. Is there possibly something to do on the UI to better guide the user using this field?
Search keywords that can escape the initial namespace filter have been quite confusing as well and we decided to not introduce new ones because there's no way to understand from the CirrusSearch perspective what's the actual intent of the user.
Apr 15 2025
tested wdqs & wcqs deploys and all the expected services got restarted successfully.
Apr 14 2025
already done
we now use a custom elasticsearch sink, I doubt we'll want to go back to the one provided by flink
fetch_articletopic_prediction_thresholds has been removed
Apr 11 2025
I've been playing with grouping top results by cluster, this is at http://localhost:12222/clustered. Could be interesting in the context of diversity search.
Wrote a small demo available on stat1009, you need a tunnel there with ssh -L12222:localhost:12222 stat1009.eqiad.wmnet and then open http://localhost:12222/.