dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (527 w, 2 d)
Availability
Away until Aug 3.
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

May 30 2025

dcausse updated the task description for T395677: Search backend error: illegal_argument_exception: field value function must not produce negative scores.
May 30 2025, 1:43 PM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse added a comment to T395677: Search backend error: illegal_argument_exception: field value function must not produce negative scores.

Seems to be caused by m1run and perhaps not actually related to a bug upstream. Can be reproduced with:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "match": {
                "query": "American pickle"
              }
            },
            {
              "match": {
                "wiki": "enwiki"
              }
            },
            {
              "terms": {
                "method": [
                  "m1run"
                ]
              }
            }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "suggestion_score",
            "missing": 0
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "from": 0,
  "size": 1,
  "_source": false,
  "stored_fields": [
    "method",
    "dym"
  ]
}

Sent to https://search.svc.eqiad.wmnet:9243/glent_production/_search
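
One way to check the m1run hypothesis, as a minimal sketch: with "boost_mode": "replace" and no factor or modifier, field_value_factor uses the field value directly as the score, so any m1run document with a negative suggestion_score would trigger the error. The helper name negative_score_probe is hypothetical, and the sketch assumes the negative values are stored in suggestion_score itself. It only builds a request body and does not contact the cluster.

```python
import json


def negative_score_probe(method="m1run", field="suggestion_score"):
    """Build a search body matching documents of `method` whose `field` is below zero."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"method": [method]}},
                    # Range filter: only documents with a strictly negative value.
                    {"range": {field: {"lt": 0}}},
                ]
            }
        },
        "from": 0,
        "size": 1,
        "_source": False,
        "stored_fields": ["method", "dym"],
    }


if __name__ == "__main__":
    print(json.dumps(negative_score_probe(), indent=2))
```

If such a body returns any hit, the negative scores would come from the indexed data rather than from the scoring function.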

May 30 2025, 1:29 PM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse added a project to T395677: Search backend error: illegal_argument_exception: field value function must not produce negative scores: CirrusSearch.
May 30 2025, 1:09 PM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse created T395677: Search backend error: illegal_argument_exception: field value function must not produce negative scores.
May 30 2025, 1:08 PM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse closed T395425: Updating weighed tags via EventBus in beta does not work as Resolved.

The EventBus approach is not supposed to work on beta since we don't have the infrastructure there to run the same system.

May 30 2025, 10:05 AM · Discovery-Search (2025.05.24 - 2025.06.13), Beta-Cluster-Infrastructure, CirrusSearch
dcausse added a comment to T395402: MediaWikiCronJobFailed.

This problem will be mitigated by:

May 30 2025, 10:03 AM · Data-Platform-SRE (2025.07.05 - 2025.07.25), Discovery-Search (2025.06.13 - 2025.07.04)

May 29 2025

dcausse closed T393655: SPARQL shows redirect with original data months after merge as Resolved.

Thanks for the report, I see two stale revisions for these items indeed:

select * {
  VALUES ?stale_revision {1962416023 1969436471}
  ?item schema:version ?stale_revision
}
May 29 2025, 3:48 PM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata
dcausse lowered the priority of T395546: opensearch psi and omega clusters red in eqiad from High to Medium.
May 29 2025, 2:53 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse updated the task description for T395546: opensearch psi and omega clusters red in eqiad.
May 29 2025, 2:53 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse added a comment to T395546: opensearch psi and omega clusters red in eqiad.

Restored indices in psi:

green open bowikibooks_content_first                         zQVg0w7kSfm7Pu4AszzEIg 1 2      1     0   61.4kb   20.4kb
green open xhwikibooks_general_first                         B9NG7ZH4S26RSAdMZB2VRg 1 2    119     0    1.3mb  474.7kb
green open wikimania2014wiki_archive_first                   wi7DZXTvTfuzntdyL6wa7Q 1 2    213     0   72.9kb   24.3kb
May 29 2025, 2:24 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse added a comment to T395546: opensearch psi and omega clusters red in eqiad.

Restored indices in omega:

green open pnbwiktionary_general_first                  6xo89dkTRAKJ4ac6sXMoSg 1 2    504     0    4.9mb    1.6mb
green open avkwiki_general_first                        BnyNtR69QjSwxV4O3h_9Zg 1 2   5873     0   53.7mb   17.6mb
green open ladwiki_general_first                        TSWg8ToqS0SoBxUhOSnCbQ 1 2   6908     0     92mb     30mb
green open ptwikiquote_general_first                    YP_8JxZIQUmYxH4ZAU1SPA 1 2  19919     0  249.6mb   82.6mb
green open cswikiversity_general_first                  nyonNChhRVOemUiSBtZ9OA 1 2   7229    33  225.4mb   74.9mb
green open azwikibooks_general_first                    speqCfzwRd-Quut-4CqM8g 1 2   5500     0   72.1mb   24.1mb
green open wikimania2018wiki_general_first              quhWLKCeRv6Yfri3M3dSDQ 1 2   6097     0   35.6mb   11.8mb
green open liwiktionary_general_first                   4OJuhxtCRh2NXDVJ9Rna4A 1 2  16618     0  176.6mb   58.5mb
green open nnwiktionary_general_first                   H47nSodOT4S2Qtsm9B6FVw 1 2   1640     0   46.6mb   15.5mb
green open vowikibooks_general_first                    eBvVR8CuT8yr4vtnG6ZB4g 1 2    262     0    2.5mb    856kb
green open bmwikiquote_general_first                    EppydKhuS6KixBWfQ-eNRQ 1 2     52     0  599.3kb  199.7kb
green open nowikibooks_general_first                    USM0x_KQQ0qCNFnjQESsJA 1 2   2552     0   40.6mb   13.5mb
green open zhwikiversity_general_first                  G9KoP0fRRrakZOtWdhLHGQ 1 2   7891     0  400.5mb  134.2mb
green open akwikibooks_general_first                    iTLc_9W5SX6Ka7baEPvCtw 1 2     84     0    1.5mb  528.2kb
green open collabwiki_content_first                     eG9JgGIPRWayT8YSNKPVhQ 1 2   3169   471  546.8mb  183.3mb
green open iewikibooks_content_first                    KQkwajbfS32wavqqhy1Ewg 1 2     65     0    2.6mb  906.5kb
green open cywikiquote_content_first                    GI4h_fiMT1isSQep4OdqOw 1 2    457     0    6.9mb    2.3mb
green open fawikibooks_content_first                    UWCtUCLiQS63bFITlr-Wxg 1 2   4271     0  397.6mb  132.6mb
green open svwikiquote_content_first                    1rb46QjtSauQ9rVr9C9uYg 1 2   1906     0   60.6mb   20.9mb
green open biwiktionary_content_first                   6k9oFADRST2zvTnbnEJgtQ 1 2      0     0     624b     208b
green open afwiktionary_content_first                   SlU_uOrcT_en1isCPWO2_Q 1 2  23575     0  227.9mb   75.7mb
green open kywiktionary_content_first                   3Q1hENhDQjuAxmHHMYuIcA 1 2  21290     0  120.9mb   40.1mb
green open trwikisource_content_first                   5PhxU9u6QGuKzf8OS6nRjQ 1 2  11848     0  881.8mb  294.3mb
green open mnwiktionary_content_first                   HoeC_efhRuqaso7w65EMnw 1 2   8932     0  106.4mb   35.4mb
green open wikimania2007wiki_content_first              xHjtKAeDQ-qiahIpN5wV_w 1 2    708     0   34.3mb     11mb
green open sgwiki_content_first                         yXk6MI_USNyxa8Biv4z6Lg 1 2    587     0      7mb    2.3mb
green open bat_smgwiki_archive_first                    rstpepYLR4eyiG0m7jFxdQ 1 2   1937     0    438kb  150.1kb
green open bmwiktionary_archive_first                   ln6VdSm3Q2KIVcOpwdUcPQ 1 2      4     0   14.1kb    4.7kb
green open crwiktionary_archive_first                   kgKWX502RSO0uFugkXQmdQ 1 2      0     0     624b     208b
green open cswikinews_archive_first                     M4uk4lGQTXW15yxjiMOs2g 1 2    552     0  212.9kb   70.9kb
green open gorwiktionary_archive_first                  O96K3pDHRvCOHoNMYkSP0g 1 2    185     0   45.4kb   15.1kb
green open hewiktionary_archive_first                   3p8ZGniaRa2z2ZXMUn2xOA 1 2   4703     0      1mb  372.9kb
green open iewikibooks_archive_first                    CTE27X02S66NgcCSUGShkQ 1 2    197     0   49.8kb   16.6kb
green open kgwiki_archive_first                         7wt9BfeVSGODn9R0-WbHpg 1 2    800     0    154kb   54.7kb
green open kowikibooks_archive_first                    UKrmgC0FQTSyq5_wKXHrEw 1 2   1938     0  411.9kb  137.3kb
green open lldwiki_archive_first                        GwL6ROKtT2eTWTDoXG4V2w 1 2   1319     0  194.9kb   64.9kb
green open niawiki_archive_first                        0rGaElfRT32vOSajEhgHzQ 1 2    255     0     74kb   24.6kb
green open nowikimedia_archive_first                    I0XjPY2fQm2h3Rg_iE-Exg 1 2    683     0  138.2kb   49.5kb
green open plwikibooks_archive_first                    ypf49ohZQDCvFcOZJ5GOWw 1 2   7178     4    1.5mb  537.4kb
green open quwikibooks_archive_first                    MLmimEdPRoqXVISq5HNRLA 1 2      6     0   14.7kb    4.9kb
green open sahwikisource_archive_first                  Bk9EUZVFQNuMsDsqnp2YdA 1 2    156     0   44.9kb   14.9kb
green open sswiki_archive_first                         1116brBAQAy-BkUIXn5NXQ 1 2   1015     0  201.8kb   67.2kb
green open vecwikisource_archive_first                  vMazl--5Q-2x18YHbbuJlQ 1 2    356     0   97.7kb   32.5kb
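
A listing like the one above can be checked mechanically. The sketch below assumes the default _cat/indices column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) and assumes index names follow a <wiki>_<type>_<suffix> pattern, as the names in the listing suggest. The function names parse_cat_indices and wiki_name are hypothetical.

```python
COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs_count", "docs_deleted", "store_size", "pri_store_size",
]


def parse_cat_indices(text):
    """Parse whitespace-separated _cat/indices rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(COLUMNS):  # skip blank or malformed lines
            rows.append(dict(zip(COLUMNS, parts)))
    return rows


def wiki_name(index_name):
    """Drop the trailing "_<type>_<suffix>" pair, keeping underscores inside the wiki name."""
    return index_name.rsplit("_", 2)[0]


sample = """\
green open bat_smgwiki_archive_first rstpepYLR4eyiG0m7jFxdQ 1 2 1937 0 438kb 150.1kb
green open sgwiki_content_first      yXk6MI_USNyxa8Biv4z6Lg 1 2  587 0   7mb   2.3mb
"""

rows = parse_cat_indices(sample)
print(all(row["health"] == "green" for row in rows))      # True
print(sorted({wiki_name(row["index"]) for row in rows}))  # ['bat_smgwiki', 'sgwiki']
```

Splitting from the right keeps wiki names that contain an underscore, such as bat_smgwiki, intact.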
May 29 2025, 1:12 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse triaged T395546: opensearch psi and omega clusters red in eqiad as High priority.
May 29 2025, 12:53 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse added a comment to T395546: opensearch psi and omega clusters red in eqiad.

Wikis with a broken archive, content, general index:

afwiktionary
akwikibooks
avkwiki
azwikibooks
bat
biwiktionary
bmwikiquote
bmwiktionary
bowikibooks
collabwiki
crwiktionary
cswikinews
cswikiversity
cywikiquote
fawikibooks
gorwiktionary
hewiktionary
iewikibooks
kgwiki
kowikibooks
kywiktionary
ladwiki
liwiktionary
lldwiki
mnwiktionary
niawiki
nnwiktionary
nowikibooks
nowikimedia
plwikibooks
pnbwiktionary
ptwikiquote
quwikibooks
sahwikisource
sgwiki
sswiki
svwikiquote
trwikisource
vecwikisource
vowikibooks
wikimania2007wiki
wikimania2014wiki
wikimania2018wiki
xhwikibooks
zhwikiversity
May 29 2025, 10:04 AM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse added a comment to T395546: opensearch psi and omega clusters red in eqiad.

I just deleted all these red indices because I'm not sure I could have recovered them; this should hopefully unblock the update pipeline.
The next step is to recover these indices. My current plan is as follows:

  • build the indices using UpdateSearchIndexConfig
  • run a copy from codfw -> eqiad using a script
May 29 2025, 9:46 AM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)
dcausse created T395546: opensearch psi and omega clusters red in eqiad.
May 29 2025, 9:44 AM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25)

May 28 2025

dcausse added a comment to T395425: Updating weighed tags via EventBus in beta does not work.

@Urbanecm_WMF thanks for catching & fixing this!

May 28 2025, 9:04 AM · Discovery-Search (2025.05.24 - 2025.06.13), Beta-Cluster-Infrastructure, CirrusSearch
dcausse claimed T393655: SPARQL shows redirect with original data months after merge.
May 28 2025, 8:16 AM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata

May 27 2025

dcausse moved T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch from Next Projects to elastic / cirrus on the Discovery-Search board.
May 27 2025, 2:49 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse edited projects for T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch, added: Discovery-Search; removed Discovery-Search (2025.05.24 - 2025.06.13).

The PR above got merged, so I'm moving this back to the backlog. We'll continue working on this once we get closer to an opensearch version that has this feature.

May 27 2025, 2:49 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse moved T391792: Align search platform DAGs to DPE best practices from In Progress to Needs Review on the Discovery-Search (2025.05.24 - 2025.06.13) board.
May 27 2025, 2:47 PM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse added a comment to T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.

In this case, would you suggest making the HTTP call synchronous? IIRC we tried this early on, but the interaction between Python, Beam and Flink led to very high latencies even for very low-throughput streams. I'd need to revisit.

May 27 2025, 2:26 PM · Patch-For-Review, Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform
dcausse added a comment to T394791: [SPIKE] Investigate CirrusSearch extension for Domain Event migrations.

It's unclear to me why this implements PageDeleteHook - that hook runs *before* deletion, the deletion may not even happen. Since the handler method does the same thing as the one for PageDeleteCompleteHook, it seems redundant. Perhaps it is the result of a misunderstanding.

From the code:

		// We use this to pick up redirects so we can update their targets.
		// Can't re-use PageDeleteComplete because the page info's
		// already gone
		// If we abort or fail deletion it's no big deal because this will
		// end up being a no-op when it executes.

There's the same hack in EventBus but I believe the new event system solves this issue by keeping track of the redirect target with \MediaWiki\Page\Event\PageDeletedEvent::wasRedirect() and \MediaWiki\Page\Event\PageDeletedEvent::getRedirectTargetBefore() and this will no longer be necessary.

May 27 2025, 7:28 AM · Discovery-Search (2025.07.04 - 2025.07.25), MW-Interfaces-Team, CirrusSearch, OKR-Work, MediaWiki-DomainEvents
dcausse added a comment to T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.

Here's my understanding of the possible solutions:

  • use the keyed state: will probably have a huge impact on throughput & latency, a new batch will be created per key leading to most batches being rather small (1 event) and always fired by the timer
  • use the operator state: probably the most natural solution to keep the current logic, in-flight events will be re-played on restarts, issue is that CheckpointedFunction does not appear to be available with pyflink
  • use the AsyncIO operator: should be the preferred approach, this solution provides delivery guarantees with no extra duplicates, unfortunately not available with pyflink
  • use the flink parallelism, instead of batching events we could achieve higher concurrency by simply setting the parallelism of the fetch operator (12 to match the default process_max_workers_default), not the best use of resources but probably acceptable for the expected throughput of the page_change stream
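
The concern about keyed state in the first option can be illustrated with a small simulation. This is plain Python, not Flink code: the function batch_sizes, the batch size of 3, and the sample events are all hypothetical, and the end-of-stream flush stands in for the timer.

```python
from collections import defaultdict


def batch_sizes(events, keyed, max_batch=3):
    """Return the size of every batch emitted for a list of (key, payload) events.

    With keyed=True each key gets its own buffer, as keyed state would.
    With keyed=False all events share a single buffer.
    """
    buffers = defaultdict(list)
    sizes = []
    for key, payload in events:
        slot = key if keyed else None
        buffers[slot].append(payload)
        if len(buffers[slot]) == max_batch:  # batch is full: emit it
            sizes.append(max_batch)
            buffers[slot].clear()
    # Leftover buffers are flushed at the end, standing in for the timer.
    sizes.extend(len(buf) for buf in buffers.values() if buf)
    return sizes


events = [(page_id, "event") for page_id in range(6)]  # six distinct keys
print(batch_sizes(events, keyed=True))   # [1, 1, 1, 1, 1, 1]
print(batch_sizes(events, keyed=False))  # [3, 3]
```

When most keys occur only once per flush interval, per-key buffering yields single-event batches that are all emitted by the timer.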
May 27 2025, 7:05 AM · Patch-For-Review, Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform

May 26 2025

dcausse placed T363521: Completion suggester can promote a bad build up for grabs.
May 26 2025, 3:21 PM · Discovery-Search (2025.07.04 - 2025.07.25), Sustainability (Incident Followup), CirrusSearch

May 23 2025

dcausse added a comment to P76406 Search weighted tags from search index dumps.

Hi @dcausse ,

Thank you very much for sharing this. Works like a charm!
Looking into the data, we have article topic predictions with scores, and if the article has a link recommendation as a boolean.
This is awesome.
So, if we want to find the add-a-link recommendation scores, we should look in either MariaDB directly or another index.

May 23 2025, 9:59 AM
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

pinging @hoo & Wikidata for visibility on the work on mediawiki_job_wikidata-updateQueryServiceLag.timer

Should that job alert Wikidata rather than Discovery-Search?

May 23 2025, 9:40 AM · Discovery-Search (2025.06.13 - 2025.07.04), Wikidata, Patch-For-Review, serviceops
dcausse added a project to T388538: Migrate discovery-search jobs to mw-cron: Wikidata.

pinging @hoo & Wikidata for visibility on the work on mediawiki_job_wikidata-updateQueryServiceLag.timer

May 23 2025, 9:29 AM · Discovery-Search (2025.06.13 - 2025.07.04), Wikidata, Patch-For-Review, serviceops
dcausse created T395109: UpdateSuggesterIndex should fail early if the main indices do not exist.
May 23 2025, 9:20 AM · MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), Discovery-Search (2025.06.13 - 2025.07.04), CirrusSearch
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

It's when trying to run on s8, so wikidata, yes. I could also just remove s8 from the shards the script is running on?

Sure no need to run on s8 indeed.

May 23 2025, 9:04 AM · Discovery-Search (2025.06.13 - 2025.07.04), Wikidata, Patch-For-Review, serviceops
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

@Clement_Goubert thanks!

May 23 2025, 8:17 AM · Discovery-Search (2025.06.13 - 2025.07.04), Wikidata, Patch-For-Review, serviceops

May 22 2025

dcausse updated subscribers of P76406 Search weighted tags from search index dumps.
May 22 2025, 2:00 PM
dcausse created P76406 Search weighted tags from search index dumps.
May 22 2025, 1:38 PM

May 19 2025

dcausse closed T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia as Resolved.

Seems to be working: searching for Яблочный пирог shows Рецепт:Яблочный_пирог in the sidebar.

May 19 2025, 3:33 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites

May 14 2025

dcausse moved T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo from Incoming to Done on the Discovery-Search (2025.05.02 - 2025.05.23) board.
May 14 2025, 10:21 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse reassigned T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo from dcausse to Lucas_Werkmeister_WMDE.
May 14 2025, 8:55 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse claimed T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo.

might be related to T392058

May 14 2025, 8:47 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse moved T391876: Deepcategory search does not work with MediaSearch on commons from Needs Review to To be Deployed on the Discovery-Search (2025.05.02 - 2025.05.23) board.
May 14 2025, 8:08 AM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), CirrusSearch, Commons
dcausse added a comment to F59944900: search proposed sankey shape.

Looks great!
I would find it a bit more natural to have the SERP positioned before the target page.

May 14 2025, 7:46 AM

May 13 2025

dcausse added a comment to T392409: 1.43 advance search extensions unable to search in title contain.

@Keewanlew some clarifications: intitle is searching for words in the titles:

  • intitle:s does not find the page named Some title
  • intitle:some can find a page named Some title
  • intitle:s can find a page named The letter S
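
The three cases above can be reproduced with a simplified model of whole-word matching. This is only an illustration: the function intitle_matches is hypothetical, and the real keyword relies on the search engine's text analysis, which may treat some words differently from this plain lowercase split.

```python
import re


def intitle_matches(term, title):
    """Return True if `term` equals one whole word of `title`, ignoring case."""
    words = re.findall(r"\w+", title.lower())
    return term.lower() in words


print(intitle_matches("s", "Some title"))    # False: "s" is only a prefix of "some"
print(intitle_matches("some", "Some title")) # True
print(intitle_matches("s", "The letter S"))  # True: "s" is a word of its own
```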
May 13 2025, 8:16 AM · Advanced-Search

May 12 2025

dcausse assigned T393872: Make weighted tags no longer be WMF-specific to SD0001.
May 12 2025, 3:43 PM · MW-1.45-notes (1.45.0-wmf.7; 2025-06-24), Discovery-Search (2025.06.13 - 2025.07.04), Patch-For-Review, CirrusSearch
dcausse added a comment to T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch.

PR uploaded to add support for matched_fields: https://github.com/opensearch-project/OpenSearch/pull/18166

May 12 2025, 3:31 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse merged T356244: MediaSearch should display search warnings into T391876: Deepcategory search does not work with MediaSearch on commons.
May 12 2025, 9:23 AM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), CirrusSearch, Commons
dcausse merged task T356244: MediaSearch should display search warnings into T391876: Deepcategory search does not work with MediaSearch on commons.
May 12 2025, 9:23 AM · Structured-Data-Backlog, MediaSearch

May 9 2025

dcausse added a comment to T363521: Completion suggester can promote a bad build.

Batch id of the enwiki_titlewiki index in eqiad is 1746625724 (Wed May 07 2025 13:48:44) so this means the failure is possibly related to the incident or could just be a coincidence.
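
The date in parentheses is consistent with reading the batch id as a Unix timestamp in seconds, interpreted in UTC. A minimal sketch of that conversion, where the function name batch_id_to_utc is hypothetical:

```python
from datetime import datetime, timezone


def batch_id_to_utc(batch_id):
    """Interpret a batch id as Unix epoch seconds and return a UTC datetime."""
    return datetime.fromtimestamp(batch_id, tz=timezone.utc)


print(batch_id_to_utc(1746625724).isoformat())  # 2025-05-07T13:48:44+00:00
```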

May 9 2025, 4:49 PM · Discovery-Search (2025.07.04 - 2025.07.25), Sustainability (Incident Followup), CirrusSearch

May 8 2025

dcausse updated the task description for T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels.
May 8 2025, 9:23 PM · Data-Platform-SRE (2025.07.05 - 2025.07.25), Wikidata, Wikidata-Query-Service
dcausse moved T393713: Regularly reconcile items with delete blank nodes from Incoming to Blocked / Waiting on the Discovery-Search (2025.05.02 - 2025.05.23) board.

I've added a quick cronjob, stat1009:/home/dcausse/wdqs_reconcile/reconcile.sh, that runs daily at 10:00 UTC and reconciles all items edited the previous day that have a change in a SomeValue node.
Moving to waiting so that we don't forget to stop that job once the reload is done.

May 8 2025, 5:09 PM · Discovery-Search (2025.07.04 - 2025.07.25), Wikidata
dcausse created T393713: Regularly reconcile items with delete blank nodes.
May 8 2025, 2:05 PM · Discovery-Search (2025.07.04 - 2025.07.25), Wikidata
dcausse added a comment to T363521: Completion suggester can promote a bad build.

Ran the script from Erik and found:

hewikisource 3014
fiwiktionary 1151
trwiktionary 2914
zhwiktionary 1754
mgwiktionary 1859
enwiktionary 1556
enwiki 5335305
May 8 2025, 10:20 AM · Discovery-Search (2025.07.04 - 2025.07.25), Sustainability (Incident Followup), CirrusSearch
dcausse reopened T363521: Completion suggester can promote a bad build, a subtask of T363694: Post incident tasks: Search missing results/unavailable for some eqiad users, as Open.
May 8 2025, 10:15 AM · Data-Platform-SRE (2024.05.06 - 2024.05.26), Discovery-Search (Current work), Sustainability (Incident Followup), SRE-OnFire
dcausse reopened T363521: Completion suggester can promote a bad build as "Open".

Re-opening, we seem to have promoted a bad build recently causing T393663. Unfortunately we disabled completion index rebuilds as part of the opensearch migration and the bad index kept serving stale results for quite some time.
The reason for the bad promotion is quite unclear. The sole trace I could find is https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.05.07?id=OJUeq5YBfOjk-Vo1yy77, but this error suggests that the build failed and should not have promoted the index. It's possible the bad index was promoted on the previous run on May 6, but I have not found anything about this yet.

May 8 2025, 10:15 AM · Discovery-Search (2025.07.04 - 2025.07.25), Sustainability (Incident Followup), CirrusSearch
dcausse added a comment to T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels.

Quick heads up that wdqs users are starting to get impacted by this.

May 8 2025, 9:35 AM · Data-Platform-SRE (2025.07.05 - 2025.07.25), Wikidata, Wikidata-Query-Service
dcausse closed T393635: Dated elections no longer top results in search preview on en.wikipedia as Resolved.

This should now be resolved, please see T393663#10803425

May 8 2025, 8:37 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393662: Wikipedia lacks some search results in search suggestions as Resolved.
May 8 2025, 8:35 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse added a comment to T393662: Wikipedia lacks some search results in search suggestions.

This should now be resolved, please see T393663#10803425

May 8 2025, 8:35 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393660: Adding a link to another article does not suggest articles whose names are similar to the text being linked but instead suggests unrelated topics as Resolved.

This should now be resolved, please see T393663#10803425

May 8 2025, 8:34 AM · VisualEditor
dcausse edited projects for T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), added: CirrusSearch; removed WMF-General-or-Unknown.
May 8 2025, 8:26 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393635: Dated elections no longer top results in search preview on en.wikipedia, as Resolved.
May 8 2025, 8:26 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393660: Adding a link to another article does not suggest articles whose names are similar to the text being linked but instead suggests unrelated topics, as Resolved.
May 8 2025, 8:26 AM · VisualEditor
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393662: Wikipedia lacks some search results in search suggestions, as Resolved.
May 8 2025, 8:26 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions) as Resolved.

Apologies for this: as part of T388610: Migrate production Elastic clusters to Opensearch (CirrusSearch backend infrastructure) we disabled some updates, and part of the transition took longer than we expected. I routed the search traffic to codfw, which should have fresh indices. The example query mentioned in the description now returns results.

May 8 2025, 8:26 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse edited projects for T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), added: Discovery-Search (2025.05.02 - 2025.05.23); removed Discovery-Search.
May 8 2025, 8:21 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse claimed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions).

Very likely due to the opensearch migration

May 8 2025, 7:25 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)

May 7 2025

dcausse claimed T391792: Align search platform DAGs to DPE best practices.
May 7 2025, 2:06 PM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse moved T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia from In Progress to Needs Review on the Discovery-Search (2025.05.02 - 2025.05.23) board.
May 7 2025, 10:25 AM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites
dcausse added a comment to T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia.

I think it's a reasonable expectation that when you search for the default namespaces you want the default namespaces to be searched on the sister wikis as well.

May 7 2025, 10:24 AM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites

May 6 2025

dcausse claimed T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia.
May 6 2025, 1:17 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites
dcausse added a comment to T393392: Reindex Czech-language wikis to enable diacritic folding.

Just a quick heads-up: in case you planned to re-index commons/wikidata as part of this task, please skip these two indices until T392058 is fixed.

May 6 2025, 12:16 PM · Discovery-Search (2025.06.13 - 2025.07.04), CirrusSearch

Apr 24 2025

dcausse claimed T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch.
Apr 24 2025, 9:03 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse edited projects for T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch, added: Discovery-Search (2025.04.11 - 2025.05.02); removed Discovery-Search.
Apr 24 2025, 9:03 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse closed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given, a subtask of T374702: Cleanup: Remove deprecated weighted tag methods, as Resolved.
Apr 24 2025, 9:02 AM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse closed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given as Resolved.
Apr 24 2025, 9:02 AM · Discovery-Search (2025.05.02 - 2025.05.23), MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), CirrusSearch

Apr 23 2025

dcausse added a comment to T271776: Allow limiting lexeme searches by language.

Why was it too ambiguous? The idea was to match the existing haslabel, hasdescription and hascaption keywords (https://www.mediawiki.org/wiki/Help:Extension:WikibaseCirrusSearch#haslabel/hascaption) - lemmas are effectively labels for lexemes, so it makes sense for the lemma keywords to be similar to the label keywords.

Apr 23 2025, 1:15 PM · Discovery-Search (2025.07.04 - 2025.07.25), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Patch-For-Review, OKR-Work, CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse added a comment to T391383: Metrics for federated querying.

Does some kind of similar logging/tracking already exist in Query Service? What information does it contain?

Apr 23 2025, 8:09 AM · Wikidata, Wikidata-Query-Service

Apr 18 2025

dcausse added a comment to T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).

Also happens on action=parse

Apr 18 2025, 8:31 AM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error

Apr 17 2025

dcausse closed T388549: Vector Search PoC as Resolved.

@gmodena thanks for working on this!

Apr 17 2025, 4:29 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse added a comment to T271776: Allow limiting lexeme searches by language.

@Nikki (or anyone else interested in filtering on lemma spelling variants) while working on this we realized that some clarifications might be needed.
The new search keyword we will add is currently named lemmaspellingvariant. It's not ideal because it is quite long, but I found that haslemma was too ambiguous (please let us know if you have objections/suggestions).
The use of this keyword will be like other keywords and quite independent from the rest of the search query, for instance: aluminium lemmaspellingvariant:en-us will find https://www.wikidata.org/wiki/Lexeme:L18179. From the ticket description I think this is what is expected but if not please let us know. Allowing to match a particular lemma string against its specific language variant will require some thinking on our side and is not entirely trivial.

Apr 17 2025, 3:32 PM · Discovery-Search (2025.07.04 - 2025.07.25), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Patch-For-Review, OKR-Work, CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse added a subtask for T372912: Migrate image recommendation to use page_weighted_tags_changed stream: T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article.
Apr 17 2025, 2:57 PM · Discovery-Search (2025.07.04 - 2025.07.25), Data-Platform-SRE (2025.07.05 - 2025.07.25), Patch-For-Review, Data-Engineering-Radar, Structured-Data-Backlog, Structured Data Engineering, Data-Engineering, CirrusSearch
dcausse added a parent task for T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article: T372912: Migrate image recommendation to use page_weighted_tags_changed stream.
Apr 17 2025, 2:56 PM · Discovery-Search (2025.07.04 - 2025.07.25), Structured-Data-Backlog, CirrusSearch, Structured Data Engineering, Image-Suggestions
dcausse added a comment to T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article.

I think this task should be done as part of T372912 which will involve some refactoring of the way the tags are shipped.
I suspect that the deltas you generate could easily be grouped by page_id after they are computed.
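
A minimal sketch of that grouping step, assuming each delta row is a dict with page_id, tag and values keys. These field names and the function name are hypothetical, not the actual image_suggestions_search_index_delta schema:

```python
from collections import defaultdict


def group_delta_by_page(rows):
    """Collapse per-tag delta rows into one update per page_id.

    `rows` is an iterable of dicts with the (assumed) keys
    "page_id", "tag" and "values".
    """
    grouped = defaultdict(list)
    for row in rows:
        # Keep every tag change for a page together, in input order.
        grouped[row["page_id"]].append(
            {"tag": row["tag"], "values": row["values"]}
        )
    # One output record per page, carrying all of its tag changes.
    return [{"page_id": pid, "tags": tags} for pid, tags in grouped.items()]
```

In a distributed job the same idea would be a group-by on page_id followed by collecting the tag changes into a list.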

Apr 17 2025, 2:56 PM · Discovery-Search (2025.07.04 - 2025.07.25), Structured-Data-Backlog, CirrusSearch, Structured Data Engineering, Image-Suggestions
dcausse changed Request URL from https://ca.wikipedia.org/w/api.php?action=query&format=*&cbbuilders=*&prop=*&formatversion=*&pageids=* to /w/index.php?action=edit&title=*&undo=*&undoafter=* on T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 17 2025, 2:09 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse placed T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema) up for grabs.

Cirrus should now handle this exception gracefully. I took a quick look at EditPage, but it's not clear to me how to fail gracefully there.

Apr 17 2025, 2:07 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse updated the task description for T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 17 2025, 2:05 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse closed T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater as Resolved.
Apr 17 2025, 12:51 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch

Apr 16 2025

dcausse edited projects for T258278: Advanced search not working as expected with subpages in namespaces, added: Advanced-Search; removed CirrusSearch, Discovery-Search.

Tagging Advanced-Search because we introduced subpageof in CirrusSearch to work around confusing behaviors of existing keywords like prefix: (see T159321 and T180495). Perhaps something could be done in the UI to better guide users of this field?
Search keywords that can escape the initial namespace filter have been quite confusing as well, and we decided not to introduce new ones because CirrusSearch has no way to understand the user's actual intent.

Apr 16 2025, 7:35 AM · Advanced-Search

Apr 15 2025

dcausse moved T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater from In Progress to Needs Review on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 12:57 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse claimed T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater.
Apr 15 2025, 12:51 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse moved T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema) from In Progress to Needs Review on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 12:50 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse closed T390665: wdqs2016 and 2017 not consuming updates as Resolved.
Apr 15 2025, 10:21 AM · Discovery-Search (2025.04.11 - 2025.05.02), Wikidata, Wikidata-Query-Service
dcausse closed T326311: Deletion of Lexemes appears to leak triples related to its forms and senses as Resolved.
Apr 15 2025, 10:01 AM · Discovery-Search (2025.04.11 - 2025.05.02), Wikidata
dcausse claimed T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 15 2025, 9:51 AM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse moved T221709: scap service restarts for WDQS are inconsistent from To be Deployed to Done on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 8:53 AM · Discovery-Search (2025.04.11 - 2025.05.02), Data-Platform-SRE (2025.04.12 - 2025.05.02), Wikidata, Scap, Wikidata-Query-Service
dcausse closed T221709: scap service restarts for WDQS are inconsistent as Resolved.

tested wdqs & wcqs deploys and all the expected services got restarted successfully.

Apr 15 2025, 8:52 AM · Discovery-Search (2025.04.11 - 2025.05.02), Data-Platform-SRE (2025.04.12 - 2025.05.02), Wikidata, Scap, Wikidata-Query-Service

Apr 14 2025

dcausse closed T270106: Port query clicks datasets generation to airflow as Invalid.

already done

Apr 14 2025, 1:35 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T355156: Upgrade the Flink version used by the Search Update Pipeline to fix bulk request size estimation issue as Declined.

we now use a custom elasticsearch sink, I doubt we'll want to go back to the one provided by flink

Apr 14 2025, 1:31 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T311183: ores_predictions_daily DAG fails to overwrite files as Declined.

fetch_articletopic_prediction_thresholds has been removed

Apr 14 2025, 1:26 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse moved T383074: The CirrusSearch Saneitizer should support weighted_tags from elastic / cirrus to needs triage on the Discovery-Search board.
Apr 14 2025, 12:23 PM · Discovery-Search (2025.07.04 - 2025.07.25), CirrusSearch
dcausse claimed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given.
Apr 14 2025, 8:54 AM · Discovery-Search (2025.05.02 - 2025.05.23), MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), CirrusSearch
dcausse created T391792: Align search platform DAGs to DPE best practices.
Apr 14 2025, 7:48 AM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch

Apr 11 2025

dcausse added a comment to T388549: Vector Search PoC.

I've been playing with grouping top results by cluster; this is at http://localhost:12222/clustered. It could be interesting in the context of diversity search.
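
For illustration only, a rough sketch of what grouping top results by cluster can look like. It assumes each hit already carries a cluster label and that hits arrive sorted by score. The field names and the per_cluster cap are hypothetical, and this is not the demo's actual code:

```python
def group_by_cluster(results, per_cluster=3):
    """Group ranked hits by cluster, keeping the best few per cluster.

    `results` is a list of dicts with (assumed) keys "title" and
    "cluster", already ordered from best to worst score.
    """
    clusters = {}
    for hit in results:
        bucket = clusters.setdefault(hit["cluster"], [])
        # Hits are ranked, so the first ones seen are the best ones.
        if len(bucket) < per_cluster:
            bucket.append(hit["title"])
    return clusters
```

Clusters come out in the order of their best-ranked hit, so each distinct group appears once near the top instead of one cluster filling the whole first page.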

Apr 11 2025, 5:41 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse added a comment to T388549: Vector Search PoC.

Wrote a small demo, available on stat1009. You need a tunnel there with ssh -L12222:localhost:12222 stat1009.eqiad.wmnet, then open http://localhost:12222/.

Apr 11 2025, 2:04 PM · Discovery-Search (2025.04.11 - 2025.05.02)