
🚀🔍 Add more basic functionality in simple search endpoint
Closed, Resolved | Public | 13 Estimated Story Points

Description

In addition to the basic functionality covered in https://phabricator.wikimedia.org/T383132, the endpoint should also cover the following functionality:

  • Add matched data element to the response - to be investigated (timeboxed to max 20 hours)
  • Language fallback of labels and aliases as in the UI

Example for matched data:

GET [...]/rest.php/wikibase/v0/search/items?language=en&q=spud

      "id": "Q123",
      "label": { "language": "en", "value": "potato" },
      "description": { "language": "en", "value": "staple food" },
      "match": {
        "type": "alias",
        "language": "en",
        "text": "spud"
      }

Example for label language fallback:

GET [...]/rest.php/wikibase/v0/search/items?language=en&q=Douglas%20Adams

      "id": "Q123",
      "label": { "language": "mul", "value": "Douglas Adams" },
      "description": { "language": "en", "value": "English science fiction writer and humorist" },
      "match": {
        "type": "label",
        "language": "mul",
        "text": "Douglas Adams"
      }

Example for alias language fallback:

GET [...]/rest.php/wikibase/v0/search/items?language=en&q=Douglas%20N.%20Adams

      "id": "Q123",
      "label": { "language": "mul", "value": "Douglas Adams" },
      "description": { "language": "en", "value": "English science fiction writer and humorist" },
      "match": {
        "type": "alias",
        "language": "mul",
        "text": "Douglas N. Adams"
      }
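
The three examples above can be read as one lookup rule: try the requested language first, then walk a fallback chain until a term is found. Below is a minimal Python sketch of that rule, not the Wikibase implementation. The fallback chains, the `resolve_term` name, and the dictionary shapes are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical fallback chains, for illustration only. The real chains are
# defined by MediaWiki/Wikibase and are not reproduced here.
FALLBACK_CHAINS = {"en": ["en", "mul"], "de": ["de", "mul", "en"]}


def resolve_term(terms: dict[str, str], language: str) -> Optional[dict[str, str]]:
    """Return the first term found along the fallback chain, or None."""
    for candidate in FALLBACK_CHAINS.get(language, [language]):
        if candidate in terms:
            return {"language": candidate, "value": terms[candidate]}
    return None


# Terms of the example Item: the label exists only in "mul",
# the description exists in "en".
labels = {"mul": "Douglas Adams"}
descriptions = {"en": "English science fiction writer and humorist"}

print(resolve_term(labels, "en"))
print(resolve_term(descriptions, "en"))
```

With these assumed chains, the label resolves to the "mul" entry and the description to the "en" entry, matching the shape of the label fallback example.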

Task breakdown:

  • Create an ItemSearchEngine implementation that can support language fallback and match data
  • Implement match data
    • adjust the use case and the ItemSearchResult model
    • adjust the two ItemSearchEngine implementations accordingly
  • Implement language fallback
    • adjust the use case and the ItemSearchResult model
    • adjust the two ItemSearchEngine implementations accordingly
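
To make the model change in the breakdown concrete, here is a rough sketch of what an ItemSearchResult carrying match data could look like. It is written in Python for brevity, whereas the actual code base is PHP. The field names are taken from the JSON examples above, while the `Term` and `MatchedData` class names are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A label or description together with its language (hypothetical name)."""
    language: str
    value: str


@dataclass(frozen=True)
class MatchedData:
    """Which term the query matched (hypothetical name)."""
    type: str      # "label" or "alias", as in the examples
    language: str
    text: str


@dataclass(frozen=True)
class ItemSearchResult:
    id: str
    label: Optional[Term]        # may be absent after language fallback
    description: Optional[Term]  # may be absent after language fallback
    match: MatchedData


result = ItemSearchResult(
    id="Q123",
    label=Term("en", "potato"),
    description=Term("en", "staple food"),
    match=MatchedData("alias", "en", "spud"),
)

# asdict() converts nested dataclasses too, giving a JSON-ready structure.
print(asdict(result))
```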

Event Timeline

Ifrahkhanyaree_WMDE renamed this task from Additional functionality in simple search to Additional basic functionality in simple search. Feb 12 2025, 4:19 PM
Ifrahkhanyaree_WMDE updated the task description.
Ifrahkhanyaree_WMDE renamed this task from Additional basic functionality in simple search to Add more basic functionality in simple search endpoint. Feb 12 2025, 4:27 PM
Ifrahkhanyaree_WMDE updated the task description.

Note from refinement: We'll estimate this after the investigation ticket is done and the ticket will be split into two during task breakdown

Dima_Koushha_WMDE renamed this task from Add more basic functionality in simple search endpoint to 🚀🔍 Add more basic functionality in simple search endpoint. Mar 5 2025, 3:53 PM

One comment for product verification, @Ifrahkhanyaree_WMDE: when an Item is found by one of its aliases and has no label in the search language (or any of the fallback languages), then the label field in the search result will currently be null and the matching alias text will be shown in the match field. Let's discuss if we really want a label field in the search result and not something that can contain either a label or an alias, instead. Conceptually, this would be the "title" or "name" or "headline" of the search result, rather than an Item label.

(see T389359)
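
One possible reading of this suggestion, sketched in Python: a single field that holds the label when one exists and otherwise falls back to the matched alias. The `display_text` name and the dictionary shapes are assumptions for illustration, and the decision itself belongs to T389359.

```python
from typing import Optional


def display_text(label: Optional[dict], match: dict) -> Optional[dict]:
    """Prefer the (fallback-resolved) label; otherwise show the matched alias."""
    if label is not None:
        return label
    if match["type"] == "alias":
        return {"language": match["language"], "value": match["text"]}
    return None


# An Item found via its alias, with no label in the search language
# or any fallback language.
alias_match = {"type": "alias", "language": "en", "text": "spud"}
print(display_text(None, alias_match))
print(display_text({"language": "en", "value": "potato"}, alias_match))
```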

Works grand but I have a clarification question:

I searched with this endpoint: https://wikidata.beta.wmflabs.org/w/rest.php/wikibase/v0/search/properties?language=de&q=rest and got this result:

{
	"results": [
		{
			"id": "P253148",
			"display-label": {
				"language": "en",
				"value": "REST API Test Property"
			},
			"description": {
				"language": "en",
				"value": "Property used for testing Wikibase REST API"
			},
			"match": {
				"type": "label",
				"language": "en",
				"text": "REST API Test Property"
			}
		},
		{
			"id": "P253144",
			"display-label": {
				"language": "de",
				"value": "de-label"
			},
			"description": {
				"language": "en",
				"value": "description en"
			},
			"match": {
				"type": "label",
				"language": "en",
				"text": "REST API test property v2"
			}
		}
	]
}

The first result makes perfect sense. I'm just trying to understand why it would give me the second one. I guess this isn't in our realm of influence, but I'm not convinced that result needs to be there.

These results look okay to me. What is it about the second result that makes you think it shouldn't be included? Each result is a Property with a label that contains the search term "rest" (in English, as I assume there are no German labels/aliases that contain "REST"). display-label and description will try to use the requested language, but if no label/description/alias is found in the requested language, then the language fallback chain is followed.

So I asked for "rest" in German, and because there wasn't any German match, the first result went to the language fallback chain, matched the en label and gave me that, which is fine.

In the second one, there IS a label in German, but not the one I'm looking for, so it still defaults to language fallback? Why is that expected? (Again, maybe I'm not fully aware of the language fallback processes we have, but it seemed a little odd.)

Ahh, understood!

The "In more detail" section of my comment T389054#10661913 might help explain why these results are shown. In particular, labels in all languages are searched first, and only then are the results scored based on more language-specific fields. So when there aren't many Properties to search (like on beta wikidata) you're likely to see these types of results. There just aren't enough Properties that match the search term in all languages for these less relevant Properties to be hidden towards the end of the results.
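
A toy Python model of the behaviour described above may help. Matching looks at labels in every language, while the display label is chosen separately through the fallback chain. This is only an illustration: the fallback chain is hypothetical, matching is simplified to a case-insensitive substring test, scoring is left out entirely, and the label data is reconstructed from the response above under the assumption that P253148 has no German label.

```python
from typing import Optional

# Hypothetical fallback chain; the real one is defined by MediaWiki.
FALLBACK_CHAINS = {"de": ["de", "mul", "en"]}

# Labels reconstructed from the response above (assumed, not verified).
PROPERTIES = [
    {"id": "P253148", "labels": {"en": "REST API Test Property"}},
    {"id": "P253144", "labels": {"de": "de-label", "en": "REST API test property v2"}},
]


def find_match(labels: dict[str, str], query: str) -> Optional[dict[str, str]]:
    """Step 1: look for the query in labels of every language (toy substring match)."""
    for language, text in labels.items():
        if query.lower() in text.lower():
            return {"type": "label", "language": language, "text": text}
    return None


def display_label(labels: dict[str, str], language: str) -> Optional[dict[str, str]]:
    """Step 2: choose what to show, independently of where the match was found."""
    for candidate in FALLBACK_CHAINS.get(language, [language]):
        if candidate in labels:
            return {"language": candidate, "value": labels[candidate]}
    return None


def search(query: str, language: str) -> list[dict]:
    results = []
    for prop in PROPERTIES:
        match = find_match(prop["labels"], query)
        if match is not None:
            results.append({
                "id": prop["id"],
                "display-label": display_label(prop["labels"], language),
                "match": match,
            })
    return results


for result in search("rest", "de"):
    print(result["id"], result["display-label"], result["match"])
```

Under these assumptions, P253144 is returned with its German display label while the match is reported on its English label, which mirrors the second result in the response.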

If we build our own Elasticsearch query, then we can influence this behaviour, although I imagine it is done like this for good reason, so we might want to have a conversation with the WMF Search team before we decide to change it. I've heard that it's better to show some (potentially less relevant) results, than no results at all.