User story & summary:
As a Wikimedian, I want the "Add a link" task to follow the Manual of Style (MOS) guidelines and the specific norms of my wiki. In particular, I want to prevent recurring problematic link suggestions so that newcomers aren't repeatedly prompted to link words or phrases that are inappropriate, unnecessary, or against community standards.
Research Goals:
There are multiple ways to improve "Add a link" to better align with wiki-specific guidelines, each with trade-offs. This timeboxed research spike will allow engineers to:
- Explore possible solutions.
- Assess feasibility, pros, and cons.
- Identify major dependencies (e.g., Research or Machine Learning team support).
- Provide short-term recommendations (for Q4) and long-term considerations (for the next fiscal year).
Background / User Problem:
This task is critical because the English Wikipedia community identified compliance with MOS:OL as a blocker to expanding the "Add a link" feature to more editors:
Please implement some means of complying more broadly with MOS:OL. :meta:Research:Link recommendation model for add-a-link structured task § Hard-coded rules for (not) linking is a great start, but continued non-compliant suggestions are what in my estimation will create the most community pushback, since they exhibit all of the following: contravene longstanding community consensus; generate maintenance burden; forewarned of; already documented during trial period; obviously solveable in software somehow. I'm not sure I'd be comfortable going over 10% deployment before this is addressed, but others might feel differently.
from enwiki discussion: Wikipedia talk:Growth Team features
This request is also emerging at several other wikis:
Just a few days ago I received another suggestion for "Lika", which I have denied for several years... What the function needs is a list of words that should never be suggested.
from svwiki discussion: Wikipedia:Bybrunnen
Potential Approaches:
Each approach below has trade-offs. The spike gives engineers time to investigate the issue in depth, make sure all options are on the table, and estimate the pros, cons, and time investment of each approach.
1. Update hardcoded rules and retrain the model
- Expand and refine the existing hardcoded list of rules for not linking.
- Retrain the Link Suggestions model on wikis that have reported issues.
- Example task: T386867: Add a Link: add "do not link" rule for country names (Q6256) on English Wikipedia
2. Leverage Wikidata-based filtering
- Adapt the logic used in the current hardcoded rules.
- Instead of static rules, allow communities to configure a list of Wikidata items to exclude when they appear as P31 (instance of) values, e.g. Q6256 for country names.
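To make approach 2 concrete, here is a minimal sketch of what a community-configured instance-of filter could look like. It is not the existing implementation: the `LinkSuggestion` shape, the `EXCLUDED_INSTANCE_OF` config, and the assumption that the P31 values of each link target are already available at filter time are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSuggestion:
    """Hypothetical shape of one suggestion; not the real service schema."""
    anchor_text: str
    target_title: str
    # Assumption: P31 (instance of) values of the target's Wikidata item
    # have already been looked up and attached to the suggestion.
    target_instance_of: frozenset[str]


# Hypothetical per-wiki configuration that a community could edit.
# Q6256 (country) is the example from T386867.
EXCLUDED_INSTANCE_OF: dict[str, frozenset[str]] = {
    "enwiki": frozenset({"Q6256"}),
}


def filter_by_instance_of(
    suggestions: list[LinkSuggestion], wiki: str
) -> list[LinkSuggestion]:
    """Drop suggestions whose target is an instance of an excluded item."""
    excluded = EXCLUDED_INSTANCE_OF.get(wiki, frozenset())
    return [s for s in suggestions if not (s.target_instance_of & excluded)]
```

One open question for the spike is where such a filter would run (at model training time, at suggestion generation time, or when suggestions are served), since that determines how quickly a config change takes effect.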
3. Community-configurable "Never Link" list
- Enable wikis to define a list of terms that should never be suggested.
- Example: Swedish Wikipedia could add "Lika," while English Wikipedia might exclude all country names.
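A "Never Link" list could be sketched roughly as follows. This assumes the list is matched against the suggested anchor text and that matching is case-insensitive; both are open design questions rather than decisions, and the `NEVER_LINK` config and function names are hypothetical.

```python
import unicodedata

# Hypothetical per-wiki configuration; "Lika" is the svwiki example above.
NEVER_LINK = {
    "svwiki": {"Lika"},
}


def normalize(term: str) -> str:
    """Normalize a term so that case and Unicode form do not affect matching."""
    return unicodedata.normalize("NFC", term).strip().casefold()


def drop_never_link(anchor_texts, wiki, config=NEVER_LINK):
    """Return the anchor texts that are not on the wiki's never-link list."""
    blocked = frozenset(normalize(t) for t in config.get(wiki, ()))
    return [a for a in anchor_texts if normalize(a) not in blocked]
```

Whether the list should instead (or also) match the link target's title is worth deciding during the spike, because the two choices block different sets of suggestions.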
4. Model retraining based on rejection data
- Ensure the Link Suggestion model is retrained regularly.
- Ensure the Link Suggestion model learns from past rejected and/or reverted link suggestions.
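As a first step toward approach 4, rejection data could be aggregated to see which link targets are rejected most often. The sketch below assumes a hypothetical event format of `(target_title, outcome)` pairs with outcome labels `"accepted"`, `"rejected"`, and `"reverted"`; it does not reflect the actual logging schema or how the model consumes training data.

```python
from collections import Counter


def rejection_rates(events, min_shown=20):
    """Compute the share of rejected or reverted suggestions per link target.

    events: iterable of (target_title, outcome) pairs (hypothetical format).
    min_shown: ignore targets suggested fewer than this many times, so that
    a handful of rejections does not dominate the result.
    """
    shown = Counter()
    rejected = Counter()
    for target, outcome in events:
        shown[target] += 1
        if outcome in ("rejected", "reverted"):
            rejected[target] += 1
    return {
        target: rejected[target] / count
        for target, count in shown.items()
        if count >= min_shown
    }
```

Targets with a consistently high rate could feed back into retraining as negative examples, or serve as candidates for the exclusion lists in approaches 2 and 3.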
5. Other possible solutions
- [Insert additional ideas—there may be other creative approaches worth considering!]
Acceptance Criteria:
- Document the problem space, including feasibility, trade-offs, and implementation complexity.
- Identify whether this work requires external team support (e.g., Research, Machine Learning).
- Recommend next steps for Q4 and the next fiscal year.
Timeboxed research spike: 5 days.