Edit check/Tone Check
This page is currently a draft.
Tone Check: Prompt people to write using a neutral tone.
This page holds the work the Editing Team is doing in collaboration with the Machine Learning Team to develop Tone Check (formerly Peacock Check).
Tone Check is an Edit Check that uses a language model to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing.
A notable aspect of this project is that Tone Check is the first Edit Check to use artificial intelligence: a BERT language model, initially selected and fine-tuned by the Research team, identifies biased language within the new text people are attempting to publish to Wikipedia.
It is possible to test Tone Check at various wikis.
The Edit check help page documents how Tone Check works and can be used.
To participate in and follow this project's development, we recommend adding this page to your watchlist.
Status
Last update:
Currently being worked on
- Defining the aspects of Tone Check that will be configurable on-wiki
- Exploring how Tone Check could help patrollers/reviewers make subtle forms of vandalism easier to detect.
- Preparing for the start of a controlled experiment of Tone Check at up to 5 Wikipedias
Feedback opportunities
- User experience: in what ways do you think the Tone Check user experience could be improved? See testing instructions.
- Patrollers/reviewers: how might Tone Check help patrollers/reviewers make subtle forms of vandalism easier to detect? See conversation.
- Configurability: what aspects of Tone Check will be configurable on-wiki? | T393820
- Logging: what information will be logged (and made available to volunteers on-wiki) about when Tone Check becomes activated? | T395166, T395175
Planning
An A/B test to evaluate the impact of Tone Check.
Please visit Edit check#Status to gain a more granular understanding of where the development stands.
Objectives
Tone Check is intended to simultaneously:
- Cause newer volunteers acting in good faith to add new information to Wikipedia's main namespace that is written in a neutral tone
- Reduce the effort and attention experienced volunteers need to allocate towards ensuring text in the main namespace is written in a neutral tone.
Background
Writing in a neutral tone is an important part of Wikipedia's neutral point of view policy.
Writing in a neutral tone is also a practice many new volunteers find to be unintuitive. An October 2024 analysis of the new content edits newer volunteers[1] published to English Wikipedia found:
- 56% of the new content edits newer volunteers published contained peacock words.
- 22% of the new content edits newer volunteers published that contained peacock words were reverted
- New content edits containing peacock words were 46.7% more likely to be reverted than new content edits without peacock words
With the above in mind, Tone Check is meant to address two core issues:
- Newcomers publishing edits to Wikipedia that contain promotional, derogatory, or otherwise subjective language because they lack the awareness that this kind of editing is not aligned with Wikipedia policies.
- Experienced volunteers being burdened by the effort and attention they need to allocate towards patrolling and reviewing preventable damage made in good faith. This can come at the expense of identifying and addressing more subtle and complex forms of vandalism.
Tone Check is designed to address these two issues by:
- Offering new(er) volunteers feedback while they are editing so they can avoid unintentionally publishing edits that are likely to violate policies
- Offering patrollers/reviewers deeper insight into the edits they are reviewing (and the intentions of the people publishing them) by logging the moderation feedback new(er) editors are being presented with, and the actions they do/do not take in response.
Design
This section is currently a draft. Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

User experience
Tone Check is a contextual intervention designed to equip new(er) volunteers editing in good faith with the awareness and know-how needed to ensure the tone of the text they are adding is aligned with Wikipedia policies. The Check is intentionally minimal, appears only when necessary, and aims to support the policies wikis have individually defined without blocking contributions.
When Tone Check is shown
Tone Check becomes activated when a contributor (who meets the configuration criteria communities will be able to set) adds new text that the underlying machine learning model – trained on Wikipedia edits – identifies as potentially biased or promotional. Specifically:
- The check activates after the user finishes editing a paragraph and clicks or taps outside of it.
- If the system detects promotional, derogatory, or otherwise subjective language, the Tone Check card appears.
The Edit Check card is displayed in the side container on both desktop and mobile devices. This lightweight, non-blocking format allows contributors to stay in flow while being made aware that something they have done warrants additional attention.
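To make the activation flow above more concrete, here is a minimal, hypothetical sketch of the decision a client could make when a paragraph loses focus. The endpoint URL, payload shape, response field, and threshold value are all illustrative assumptions; they are not the actual Edit Check or Machine Learning platform API.

```python
# Hypothetical sketch only: endpoint, payload shape, response field, and threshold
# are placeholders, not the real Tone Check integration.
import requests

SCORING_ENDPOINT = "https://example.org/tone-check/score"  # placeholder URL
SHOW_THRESHOLD = 0.8  # example cutoff; communities could configure their own

def should_show_tone_check(paragraph_text: str, language: str) -> bool:
    """Return True when the model is confident enough that the new text is non-neutral."""
    response = requests.post(
        SCORING_ENDPOINT,
        json={"text": paragraph_text, "lang": language},
        timeout=5,
    )
    response.raise_for_status()
    probability = response.json()["probability"]  # assumed response field
    return probability >= SHOW_THRESHOLD

# Example: a promotional-sounding addition would typically cross the threshold,
# while a plainly factual sentence normally would not.
```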
Placement and interaction
Tone Check appears at two key points in the editing workflow:
- Mid-edit: If detected while the contributor is actively editing, Tone Check appears immediately in the side panel (mobile and desktop)
- Pre-save: If not acted upon earlier, Tone Check appears again after the contributor clicks or taps “Publish changes”, during the proofreading step.

When Tone Check is activated, contributors see an Edit Check card with:
- A short explanation that the flagged language is often revised by other editors for a more balanced tone.
- A “Learn more” link to access additional context about Wikipedia’s tone policies and guidelines.
- Two actions:
- Revise: return to editing and update the highlighted text.
- Decline: proceed as-is, after selecting a reason for not revising.
- A disclaimer noting that a small language model was used to detect tone-related issues in the text.
Design intent and principles
Tone Check is grounded in the following design principles:
- No firm rules: The tool suggests, but does not force, changes. Contributors can always choose to decline or proceed without editing.
- Keep users in flow: The experience is embedded within the natural flow of editing and publishing, with lightweight prompts that avoid blocking or clutter.
- Meet users where they are: Feedback is specific to the paragraph being edited and is framed using language easy to understand and grounded in Wikipedia norms.
- Transparent: A clear disclaimer and consistent design patterns ensure transparency about how suggestions are generated.
Language selection
This section will include the languages we're prioritizing for the initial experiment, the languages we're planning to scale to next, and why we selected these languages. See phab:T388471.
Model
Tone Check leverages a Small Language Model (SLM) to detect the presence of promotional, derogatory, or otherwise subjective language. The SLM we are using is a BERT model, which is open source and has openly available weights.
The model works by being fine-tuned on examples of Wikipedia revisions. It learns from instances where experienced editors applied a specific template ("peacock") to flag tone violations, as well as instances where that template was removed. This process teaches the BERT model to identify patterns associated with appropriate and inappropriate tone, based on Wikipedia's editorial standards. Under the hood, SLMs transform text into high-dimensional vectors (embeddings); during fine-tuning, those vectors are compared against the labels, allowing the model to learn a decision boundary that separates positive (tone issue) from negative (no issue) cases. A rough, hypothetical sketch of what this fine-tuning looks like in code follows the training-data list below.
The model was trained using 20,000 data points from 10 languages consisting of:
- Positive examples: Revisions on Wikipedia that were marked with the "peacock" template, indicating a tone policy violation.
- Negative examples: Revisions where the "peacock" template had been removed (signifying no policy violation).
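For readers who want a concrete picture of what "fine-tuning a BERT model on peacock-template revisions" means in practice, here is a minimal sketch using the open-source Hugging Face transformers library. The checkpoint name, toy examples, and hyperparameters are illustrative assumptions, not the team's actual training configuration or data.

```python
# Minimal, illustrative fine-tuning sketch (not the production training code).
# Assumes a labelled dataset of added text: label 1 = "peacock" template applied,
# label 0 = template removed / no tone violation.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"  # assumed multilingual BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy stand-in for the ~20,000 labelled revisions described above.
examples = {
    "text": ["He is a world-renowned, visionary leader.",  # peacock-style addition
             "He served as mayor from 1990 to 1994."],      # neutral addition
    "label": [1, 0],
}
dataset = Dataset.from_dict(examples).map(
    lambda row: tokenizer(row["text"], truncation=True, padding="max_length", max_length=128)
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tone-check-demo", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # after training, the model yields per-class probabilities for new text
```

In production the same idea is applied at a much larger scale, and the probability scores the fine-tuned model produces are what the thresholds discussed below are applied to.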
Small Language Models (like the one being used for Tone Check) differ from Large Language Models (LLMs) in that the former are adapted to particular use cases by learning from a focused dataset. In the case of Tone Check, this means the SLM learns directly from the expertise of experienced Wikipedia volunteers. As a result, SLMs offer more explainability and flexibility than LLMs, and they require significantly fewer computational resources than their larger counterparts.
LLMs, on the other hand, are designed for general-purpose use, with limited context, and are typically accessed through a chat or prompting interface. They require a huge amount of computational resources, and their behavior is difficult to explain due to the large number of parameters involved.
Evaluating the model
[edit]Model
Before measuring the impact of the overall Tone Check experience through a controlled experiment in production, the team conducted two evaluations comparing the model's predictions to human-provided labels.
Outlined below is information about the purpose of each evaluation and what we found.
Internal evaluation
[edit]Goals
The first evaluation we conducted was internal, involving just the WMF product teams who were working on this feature. This review was meant to:
- Evaluate whether the model aligned with human decisions often enough that we could consider its predictions reliable enough to move forward with a community-involved evaluation process
- Figure out a prediction probability score threshold above which we could consider the model's predictions fairly accurate
- Expose any edge cases or specific types of edits in which the model consistently does not perform well
Process
To assess the above, the team:
- Created a list of 300 sample edits from English Wikipedia.
- Assigned about 30 edits to each of the participants from our teams.
- Asked each participant to go through the sample edits and indicate whether or not they contained promotional, derogatory, or otherwise subjective language that should be flagged by the Tone Check.
- Compared the model's predictions to the human-provided labels.
- Analyzed the cases where the model's predictions differed from the human-provided labels.
Findings
- In English, false negatives (cases where the model predicts there isn't a tone check issue, but a human says there is) are very easily filtered out if we only return predictions with a probability score over 0.55.
- In English, most false positives (cases where the model predicts that there is a tone check issue, but a human says there isn't) can be filtered out if we only return predictions with a probability score over 0.8.
- There are some types of edits that the model has a hard time with - like edits that include a quote, where the quoted language is non-neutral in tone. In these cases, the model's predictions had a lower probability score.
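To make the threshold discussion above concrete, the sketch below shows one way to count the false positives and false negatives that remain at a given probability cutoff when model scores are compared to human labels. The scores and labels are made-up stand-ins, not the actual 300-edit sample.

```python
# Illustrative threshold analysis (made-up data, not the actual evaluation results).
def errors_at_threshold(scored_edits, threshold):
    """scored_edits: list of (model_probability, human_says_issue) pairs."""
    false_positives = sum(1 for p, human in scored_edits if p >= threshold and not human)
    false_negatives = sum(1 for p, human in scored_edits if p < threshold and human)
    return false_positives, false_negatives

sample = [(0.92, True), (0.61, False), (0.83, False), (0.40, True), (0.97, True)]
for cutoff in (0.55, 0.8):
    fp, fn = errors_at_threshold(sample, cutoff)
    print(f"threshold {cutoff}: {fp} false positives, {fn} false negatives")
```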
Volunteer evaluation
The results of the internal evaluation gave us confidence to move forward with an external review involving experienced volunteers. We had enough positive examples (as defined above) to continue evaluating the model in English, French, Japanese, Portuguese, and Spanish.
Goals
This second review was meant to:
- Help us confirm that experienced volunteers agree with what the model identifies as promotional, derogatory, or otherwise subjective language
- Evaluate whether the model's predictions about edits in French, Japanese, Portuguese, and Spanish are as reliable as they are about edits in English
Process
To assess the above, the team:
- Created a list of 100 sample edits from each of the aforementioned Wikipedias.
- Invited participants from each Wikipedia community to sign up and participate.
- Provided the participants with a tool they could use to review and label each of the sample edits in the language(s) they were helping with.
- Asked each participant to review and label at least 30 sample edits.
- Compared the model's predictions to the human-provided labels.
- Analyzed the cases where the model's predictions differed from the human-provided labels.
Findings
At the probability threshold (0.80) the model would need to reach for a Tone Check to be shown during an edit session, volunteers across the 5 languages who participated in the initial model review agreed with the model's detection of a tone issue in 95% of cases.
More details about the results from each of the 5 languages that we included in the initial volunteer review can be found in the table below.
Language | Revisions reviewed | Unique participants | High-level findings | Recommendation | How we made this recommendation |
---|---|---|---|---|---|
English | 391 | 13 | | Continue conversations with volunteers about the potential risks of the feature and ideas for how we might mitigate and manage them. | It was rare for the model to flag an edit for a tone issue when volunteer reviewers said there wasn’t one - this only happened about 5% of the time. When it did happen, the model wasn’t very confident in its prediction: its probability score was below the threshold (0.8) that we’d use in a real-world setting. |
Spanish | 285 | 9 | | Proceed with 0.8 probability score threshold and recommend es.wiki evaluates the feature through an A/B test. | Most of the time, when the model confidently (with a probability score of ≥0.8) flagged an edit for a tone issue, the volunteers who reviewed that edit agreed - it was indeed a problem. In the small number of cases (3%) where the model confidently flagged an issue but at least one volunteer disagreed, there wasn’t a clear consensus among the volunteer reviewers; even then, most volunteer reviewers still sided with the model. These cases often involved subjective or opinionated phrases, like “uno de los doctores mas importantes” (“one of the most important doctors”) and “desarrollo un paupérrimo torneo” (“he had a very poor tournament”). |
Japanese | 228 | 5 | | Proceed with 0.7 probability score threshold* and propose ja.wiki evaluates the feature through an A/B test. *This recommendation assumes ja.wiki is generally open to Tone Check; if it is more conservative, we recommend a higher threshold to minimize false positives. | The model didn’t make very many high-confidence predictions for Japanese - only 3% of predictions had a probability score above 0.8, which was the threshold we had planned to use in the production experiment. Because so few predictions reached that level of confidence, we recommend lowering the threshold to 0.7 for Japanese. Importantly, none of the predictions with a score above 0.8 flagged a tone issue in edits that volunteers thought were fine. There were two cases where the model predicted a tone issue at probability scores of 0.7 and 0.75 and at least one human reviewer disagreed, but in both cases there was no clear agreement among the reviewers - at least one reviewer agreed with the model’s assessment. |
Portuguese | 22 | 2 | | Proceed with 0.8 probability score threshold and propose pt.wiki evaluates the feature through an A/B test. In parallel, recruit more volunteers to review the model. | We only received 22 reviews, which wasn’t enough for a thorough evaluation. Of the edits reviewed, 50% had a model probability score above 0.8, so we’re not too worried about recall. Additionally, in the cases where the model predicted a tone issue, human reviewers always agreed, meaning there were no false positives. |
French | 369 | 6 | | Proceed with 0.8 probability score threshold and propose fr.wiki evaluates the feature through an A/B test. | Most of the time, when the model confidently flagged an edit for a tone issue, volunteers agreed - it was indeed a problem. In the small number of cases (4%) where the model confidently flagged an issue but at least one volunteer disagreed, there wasn’t a clear consensus among the human reviewers; even then, many reviewers still agreed with the model. These cases often involved subjective or opinionated phrases, like “sa démarche picturale novatrice, paradoxale et indépendante” (“his innovative, paradoxical and independent approach to painting”). |
User experience
The viability of Tone Check, like the broader Edit Check project, depends on the feature being able to simultaneously:
- Reduce the moderation workload experienced volunteers carry
- Increase the rate at which new(er) volunteers contribute constructively
To evaluate the extent to which Tone Check is effective at the above, the team will be conducting qualitative and quantitative experiments.
Below you will find:
- The impacts that features introduced as part of Edit Check are intended to cause and to avert
- Data we will use to help[2] determine the extent to which a feature has/has not caused a particular impact
- Evaluation methods we will use to gather the data necessary to determine the impact of a given feature
Outcomes Tone Check is intended to cause:

ID | Outcome | Data | Evaluation Method(s) |
---|---|---|---|
1. | Key performance indicator: The quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will not contain peacock language | | A/B test, qualitative feedback (e.g. talk page discussions, false positive reporting) |
2. | Key performance indicator: Newcomers and Junior Contributors will experience Peacock Check as encouraging because it will offer them more clarity about what is expected of the new information they add to Wikipedia | Proportion of new content edits started (defined as reaching the point at which peacock check was or would be shown) that are successfully published (not reverted). | A/B test, qualitative feedback (e.g. usability tests, interviews, etc.) |
3. | New account holders will be more likely to publish an unreverted edit to the main namespace within 24 hours of creating an account because they will be made aware the new text they're attempting to publish needs to be written in a neutral tone, when they don't first think/know to write in this way themselves | Proportion of newcomers who publish ≥1 constructive edit in the Wikipedia main namespace on a mobile device within 24 hours of creating an account (constructive activation). | A/B test |
4. | Newcomers and Junior Contributors will be more aware of the need to write in a neutral tone when contributing new text because the visual editor will prompt them to do so in cases where they have written text that contains peacock language. | The proportion of newcomers and Junior Contributors that publish at least one new content edit that does not contain peacock language. | A/B test |
5. | Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that does not include peacock language because Peacock Check will have caused them to realize when they are at risk of this not being true. | | A/B test |
Outcomes Tone Check is intended to avert:

ID | Outcome | Data | Evaluation Method(s) |
---|---|---|---|
1. | Edit quality decreases | Proportion of published edits that add new content and are still reverted within 48 hours. Note: Will include a breakdown of the revert rate of published new content edits with and without non-neutral language. | A/B test and leading indicators analysis |
2. | Edit completion rate drastically decreases | Proportion of new content edits started (defined as reaching the point at which peacock check was or would be shown) that are published. Note: Will include a breakdown by the number of checks shown to identify whether a lower completion rate corresponds with a higher number of checks shown. | A/B test and leading indicators analysis |
3. | Edit abandonment rate drastically increases | Proportion of edits that are started (event.action = init) that are successfully published (event.action = saveSuccess). | A/B test and leading indicators analysis |
5. | People shown Tone Check are blocked at higher rates | Proportion of contributors blocked after publishing an edit where Tone Check was shown compared to contributors not shown the Tone Check | A/B test and leading indicators analysis |
6. | High false positive rates | Proportion of contributors that decline revising the text they’ve drafted and indicate that it was irrelevant. | A/B test, leading indicators analysis, and qualitative feedback |
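As a toy illustration of how a metric like the edit completion rate in the table above can be computed from instrumentation events, here is a short sketch. The event records are simplified stand-ins; the real analysis runs on the team's instrumentation data, not on hand-written dictionaries like these.

```python
# Toy illustration of the completion-rate metric from the table above.
# Events are simplified stand-ins; real analysis runs on instrumentation data.
events = [
    {"editing_session": "a", "action": "init"},
    {"editing_session": "a", "action": "saveSuccess"},
    {"editing_session": "b", "action": "init"},          # abandoned edit
    {"editing_session": "c", "action": "init"},
    {"editing_session": "c", "action": "saveSuccess"},
]

started = {e["editing_session"] for e in events if e["action"] == "init"}
published = {e["editing_session"] for e in events if e["action"] == "saveSuccess"}
completion_rate = len(started & published) / len(started)
print(f"Edit completion rate: {completion_rate:.0%}")  # 67% in this toy sample
```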
Findings
This section will include the findings from the experiments described in #Evaluating impact.
Configurability
Tone Check will be implemented – like all Edit Checks – in a way that enables volunteers to explicitly configure how it behaves and who Tone Check is made available to.
Configurability happens on a per project basis so that volunteers can ensure the Tone Check experience is aligned with local policies and conventions.
The particular facets of Tone Check that will be community configurable are still being decided. If there are particular aspects of Tone Check that you think need to be configured on-wiki, we ask that you share what you are thinking in T393820 or on the talk page.
ID | Configurable facet | Potential value(s) | Default value | Notes |
---|---|---|---|---|
Timeline
[edit]Time | Activity | Status | Notes |
---|---|---|---|
Peter to populate this section with a high-level timeline of the project: background analysis, initial model development, community conversations/consultations, usability study, pre-mortem, internal model evaluation, volunteer model evaluation, development, pilot experiment, etc.
History
Tone Check, and the broader Edit Check initiative, is a response to a range of community conversations and initiatives, some of which are listed below. For more historical context, please see Edit check#Background.
- Editing Team Community Conversation (April 2025)
- New page patrol/Reviewers (en.wiki) (April 2025)
- ESEAP Strategy Summit 2025
- Wikimedia CEE annual planning conversation (April 2025)
- Afrika Baraza meeting (May 2025)
- Supporting moderators at the Wikimedia Foundation (August 2023)
- Editing the Wiki Way software and the future of editing (August 2021)
- Existing maintenance templates
- ar.wiki: تحيز, تعارض مصالح, تعظيم, دعاية, رأي منحاز, عبارة محايدة؟, غير متوازن, مصدر منحاز, وجهة نظر معجب, أهمية مبالغ بها ,استشهاد منشور ذاتي،تحيز,تعارض مصالح,تعظيم ،تلاعب بالألفاظ ,حيادية خريطة , عاية, رأي منحاز,سيرة شخصية ذاتية ,،سيرة شخصية نشر ذاتي فقط ,،عبارة محايدة؟ ,،غير متوازن , مبهمة , ،مساهمة مدفوعة غير مصرح عنها ,،مصادر متحزبة ,مصدر منحاز, ،نشر ذاتي سطري , نظرية هامشية ,وجهات نظر قليلة , وجهة نظر معجب
- cs.wiki: https://cs.wikipedia.org/wiki/%C5%A0ablona:NPOV, https://cs.wikipedia.org/wiki/%C5%A0ablona:Vyh%C3%BDbav%C3%A1_slova
- de.wiki: https://de.wikipedia.org/wiki/Vorlage:Neutralität
- en.wiki: https://en.wikipedia.org/wiki/Category:Neutrality_templates
- es.wiki: https://es.wikipedia.org/wiki/Plantilla:No_neutralidad, https://es.wikipedia.org/wiki/Plantilla:Promocional, https://es.wikipedia.org/wiki/Plantilla:PVfan, https://es.wikipedia.org/wiki/Plantilla:Globalizar
- fa.wiki: https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%A7%D9%84%DA%AF%D9%88:%D8%AF%DB%8C%D8%AF%DA%AF%D8%A7%D9%87_%D8%A8%DB%8C%E2%80%8C%D8%B7%D8%B1%D9%81
- fr.wiki: Non-neutre, Désaccord de neutralité, Section non neutre, Dithyrambe, Curriculum vitae, Catalogue de vente, Promotionnel, section promotionnelle, Name dropping, Passage promotionnel, Passage lyrique, Passage non neutre
- id.wiki: Tak netral, Berbunga-bunga, Iklan, Seperti resume, Fanpov, Peacock, Autobiografi, Konflik kepentingan
- it.wiki: https://it.wikipedia.org/wiki/Template:P
- ja.wiki: Template:観点, Template:宣伝, Template:大言壮語
- lv.wiki: https://lv.wikipedia.org/wiki/Veidne:Pov, https://lv.wikipedia.org/wiki/Veidne:Konfl, https://lv.wikipedia.org/wiki/Veidne:Autobiogr%C4%81fija
- no.wiki: https://no.wikipedia.org/wiki/Mal:Objektivitet-seksjon, https://no.wikipedia.org/wiki/Mal:Objektivitet
- pl.wiki: https://pl.wikipedia.org/wiki/Szablon:Dopracowa%C4%87, used as {{Dopracować{{!}}param_name=...}} (Dopracować is a general template for issues, made specific with parameters; relevant parameters, case-insensitive: pov, neutralność, reklama, spam, polonocentryzm, povpol, zależne, wieszak, źródła promocyjne, źródła zależne)
- ro.wiki: https://ro.wikipedia.org/wiki/Format:PDVN, https://ro.wikipedia.org/wiki/Format:Jv, https://ro.wikipedia.org/wiki/Format:Ton_nepotrivit, also the template https://ro.wikipedia.org/wiki/Format:Problemearticol with the parameters ton, ton nepotrivit or PDVN
- ru.wiki: https://ru.wikipedia.org/wiki/Шаблон:Проверить_нейтральность, https://ru.wikipedia.org/wiki/Шаблон:Конфликт_интересов, https://ru.wikipedia.org/wiki/Шаблон:Реклама, https://ru.wikipedia.org/wiki/Шаблон:Автобиография, https://ru.wikipedia.org/wiki/Шаблон:Недостаточно_критики, https://ru.wikipedia.org/wiki/Шаблон:Нейтральность_раздела_под_сомнением, https://ru.wikipedia.org/wiki/Шаблон:Нейтральность%3F (inline one)
- zh.wiki: Advert, Fanpov, Newsrelease, Review, Tone, Unencyclopedic, Trivia, Autobiography, COI, BLPdispute, POV, Copy edit
Edit Check
This initiative sits within the larger Edit Check project – an effort to meet people while they are editing with actionable feedback about Wikipedia policies.
Edit Check is intended to simultaneously deliver impact for two key groups of people.
Experienced volunteers who need:
- Relief from repairing preventable damage
- Capacity to confront complexity
New(er) volunteers who need:
- Actionable feedback
- Compelling opportunities to contribute
- Clarity about what is expected of them
FAQ
- Why does Tone Check use artificial intelligence (AI)?
- AI increases Wikipedia projects' ability to detect promotional/non-neutral language before people publish it.
- Which AI model do you use?
- We use an open-source model called BERT. It is not a large language model (LLM) but a smaller language model, which the Machine Learning team prefers because it reports how probable each of its predictions is and is easier to adapt to our custom data.
- What language(s) does/will Tone Check support?
- Initially, Tone Check will support the following five languages: English, French, Japanese, Spanish, and Portuguese. Please see T388471#10781906 for details about how and why these languages were prioritized to start. Note: the goal remains for Tone Check to support all languages.
- What – if any – on-wiki logging will be in place so volunteers can see when Tone Check was shown?
- To start, we're planning for an edit tag to be appended to all edits in which ≥1 Tone Check is shown.
- This approach follows what was implemented for Reference Check. A rough example of listing tagged edits through the API appears after this FAQ.
- Why did you not implement Tone Check as an Abuse filter?
- What will we do to ensure Tone Check does not cause people to publish more subtle forms of promotional, derogatory, or otherwise subjective language that is more difficult for the model and people to detect?
- What control will volunteers have over how Tone Check behaves and who it is available to?
- ADD questions from the internal pre-mortem we conducted.
- What control do volunteers have over how the model behaves?