Provide AbuseFilter condition for revertrisk threshold
Open, In Progress, Needs Triage, Public

Description

Context

We'd like to be able to invoke AbuseFilter actions when an edit's revertrisk score crosses the "likely to be reverted" threshold.

Proposal

Introduce an AbuseFilter condition that utilizes the pre-save revertrisk API (T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context) for the language agnostic model.

We'd have to exclude page creation scenarios, because the revert risk model doesn't handle those.
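
As a rough illustration of the exclusion above, the check could look like the following. This is a minimal sketch in Python, not the ORES extension's actual PHP code; `EditContext`, its fields, and `should_request_revertrisk` are hypothetical names, and using a parent revision ID of 0 to mean "page creation" is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditContext:
    """Hypothetical description of the action a filter is evaluating."""
    action: str         # illustrative values: "edit", "move", "upload"
    parent_rev_id: int  # assumption: 0 means there is no parent revision (page creation)


def should_request_revertrisk(ctx: EditContext) -> bool:
    """Return True only for edits to pages that already exist."""
    if ctx.action != "edit":
        # Non-edit actions are never scored in this sketch.
        return False
    # Page creations have no parent revision, and the model doesn't handle them.
    return ctx.parent_rev_id > 0
```

With this shape, a caller would skip the pre-save API request entirely whenever the function returns False, rather than sending a request the model cannot handle.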

Consequences

Abuse mitigation tooling can invoke actions before an edit is saved, based on a revert risk score.

Details

Other Assignee: kostajh

Subject	Repo	Branch	Lines +/-
Reapply "ores: Disable AbuseFilter integration by default"	operations/mediawiki-config	master	+0 -4
Map pre-save RR scores to predefined values	mediawiki/extensions/ORES	wmf/1.45.0-wmf.6	+76 -23
ores: Disable AbuseFilter integration by default	operations/mediawiki-config	master	+4 -0
Map pre-save RR scores to predefined values	mediawiki/extensions/ORES	master	+76 -23
ORES: Allow using RRML for pre-save revert risk detection	operations/mediawiki-config	master	+53 -0
[WIP] Add AbuseFilter variable for revertrisk score	mediawiki/extensions/ORES	master	+224 -4
Set ORESDeveloperSetup to false by default	mediawiki/extensions/ORES	wmf/1.45.0-wmf.5	+1 -1
Set ORESDeveloperSetup to false by default	mediawiki/extensions/ORES	master	+1 -1
Add revertrisk_score AbuseFilter variable	mediawiki/extensions/ORES	master	+501 -6
LiftWingService: Add method to evaluate pre-save revert risk	mediawiki/extensions/ORES	master	+422 -5
LiftWingService: Unify request creation	mediawiki/extensions/ORES	master	+114 -53
LiftWingService: Add tests	mediawiki/extensions/ORES	master	+386 -21
zuul: Add AbuseFilter as phan & test dependency for ORES	integration/config	master	+2 -0

Related Objects
Status	Assigned	Task
		Restricted Task
Resolved	kostajh	T294511 2021 Security Team wikireplicas audit
Declined	None	T284948 Raw IPs of logged-out users disclosed in wiki-replicas
In Progress	Niharika	T324492 Temporary accounts - MVP
Open	None	T357776 [Epic] Mitigate abilities to abuse temporary accounts
In Progress	mszabo	T364705 Provide AbuseFilter condition for revertrisk threshold
Resolved	achou	T356102 Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 13 2024, 7:37 AM

kostajh added a subtask: T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context. · May 13 2024, 7:37 AM

kostajh mentioned this in T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context.

kostajh added a parent task: T357776: [Epic] Mitigate abilities to abuse temporary accounts.

Marking as stalled on T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context

Per https://phabricator.wikimedia.org/T356102#9935357, the feature is now usable on ml-staging.

kostajh added a project: Temporary accounts (Create/update essential tools/anti-abuse management). · Jul 3 2024, 10:08 AM

kostajh claimed this task. · Jul 3 2024, 2:09 PM

Change #1051837 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/ORES@master] [WIP] Add AbuseFilter variable for revertrisk score

https://gerrit.wikimedia.org/r/1051837

gerritbot added a project: Patch-For-Review. · Jul 3 2024, 8:26 PM

Change #1051838 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Add AbuseFilter as phan & test dependency for ORES

https://gerrit.wikimedia.org/r/1051838

Change #1051838 merged by jenkins-bot:

[integration/config@master] zuul: Add AbuseFilter as phan & test dependency for ORES

https://gerrit.wikimedia.org/r/1051838

kostajh mentioned this in T123178: [Spike] Investigate building a hook for abuse filter. · Jul 8 2024, 9:54 AM

kostajh merged a task: T123178: [Spike] Investigate building a hook for abuse filter.

kostajh added subscribers: • Ladsgroup, DannyS712, Stang and 10 others.

kostajh added a project: Trust and Safety Product Sprint. · Aug 27 2024, 11:56 AM

kostajh reassigned this task from kostajh to mszabo. · Aug 27 2024, 1:40 PM

kostajh edited projects, added Trust and Safety Product Sprint (Sprint Beatboxing (Sept 16-27)); removed Trust and Safety Product Sprint.

kostajh updated Other Assignee, added: kostajh.

calbon mentioned this in T371398: Goal 4: Support product teams in deploying production models. · Aug 27 2024, 2:28 PM

kostajh closed subtask T356102: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context as Resolved. · Sep 13 2024, 11:08 AM

kostajh edited projects, added Trust and Safety Product Sprint; removed Trust and Safety Product Sprint (Sprint Beatboxing (Sept 16-27)). · Sep 16 2024, 10:24 AM

kostajh added a project: WE4.2 Anti-abuse. · Sep 17 2024, 6:59 AM

Pppery edited projects, added Patch-Needs-Improvement; removed Patch-For-Review. · Nov 10 2024, 7:27 AM

Pppery awarded a token.

Titore subscribed. · Mar 10 2025, 1:01 AM

kostajh moved this task from Backlog to 2024-2025 Q4 on the WE4.2 Anti-abuse board. · Mar 19 2025, 10:15 AM

kostajh updated the task description. · May 16 2025, 7:47 AM

Tentatively adding to the next sprint, so we could try to get this done for use alongside T354599: [EPIC] WE4.2.14b Provide IP reputation variables in AbuseFilter

kostajh updated the task description. · May 21 2025, 11:46 AM

Change #1152267 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] LiftWingService: Add tests

https://gerrit.wikimedia.org/r/1152267

gerritbot added a project: Patch-For-Review. · May 30 2025, 1:48 PM

Restricted Application removed a project: Patch-Needs-Improvement. · May 30 2025, 1:48 PM

Change #1152268 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] LiftWingService: Unify request creation

https://gerrit.wikimedia.org/r/1152268

Change #1152269 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] LiftWingService: Add method to evaluate pre-save revert risk

https://gerrit.wikimedia.org/r/1152269

Change #1152270 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] Add revertrisk_score AbuseFilter variable

https://gerrit.wikimedia.org/r/1152270

Change #1152770 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[operations/mediawiki-config@master] ORES: Allow using RRML for pre-save revert risk detection

https://gerrit.wikimedia.org/r/1152770

mszabo changed the task status from Open to In Progress. · Jun 2 2025, 4:25 PM

mszabo moved this task from Priority Backlog to Needs Review on the Trust and Safety Product Sprint (Sprint Carrot Cake (May 26 - June 13)) board.

Change #1152267 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] LiftWingService: Add tests

https://gerrit.wikimedia.org/r/1152267

Change #1152268 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] LiftWingService: Unify request creation

https://gerrit.wikimedia.org/r/1152268

ReleaseTaggerBot added a project: MW-1.45-notes (1.45.0-wmf.5; 2025-06-10). · Jun 3 2025, 4:00 PM

Change #1152269 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] LiftWingService: Add method to evaluate pre-save revert risk

https://gerrit.wikimedia.org/r/1152269

Change #1152270 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Add revertrisk_score AbuseFilter variable

https://gerrit.wikimedia.org/r/1152270

Change #1155235 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[operations/mediawiki-config@master] ores: Disable AbuseFilter integration by default

https://gerrit.wikimedia.org/r/1155235

Change #1155247 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] Set ORESDeveloperSetup to false by default

https://gerrit.wikimedia.org/r/1155247

Change #1155247 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Set ORESDeveloperSetup to false by default

https://gerrit.wikimedia.org/r/1155247

Change #1155276 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@wmf/1.45.0-wmf.5] Set ORESDeveloperSetup to false by default

https://gerrit.wikimedia.org/r/1155276

Change #1155235 merged by jenkins-bot:

[operations/mediawiki-config@master] ores: Disable AbuseFilter integration by default

https://gerrit.wikimedia.org/r/1155235

Change #1155276 merged by jenkins-bot:

[mediawiki/extensions/ORES@wmf/1.45.0-wmf.5] Set ORESDeveloperSetup to false by default

https://gerrit.wikimedia.org/r/1155276

Mentioned in SAL (#wikimedia-operations) [2025-06-10T16:50:59Z] <mszabo@deploy1003> Started scap sync-world: Backport for [[gerrit:1155276|Set ORESDeveloperSetup to false by default (T364705)]], [[gerrit:1155235|ores: Disable AbuseFilter integration by default (T364705)]], [[gerrit:1155280|tests: Run only defered updates on LinkRecommendationUpdaterTest]]

Mentioned in SAL (#wikimedia-operations) [2025-06-10T16:55:05Z] <mszabo@deploy1003> Started scap sync-world: Backport for [[gerrit:1155276|Set ORESDeveloperSetup to false by default (T364705)]], [[gerrit:1155235|ores: Disable AbuseFilter integration by default (T364705)]], [[gerrit:1155280|tests: Run only defered updates on LinkRecommendationUpdaterTest]]

Mentioned in SAL (#wikimedia-operations) [2025-06-10T16:59:17Z] <mszabo@deploy1003> mszabo: Backport for [[gerrit:1155276|Set ORESDeveloperSetup to false by default (T364705)]], [[gerrit:1155235|ores: Disable AbuseFilter integration by default (T364705)]], [[gerrit:1155280|tests: Run only defered updates on LinkRecommendationUpdaterTest]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

ReleaseTaggerBot edited projects, added MW-1.45-notes (1.45.0-wmf.6; 2025-06-17); removed MW-1.45-notes (1.45.0-wmf.5; 2025-06-10). · Jun 10 2025, 5:00 PM

Mentioned in SAL (#wikimedia-operations) [2025-06-10T17:10:11Z] <mszabo@deploy1003> Finished scap sync-world: Backport for [[gerrit:1155276|Set ORESDeveloperSetup to false by default (T364705)]], [[gerrit:1155235|ores: Disable AbuseFilter integration by default (T364705)]], [[gerrit:1155280|tests: Run only defered updates on LinkRecommendationUpdaterTest]] (duration: 15m 06s)

Change #1051837 abandoned by Kosta Harlan:

[mediawiki/extensions/ORES@master] [WIP] Add AbuseFilter variable for revertrisk score

Reason:

See I0ccf97880001c3d0c81c612bb98f1da5ab9bb452

https://gerrit.wikimedia.org/r/1051837

@mszabo and I discussed making the following changes:

rename the variable to revertrisk_level
the valid values for the variable are high or null
the variable will be high if the revertrisk score is above the threshold defined in wgOresFiltersThresholds['revertrisklanguageagnostic']['min']
the variable is only available if wgOresFiltersThresholds['revertrisklanguageagnostic']['min'] is defined (currently, this is the case for ~19 wikis)
we will *not* use the RRML endpoint, as we don't have thresholds defined for RRML
we will *not* return low for revertrisk_level, because we don't have the thresholds defined
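
The mapping discussed above can be sketched roughly as follows. This is an illustrative Python sketch, not the extension's PHP implementation: the function name `revertrisk_level`, the dictionary standing in for wgOresFiltersThresholds, and the example threshold value are all assumptions. Treating "above the threshold" as a strict comparison is also a guess; the discussion does not say how a score exactly equal to the threshold is handled. In this sketch both "variable unavailable" and "score not high" collapse to None.

```python
from typing import Mapping, Optional

MODEL = "revertrisklanguageagnostic"


def revertrisk_level(
    score: float,
    thresholds: Mapping[str, Mapping[str, float]],
) -> Optional[str]:
    """Map a raw pre-save score to 'high' or None (hypothetical helper)."""
    model_thresholds = thresholds.get(MODEL)
    if model_thresholds is None or "min" not in model_thresholds:
        # No 'min' threshold configured for this wiki: nothing to compare against.
        return None
    # Assumption: strictly greater than the threshold counts as "above".
    # There is deliberately no "low" value, since no threshold is defined for it.
    return "high" if score > model_thresholds["min"] else None


# Illustrative configuration only; 0.95 is a made-up threshold value.
example_thresholds = {MODEL: {"min": 0.95}}
```

A filter author would then only ever compare the variable against high, which keeps filters independent of the per-wiki numeric thresholds.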

kostajh edited projects, added Trust and Safety Product Sprint (Sprint Baklava (June 16 - July 4)); removed Trust and Safety Product Sprint (Sprint Carrot Cake (May 26 - June 13)). · Jun 16 2025, 6:13 AM

kostajh moved this task from Priority Backlog to Needs Review on the Trust and Safety Product Sprint (Sprint Baklava (June 16 - July 4)) board.

Change #1160196 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/ORES@master] Map pre-save RR scores to predefined values

https://gerrit.wikimedia.org/r/1160196

Change #1160196 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Map pre-save RR scores to predefined values

https://gerrit.wikimedia.org/r/1160196

Change #1162998 had a related patch set uploaded (by Kosta Harlan; author: Máté Szabó):

[mediawiki/extensions/ORES@wmf/1.45.0-wmf.6] Map pre-save RR scores to predefined values

https://gerrit.wikimedia.org/r/1162998

Change #1163004 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] Revert "ores: Disable AbuseFilter integration by default"

https://gerrit.wikimedia.org/r/1163004

Change #1162998 merged by jenkins-bot:

[mediawiki/extensions/ORES@wmf/1.45.0-wmf.6] Map pre-save RR scores to predefined values

https://gerrit.wikimedia.org/r/1162998

Mentioned in SAL (#wikimedia-operations) [2025-06-23T20:14:27Z] <kharlan@deploy1003> Started scap sync-world: Backport for [[gerrit:1162998|Map pre-save RR scores to predefined values (T364705)]], [[gerrit:1161950|Fix password handling for non-existent users (T395372 T397262)]]

Stashbot mentioned this in T397262: PHP Deprecated: preg_match(): Passing null to parameter #2 ($subject) of type string is deprecated. · Mon, Jun 23, 8:14 PM

Mentioned in SAL (#wikimedia-operations) [2025-06-23T20:38:46Z] <kharlan@deploy1003> kharlan, tgr: Backport for [[gerrit:1162998|Map pre-save RR scores to predefined values (T364705)]], [[gerrit:1161950|Fix password handling for non-existent users (T395372 T397262)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-06-23T20:58:57Z] <kharlan@deploy1003> Finished scap sync-world: Backport for [[gerrit:1162998|Map pre-save RR scores to predefined values (T364705)]], [[gerrit:1161950|Fix password handling for non-existent users (T395372 T397262)]] (duration: 44m 29s)

Change #1163004 merged by jenkins-bot:

[operations/mediawiki-config@master] Reapply "ores: Disable AbuseFilter integration by default"

https://gerrit.wikimedia.org/r/1163004

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:01:54Z] <kharlan@deploy1003> Started scap sync-world: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]]

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:04:28Z] <kharlan@deploy1003> kharlan, tgr, tchanders: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:16:46Z] <kharlan@deploy1003> Finished scap sync-world: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]] (duration: 14m 51s)

kostajh moved this task from Needs Review to Needs QA on the Trust and Safety Product Sprint (Sprint Baklava (June 16 - July 4)) board. · Tue, Jun 24, 6:53 AM

kostajh added a project: User-notice. · Tue, Jun 24, 8:42 AM

ReleaseTaggerBot edited projects, added MW-1.45-notes (1.45.0-wmf.8; 2025-07-01); removed MW-1.45-notes (1.45.0-wmf.6; 2025-06-17). · Tue, Jun 24, 3:02 PM

ReleaseTaggerBot edited projects, added MW-1.45-notes (1.45.0-wmf.6; 2025-06-17); removed MW-1.45-notes (1.45.0-wmf.8; 2025-07-01). · Tue, Jun 24, 4:01 PM

Re: Tech News/User-notice - What wording would you suggest for the entry, and When should it be included (I assume this next edition)? Thanks!

Quiddity moved this task from To Triage to Not ready to announce on the User-notice board. · Thu, Jun 26, 11:17 PM

kostajh mentioned this in T398291: AI/ML Infrastructure Request: Expand ORES-enabled RevertRisk filters deployment to all wikis, excluding Commons and Wikidata. · Tue, Jul 1, 10:36 AM

kostajh edited projects, added Trust and Safety Product Sprint (Sprint Cannoli (July 7 - July 25)); removed Trust and Safety Product Sprint (Sprint Baklava (June 16 - July 4)). · Mon, Jul 7, 9:42 AM

kostajh moved this task from Priority Backlog to Needs QA on the Trust and Safety Product Sprint (Sprint Cannoli (July 7 - July 25)) board.

UOzurumba moved this task from Not ready to announce to Announce in next Tech/News on the User-notice board. · Wed, Jul 9, 6:53 PM

In T364705#10952467, @Quiddity wrote:

Re: Tech News/User-notice - What wording would you suggest for the entry, and When should it be included (I assume this next edition)? Thanks!

Sorry for the delay. Suggested text (cc @mszabo)

The ORES extension adds an AbuseFilter variable if that extension is installed. This allows AbuseFilters to filter edits based on the RevertRisk score of the edit being attempted. The variable is only available on wikis where the RevertRisk LanguageAgnostic model is configured—see T392144 for a full list. It is only populated if the action being evaluated is an edit. For more information, please see https://www.mediawiki.org/wiki/Extension:ORES/AbuseFilter_variables#What_variables_are_available_for_use
