
Deploy config change to start the Multi-Reference Check A/B Test
Status: Closed, Resolved (Public)

Description

Deployment timing

Tuesday, March 25, 2025

Bucketing criteria

Bucketing Requirements:

  • Bucketing should include both registered and unregistered users at all identified participating wikis.
  • All users who are editing a main namespace page (NS:0), on desktop or mobile, at any of the participating wikis should have an equal (50%) chance of being bucketed into the A/B test's control group or its treatment group.
  • Bucketing should be done on a per-Wikipedia basis: within a given wiki, 50% of people should be placed in the control group and 50% in the treatment group.
  • The test group should have multi-check (references) experience enabled while the control group should only have the single Reference Check enabled (the current default experience at those wikis).
  • People should remain in the same test group for the duration of the test (and across sessions and pages).
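The requirements above (a per-wiki 50/50 split that stays stable across sessions and pages, limited to NS:0 edits) are commonly met with deterministic hashing. The sketch below is illustrative only and is not the VisualEditor implementation. It assumes each editor has a stable `user_token` (for example, the user ID for registered users and a persisted anonymous token for unregistered users). The names `EXPERIMENT`, `assign_bucket`, and `bucket_for_edit` are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical salt: the A/B test name. Not taken from the VisualEditor code.
EXPERIMENT = "2025-03-editcheck-multicheck-reference"


def assign_bucket(wiki: str, user_token: str, experiment: str = EXPERIMENT) -> str:
    """Deterministically map a (wiki, user token) pair to 'control' or 'test'.

    Including the wiki name in the hashed key makes the split independent
    per wiki; because the hash is deterministic, the same token always gets
    the same group across sessions and pages.
    """
    key = f"{experiment}:{wiki}:{user_token}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer; its parity gives a 50/50 split.
    value = int.from_bytes(digest[:8], "big")
    return "test" if value % 2 == 1 else "control"


def bucket_for_edit(wiki: str, user_token: str, namespace: int) -> Optional[str]:
    """Return the group for an editing session, or None if it is out of scope."""
    if namespace != 0:
        return None  # only main-namespace (NS:0) edits are enrolled
    return assign_bucket(wiki, user_token)


if __name__ == "__main__":
    # Share of 10,000 synthetic tokens that land in the test group on one wiki.
    groups = [assign_bucket("eswiki", f"user-{i}") for i in range(10_000)]
    print(groups.count("test") / len(groups))
```

Under this sketch's assumptions, hashing instead of storing a random draw means no assignment table is needed; the trade-off is that an unregistered user who loses their stored token would be re-bucketed.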

Instrumentation-Related Requirements

  • A bucket is applied to the logged events so we can distinguish all events logged for the control group and the test group within the A/B test.
    • The bucket name should be descriptive of the test in case there are ever overlapping A/B tests. For example, 2025-03-multicheck-reference-[test/control].
  • The anonymous_user_token field should be populated for unregistered users in the test.
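A minimal sketch of these instrumentation requirements, assuming events are plain dictionaries and the bucket is stored in a hypothetical `bucket` field. The label prefix follows the bucket names that appear in the QA summary later in this task (`2025-03-editcheck-multicheck-reference-control` / `-test`) rather than the shorter example above. The helper names `bucket_label` and `tag_event` are hypothetical.

```python
from typing import Optional

# Prefix matching the bucket names seen in the QA results for this test.
EXPERIMENT_PREFIX = "2025-03-editcheck-multicheck-reference"


def bucket_label(group: str) -> str:
    """Build a descriptive bucket name, e.g. '<prefix>-control'."""
    if group not in ("control", "test"):
        raise ValueError(f"unknown group: {group!r}")
    return f"{EXPERIMENT_PREFIX}-{group}"


def tag_event(
    event: dict,
    group: str,
    is_registered: bool,
    anonymous_user_token: Optional[str],
) -> dict:
    """Return a copy of the event with the bucket and, if needed, the anonymous token."""
    tagged = dict(event)  # do not mutate the caller's event
    tagged["bucket"] = bucket_label(group)
    if not is_registered:
        # Unregistered users must carry a token so their events can be linked.
        if not anonymous_user_token:
            raise ValueError("unregistered users need an anonymous_user_token")
        tagged["anonymous_user_token"] = anonymous_user_token
    return tagged
```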

Participating wikis

Wiki	Feedback/Notes	Status
ar.wp	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
afwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
eswiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
frwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
igwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
itwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
jawiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
ptwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
swwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
yowiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
viwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages
zhwiki	People are being bucketed into the correct groups; bucketing includes both registered and unregistered users; users are assigned the same test group for the duration of the test and across sessions and pages

Tech/News Draft

The Editing team will test a new way to display [[ mw:Help:Edit check#ref | Reference Check ]]. During this test, we will show several invitations to add a reference (called "Multi-check") instead of only one, as is the case today. 50% of users will see this new configuration, while the other 50% will keep the default configuration. The test will be conducted on the following wikis: ar.wp, afwiki, eswiki, frwiki, igwiki, itwiki, jawiki, ptwiki, swwiki, yowiki, viwiki, and zhwiki.

Edit Check Configuration

For the purposes of this A/B test, we will strive to have the participating wikis configure Edit Check in a consistent way. Doing so will enable us to draw wiki-agnostic conclusions about the impact(s) of Edit Check.

Prompted by the question @MNeisler posed in T346837#9180760 and what we talked about offline on 20 Sep,

Done

  • A/B test is started
  • @MNeisler to verify ≥1 day after test start date that people are being bucketed as expected

Event Timeline

So far, no no-gos on the suggested list of wikis.

Beautiful. I'll share this list with product managers to ensure they don't see any conflicts as well.

MNeisler updated the task description.

@DLynch and @ppelberg I've updated the ticket with proposed bucketing requirements for review.

Trizek-WMF updated the task description.
Trizek-WMF updated the task description.
ppelberg moved this task from Inbox to Ready to Be Worked On on the Editing-team (Kanban Board) board.

Excellent. Thank you, @MNeisler. I'm assigning this over to @DLynch to review and implement.

Updated the planned deployment date to March 25, 2025, per today's offline discussion.

Change #1127944 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check: set up the multi-check a/b test

https://gerrit.wikimedia.org/r/1127944

Change #1127945 had a related patch set uploaded (by DLynch; author: DLynch):

[operations/mediawiki-config@master] Enable VisualEditor EditCheck multi-check a/b test

https://gerrit.wikimedia.org/r/1127945

Change #1127944 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check: set up the multi-check a/b test

https://gerrit.wikimedia.org/r/1127944

Change #1128921 had a related patch set uploaded (by DLynch; author: DLynch):

[operations/mediawiki-config@master] Enable VisualEditor EditCheck multi-check a/b test on remaining wikis

https://gerrit.wikimedia.org/r/1128921

Change #1128922 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@wmf/1.44.0-wmf.20] Edit check: set up the multi-check a/b test

https://gerrit.wikimedia.org/r/1128922

Change #1127945 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable VisualEditor EditCheck multi-check a/b test on test2wiki

https://gerrit.wikimedia.org/r/1127945

Change #1128922 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@wmf/1.44.0-wmf.20] Edit check: set up the multi-check a/b test

https://gerrit.wikimedia.org/r/1128922

Mentioned in SAL (#wikimedia-operations) [2025-03-18T20:08:58Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1128922|Edit check: set up the multi-check a/b test (T384372)]], [[gerrit:1127945|Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372)]], [[gerrit:1128777|Growth: enable new way of refreshing LinkRecommendations for pilots (T386250)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-18T20:16:07Z] <tgr@deploy2002> migr, kemayo, tgr: Backport for [[gerrit:1128922|Edit check: set up the multi-check a/b test (T384372)]], [[gerrit:1127945|Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372)]], [[gerrit:1128777|Growth: enable new way of refreshing LinkRecommendations for pilots (T386250)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-18T20:25:34Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1128922|Edit check: set up the multi-check a/b test (T384372)]], [[gerrit:1127945|Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372)]], [[gerrit:1128777|Growth: enable new way of refreshing LinkRecommendations for pilots (T386250)]] (duration: 16m 36s)

Next steps

  • @ppelberg to review state of work and verify tomorrow's (25 March) planned deployment can proceed
  • Once I do the above, @DLynch can schedule the deployment of the config to start the test via backport

Per offline discussion, the A/B test is clear to start today, March 25, 2025.

The only remaining item related to the start of the test is for @JFernandez-WMF to update T384954 with the findings from the last round of usability tests; this does not need to block the start of the test.

Change #1128921 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable VisualEditor EditCheck multi-check a/b test on remaining wikis

https://gerrit.wikimedia.org/r/1128921

Mentioned in SAL (#wikimedia-operations) [2025-03-25T20:53:18Z] <kemayo@deploy1003> Started scap sync-world: Backport for [[gerrit:1128921|Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-25T20:59:40Z] <kemayo@deploy1003> kemayo: Backport for [[gerrit:1128921|Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-25T21:08:42Z] <kemayo@deploy1003> Finished scap sync-world: Backport for [[gerrit:1128921|Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372)]] (duration: 15m 23s)

I completed a QA of the multi-check A/B test bucketing at the partner wikis where it is deployed and confirmed that the aggregate data appears as expected based on the bucketing requirements. See one open question and a summary of checks below.

Open Question:
@DLynch - It looks like the init action does not consistently include the bucket data, but subsequent actions such as ready do. I think I remember this issue with the single Reference Check A/B test as well and just wanted to confirm this was expected. If so, I'll plan to review A/B test events based on the ready action.

Summary of Checks

  • The number of sessions and users in the experiment (overall and by wiki) is consistent with the expected 50/50 split.

    test_group                                        n_sessions   n_users
    2025-03-editcheck-multicheck-reference-control    147664       80542
    2025-03-editcheck-multicheck-reference-test       148055       80880

  • Each user is assigned to only one test group.
  • Bucketing includes both registered and unregistered users.
  • A/B test assignments are recorded for both mobile and desktop editing sessions.
  • We are logging A/B test assignments at all identified partner wikis.
  • The test group has more reference check shown events than the control group, as expected, since the test group has the multi-check (references) experience and the control group has the single reference check experience.
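The "50/50 split" check above can be made quantitative with a sample ratio mismatch (SRM) test. The sketch below is not the analysis that was actually run for this task: it swaps in a chi-square goodness-of-fit test (1 degree of freedom) against an even split, and a check that no user appears in more than one group, counting only ready events because init events may lack the bucket. The `(user_id, bucket, action)` tuple layout and the function names are assumptions.

```python
import math
from collections import defaultdict
from typing import Iterable, Set, Tuple


def srm_p_value(n_control: int, n_test: int) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) against a 50/50 split."""
    expected = (n_control + n_test) / 2
    chi2 = ((n_control - expected) ** 2 + (n_test - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def users_in_multiple_groups(events: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Return user IDs seen with more than one bucket, using only 'ready' events."""
    buckets_by_user = defaultdict(set)
    for user_id, bucket, action in events:
        if action != "ready":
            continue  # init events may not carry the bucket, so skip them
        buckets_by_user[user_id].add(bucket)
    return {user for user, buckets in buckets_by_user.items() if len(buckets) > 1}


if __name__ == "__main__":
    # Counts copied from the QA table above.
    print("sessions:", srm_p_value(147664, 148055))
    print("users:", srm_p_value(80542, 80880))
```

With the counts in the table, both p-values are far from an SRM alarm (roughly 0.47 for sessions and 0.40 for users). Users are the unit of randomization, so the per-user counts are the more appropriate input; sessions from the same user are not independent, which makes the session-level test only approximate.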

cc @ppelberg

@MNeisler I think we don't expect the bucket in init events for wikitext.

Ryasmeen updated the task description.
Ryasmeen edited projects, added Verified; removed Editing QA.