Instrument the multi-check experience
Closed, ResolvedPublic

Description

In T324735, we implemented the initial instrumentation needed to evaluate the impact of the first Reference Check in the analysis T342930 describes.

This task involves building on that initial "body" of instrumentation so that we can evaluate the impact of the multi-check experience T347530 will introduce.

Requirements

Borrowed from what @MNeisler documented in T352092#10545377.

feature | action
editCheck-[checkname] | action-accept
editCheck-[checkname] | action-reject
editCheck-[checkname] | action-[any other custom choice we later implement]
editCheck-[checkname] | check-shown-[moment] [1]
editCheckDialog | window-open-from-check-[moment] [1]
[fixed/sidebar/gutterSidebar]EditCheckDialog | window-open-from-check [3]

See full instrumentation Spec.

Done

  • @DLynch implements new instrumentation
  • Editing QA verifies new events are being emitted by clients on test2wiki
  • @MNeisler verifies new events are being logged in DB

[1] I'm guessing at how these two new events (a pre-save show event and a sidebar event to distinguish listeners) would be implemented, and I'm not tied to the proposed names. Feel free to adjust as needed. The main requirements are that I can distinguish these events by edit check type and that an event is sent per-check if possible. My logic behind doubling up on the events is that it'll probably be much easier to query for totals + breakdowns that way.

[2] Thinking this through some more, it would not be too difficult to query for edit check totals using something like event.feature LIKE 'editCheck%' if we want to remove that redundancy in events.

[3] We have multiple types of sidebar window now, and this sort of correlates to the "moments" but also doesn't entirely.

Event Timeline

MNeisler triaged this task as Medium priority.
MNeisler added a project: Product-Analytics.
MNeisler moved this task from Triage to Upcoming Quarter on the Product-Analytics board.

Per what the Editing Team discussed offline during today's planning meeting, at a minimum, we'll need instrumentation to help us answer questions like:
1. How many checks were presented/activated during any given edit session?
2. Of the checks that were activated within a given edit session, to what extent – if any – did the person engage with each check?

As part of the measurement plan for assessing the impact of introducing multiple checks, we identified metrics that depend on being able to calculate the number of checks presented within a single editing session, such as edit completion rate broken down by the number of checks shown. The average number of edit checks shown per session is also currently identified as one of the guardrail metrics for monitoring the impact of changes to edit check.
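As a rough illustration of the two metrics named above, the sketch below computes the average number of checks shown per session and the edit completion rate broken down by the number of checks shown. The `Session` record and its fields are hypothetical stand-ins for per-session data, not an existing schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical per-session summary; the field names are assumptions.
    session_id: str
    checks_shown: int
    completed: bool  # True if the edit was published


def average_checks_shown(sessions):
    """Mean number of checks shown per session (0.0 when there are none)."""
    if not sessions:
        return 0.0
    return sum(s.checks_shown for s in sessions) / len(sessions)


def completion_rate_by_checks_shown(sessions):
    """Map each checks_shown value to the share of sessions that completed."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for s in sessions:
        totals[s.checks_shown] += 1
        if s.completed:
            completed[s.checks_shown] += 1
    return {n: completed[n] / totals[n] for n in sorted(totals)}


# Made-up sample data, for illustration only.
sessions = [
    Session("a", 1, True),
    Session("b", 1, False),
    Session("c", 3, True),
    Session("d", 3, True),
]
avg = average_checks_shown(sessions)
rates = completion_rate_by_checks_shown(sessions)
```

Both functions need a reliable per-session count of checks shown, which is the piece that is not yet instrumented.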

Based on discussions with @DLynch, this is not currently instrumented. VisualEditorFeatureUse will currently only log an edit check event once in an editing session rather than firing three times if there are three reference checks.

Adding instrumentation that logs an event for all checks presented to the user requires further discussion and investigation. The current multi-check design includes a lot of regeneration-and-redisplay of checks during the mid-edit stage. Pre-save checks are simpler.

Note: This instrumentation would be needed in addition to the approach for revision tags determined in T352120 to track edit checks shown within an edit attempt vs a published edit.

cc @ppelberg

Suggestion is to add semi-generic instrumentation to add to VisualEditorFeatureUse around users acting on a check. (As opposed to being shown a check, which thus avoids questions around how to handle the in-flux nature of the mid-edit checks.)

feature | action
editCheck | action-accept
editCheck | action-reject
editCheck | action-[any other custom choice we later implement]
editCheck-[checkname] | action-accept
editCheck-[checkname] | action-reject
editCheck-[checkname] | action-[any other custom choice we later implement]
  • Some of this already exists specific to the AddReference check -- there's editCheckReferences; edit-check-[confirm / reject] events. This would mostly be getting rid of the need for custom setup of these.
  • My logic behind doubling up on the events is that it'll probably be much easier to query for totals + breakdowns that way.
  • Distinguishing between mid-edit and pre-save checks can be done because pre-save checks are between an EditAttemptStep saveIntent and an editCheckDialog; dialog-abort. (I might also want to look into making that close event more specific to whether you proceeded or backed out...)
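
The last point could be sketched roughly as follows. It assumes one session's events arrive in chronological order as (schema, feature, action) tuples. That shape and the `label_check_actions` helper are hypothetical. Only the saveIntent and dialog-abort boundaries come from the comment above.

```python
def label_check_actions(events):
    """Label each check action event as 'presave' or 'midedit'.

    `events` is one session's events in chronological order, each a
    (schema, feature, action) tuple. This shape is an assumption.
    """
    in_presave = False
    labelled = []
    for schema, feature, action in events:
        if schema == "EditAttemptStep" and action == "saveIntent":
            # The pre-save window opens when the user signals intent to save.
            in_presave = True
        elif feature == "editCheckDialog" and action == "dialog-abort":
            # The pre-save window closes when the check dialog is aborted.
            in_presave = False
        elif (
            feature is not None
            and feature.startswith("editCheck")
            and action.startswith("action-")
        ):
            moment = "presave" if in_presave else "midedit"
            labelled.append((feature, action, moment))
    return labelled


# Made-up event stream, for illustration only.
events = [
    ("VisualEditorFeatureUse", "editCheck-addReference", "action-reject"),
    ("EditAttemptStep", None, "saveIntent"),
    ("VisualEditorFeatureUse", "editCheck-addReference", "action-accept"),
    ("VisualEditorFeatureUse", "editCheckDialog", "dialog-abort"),
    ("VisualEditorFeatureUse", "editCheck-addReference", "action-accept"),
]
labels = label_check_actions(events)
```

If the user proceeds with the save rather than backing out, the window here simply stays open until the end of the stream, which is one reason a more specific close event could help.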

What this won't get us is a count of how many checks were shown to someone. We discussed in the meeting that getting a meaningful count for mid-edit checks is going to be complicated, but for pre-save we could log e.g. a VEFU editCheck; pre-save-shown-17 sort of event, though that'd be a pain to build reports around. (But we could bucket it -- 1, 2-5, 6-10, 11+, etc.)
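
A minimal sketch of that bucketing, assuming the bucket boundaries listed above. The function name and the handling of zero are assumptions.

```python
def bucket_checks_shown(count):
    """Return the bucket label for a number of checks shown."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return "0"  # assumption: the proposed buckets start at 1
    if count == 1:
        return "1"
    if count <= 5:
        return "2-5"
    if count <= 10:
        return "6-10"
    return "11+"


# Hypothetical action name built from the bucket rather than the raw count.
action = "pre-save-shown-" + bucket_checks_shown(17)
```

With buckets, reports only have to group on a handful of action values instead of one per distinct count.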

Next steps

Per today's offline discussion:

  • 1. David: articulate rough proposal for generic logging of Check actions
  • 2. Megan to become clear about ideal instrumentation
  • 3. Megan evaluate viability of approach David described in T352092#10510483
  • 4. David, Megan, Peter: meet to discuss evaluation ("3.")

We decided in a meeting that the generic suggestion is mostly okay. However, we'll also need:

  • A pre-save show event per-check
  • Something added to the sidebar show event to distinguish the listeners, so we can tell the difference between mid-edit and pre-save checks

Confirming that the instrumentation proposed in T352092#10510483, together with the new pre-save show and edit check moment events, will meet the requirements to measure the impact of Reference Check (to be done in T379131).

We'll need to revisit instrumentation requirements for additional check types (i.e., paste and peacock checks) once those workflows have been finalized.

New instrumentation requirements are summarized below for reference:

feature | action
editCheck-[checkname] | action-accept
editCheck-[checkname] | action-reject
editCheck-[checkname] | action-[any other custom choice we later implement]
editCheck-[checkname] | check-shown-presave [1]
editCheck-[checkname] | window-open-from-[moment]check [1]

A couple notes/questions:
[1] I'm guessing at how these two new events (a pre-save show event and a sidebar event to distinguish listeners) would be implemented, and I'm not tied to the proposed names. Feel free to adjust as needed. The main requirements are that I can distinguish these events by edit check type and that an event is sent per-check if possible.

My logic behind doubling up on the events is that it'll probably be much easier to query for totals + breakdowns that way.

[2] Thinking this through some more, it would not be too difficult to query for edit check totals using something like event.feature LIKE 'editCheck%' if we want to remove that redundancy in events.
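
To make the prefix-query idea concrete, here is a small sketch using an in-memory SQLite table as a stand-in for the real event data. The table name, columns, and rows are made up for illustration. Only the LIKE 'editCheck%' pattern comes from the note above.

```python
import sqlite3

# In-memory stand-in for the event table; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, feature TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("s1", "editCheck-addReference", "action-accept"),
        ("s1", "editCheck-addReference", "action-reject"),
        ("s2", "editCheck-addReference", "action-reject"),
        ("s2", "editCheckDialog", "dialog-abort"),
        ("s2", "otherFeature", "some-action"),
    ],
)

# Total across everything whose feature starts with "editCheck".
total = conn.execute(
    "SELECT COUNT(*) FROM events WHERE feature LIKE 'editCheck%'"
).fetchone()[0]

# Per-check breakdown, using a narrower prefix that requires the hyphen.
breakdown = conn.execute(
    "SELECT feature, action, COUNT(*) FROM events "
    "WHERE feature LIKE 'editCheck-%' "
    "GROUP BY feature, action ORDER BY feature, action"
).fetchall()
```

Note that the broad pattern also matches features such as editCheckDialog, so a pattern like 'editCheck-%' may be closer to what is wanted when counting per-check events only.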

@DLynch - Assigning this task to you for final review and implementation but let me know if you have any questions or need any additional info.

Instrumentation Spec

Change #1124850 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check: add instrumentation around the check lifecycle

https://gerrit.wikimedia.org/r/1124850

Change #1124850 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check: add instrumentation around the check lifecycle

https://gerrit.wikimedia.org/r/1124850

ppelberg added a project: Editing QA.
ppelberg updated the task description.
ppelberg moved this task from Inbox to High Priority on the Editing QA board.
ppelberg moved this task from Code Review to QA on the Editing-team (Kanban Board) board.

Megan and I had talked off-ticket about some changes to the exact events, and I forgot to update the ticket with them despite having implemented them as-discussed.

Completed testing all the events. Everything looks good on my end. The only issue found was that the action window-open-from-[moment]check is associated with feature:editCheck rather than feature:editCheck-[checkname] as specified on the task. This turned out to be expected, and the table in the task description has now been updated.

[Attachments: Screenshot 2025-03-18 at 4.53.05 PM.png, Screenshot 2025-03-18 at 4.42.57 PM.png, F58864157]

MNeisler added a subscriber: Ryasmeen.

Assigning to myself to verify that the new events are logged in VisualEditorFeatureUse as expected.

I've verified that the new events added to track multi-check engagement are being stored as expected in VisualEditorFeatureUse. This is based on a review of events generated on test2wiki between 18 March and 20 March.

See summary of the checks completed and findings below:

  • There were 6 total editing sessions where multi-check events were logged: 4 control group sessions and 2 test group sessions. Note: Once this is deployed to partner wikis and we have a larger sample of events, I'll QA the bucketing to confirm that we are seeing close to a 50/50 split across the test groups.
  • Confirmed that the bucket field is populated as expected with either 2025-03-editcheck-multicheck-reference-control or 2025-03-editcheck-multicheck-reference-test depending on user assignment.
  • Events are sent for both logged-in and logged-out users.
  • For logged-out users, we are populating the anonymous_user_token field as expected.
  • Confirmed that all new feature and action values implemented for multi-check (as documented in the task description) are stored correctly.
    • editCheck-addReference is the only specific check type feature being logged at the moment as expected.
    • Only pre-save moment actions (check-shown-presave, window-open-from-check-presave) are stored (also expected)
  • Events stored for both platform = desktop and platform = phone.
  • Events only occur where editor_interface = visualeditor
  • Confirmed based on the number of check-shown-presave events that the test group receives multiple checks during the pre-save moment while the control only receives one.
  • Confirmed data joins correctly with EditAttemptStep to be able to effectively calculate edit completion rate and other metrics.
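
For reference, the check-shown-presave comparison between groups can be sketched like this. The event dictionaries and the `session_id` key are hypothetical. The bucket values and the action name are the ones listed above.

```python
from collections import Counter

CONTROL = "2025-03-editcheck-multicheck-reference-control"
TEST = "2025-03-editcheck-multicheck-reference-test"


def presave_shown_per_session(events):
    """Count check-shown-presave events per (bucket, session) pair."""
    counts = Counter()
    for e in events:
        if e["action"] == "check-shown-presave":
            counts[(e["bucket"], e["session_id"])] += 1
    return counts


def max_shown_by_bucket(events):
    """Largest per-session check-shown-presave count seen in each bucket."""
    result = {}
    for (bucket, _session), n in presave_shown_per_session(events).items():
        result[bucket] = max(result.get(bucket, 0), n)
    return result


# Made-up events, for illustration only.
events = [
    {"bucket": CONTROL, "session_id": "c1", "action": "check-shown-presave"},
    {"bucket": CONTROL, "session_id": "c1", "action": "action-reject"},
    {"bucket": TEST, "session_id": "t1", "action": "check-shown-presave"},
    {"bucket": TEST, "session_id": "t1", "action": "check-shown-presave"},
    {"bucket": TEST, "session_id": "t1", "action": "check-shown-presave"},
]
max_shown = max_shown_by_bucket(events)
```

In this sample the control session shows one check and the test session shows three, which is the pattern described in the findings.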

Note: I've updated the VEFU data dictionary so these new multi-check events are now reflected there.

@Ryasmeen + @MNeisler + @DLynch: this all looks wonderful. Nicely done 👏🏼