Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying (auto reveal feature)
Open, Needs Triage, Public

Description

Note: Design specs being worked on in T374869

Motivation

In certain situations, temporary accounts can change significantly faster than communities are used to with IP editors (in the worst case, on every edit). In instances of such high-speed account-hopping vandalism, patrollers would find it useful to temporarily enable an "auto-reveal mode" in which the MediaWiki interface automatically resolves temporary accounts into IP addresses wherever they are encountered. This would let them quickly identify IPs that belong to the same range/area and determine the best way to block bad actors.
Having T358852: [Epic] Display temporary account contributions on Special:Contributions for IP addresses and IP ranges implemented together would also make it possible to query for (IP-wise) similar temporary accounts.

Access

Since this is more sensitive than one-by-one reveal, we need to limit this permission to sysops and above.

Design and Product Spec

See T374869: Design an auto-reveal mode for temporary account IPs and T385823: IP Auto-reveal: finalise remaining product specs

Event Timeline

Adding @Niharika, who was looking into something similar in T346809: Bulk Reveal IP addresses. (I don't think that idea was exactly the same, since it refers to revealing IP addresses "in one step", rather than in zero steps which this task is suggesting.)

I'm wondering how logging will work.

What we currently log

Currently, we're logging when a user actually views IP addresses, and which temp user they view them for:

image.png (195×924 px, 85 KB)

This would be difficult to maintain if we do this task. If we logged which temp users' IP addresses were shown to users who see them automatically, it would (1) create a lot of logs, including logs about users that weren't actually being investigated and just happened to appear on a page; and (2) potentially violate the privacy of the stewards, whose page visits unrelated to patroller work could be reconstructed from the log.

What the policy says

However, the Wikimedia Access to Temporary Account IP Addresses Policy states:

To ensure accountability, a log is kept of which users have access to temporary account IP addresses.

...which doesn't seem to imply that each instance needs to be logged: just when access was enabled/disabled for those users who need to accept a preference (checkuser-temporary-account) and when users become members, or stop being members, of groups that have IP reveal enabled automatically (checkuser-temporary-account-no-preference).

An idea from a recent-ish meeting with @Tchanders is to introduce a temporary "IP addresses visible" mode that can be enabled from time to time (for a specific reason) and that resolves all temporary accounts to an IP while it is active. Conceptually, this would be similar to Phabricator's high-security mode, which removes the need to enter an MFA token every time a sensitive action is taken. If this mode exists, it could help with the logging. It would be more similar to T346809 (except it would still be multipage, just not unlimited over time).

Thanks @Urbanecm. We're discussing product work for this with @Niharika and @Madalina.

kostajh renamed this task from Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying to Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying (auto reveal feature). Aug 28 2024, 2:22 PM

From a DBA perspective, are there concerns about the volume of additions to the logging table for this feature?

The feature will add an entry to the logging table whenever a temporary account is seen in various user interfaces (recent changes; watchlist; history pages; etc) when a privileged user has a preference enabled. So e.g. if a user with the ability to auto-reveal IP addresses for temporary accounts is on RecentChanges set to show the last 500 changes, and all 500 changes are made by different temporary accounts, then we would log 500 rows to the logging table indicating that the privileged user revealed those IPs.

Some additional notes:

  • We'd debounce logging the "reveal" if the temporary account's IP was already revealed in the last 24 hours.
  • The log entry is unique per privileged user and temporary account, so e.g. if users A, B and C have the right to auto-reveal temp accounts, and each of those users visits a page with "temporary account ~2024-1", then there will be three log entries.

So, the theoretical maximum would be (number of temporary accounts that haven't been logged in the last 24 hours) × (number of users with the ability to auto-reveal accounts), which will be a large number on bigger wikis. In practice, my guess is that we might see something in the neighborhood of thousands / tens of thousands of rows added each day, but it's hard to know without seeing what the feature usage looks like. (We could also feature flag this so that if the volume of log entries is overwhelming, we can quickly shut it off.)
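
To make the two notes above concrete, here is a minimal Python sketch of debounced reveal logging. It is not the CheckUser implementation: `RevealLog` and `record_reveal` are hypothetical names, the logging table is replaced by an in-memory list, and it assumes the 24-hour window is measured from the last logged row for a given (privileged user, temporary account) pair.

```python
from datetime import datetime, timedelta

DEBOUNCE = timedelta(hours=24)  # assumed window, taken from the note above


class RevealLog:
    """Hypothetical in-memory stand-in for the logging table."""

    def __init__(self):
        self.rows = []           # (viewer, temp_account, timestamp)
        self._last_logged = {}   # (viewer, temp_account) -> time of last row

    def record_reveal(self, viewer, temp_account, now):
        """Add a row unless this viewer's reveal of this account was logged within the window."""
        key = (viewer, temp_account)
        last = self._last_logged.get(key)
        if last is not None and now - last < DEBOUNCE:
            return False  # debounced: no new row
        self._last_logged[key] = now
        self.rows.append((viewer, temp_account, now))
        return True


log = RevealLog()
t0 = datetime(2024, 9, 1, 12, 0)

# Users A, B and C each see "~2024-1": one row per viewer, three rows in total.
for viewer in ("A", "B", "C"):
    log.record_reveal(viewer, "~2024-1", t0)

# A reloads the same page an hour later: debounced, no new row.
log.record_reveal("A", "~2024-1", t0 + timedelta(hours=1))

# A opens RecentChanges showing 500 other temp accounts: 500 new rows.
for i in range(2, 502):
    log.record_reveal("A", f"~2024-{i}", t0 + timedelta(hours=2))

print(len(log.rows))
```

Under these assumptions the row count grows with the number of distinct (viewer, account) pairs per 24 hours, which is where the theoretical maximum above comes from.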

Does the above concern DBAs from the point of view of adding too many rows to the logging table?

In terms of impact on the databases, I think having a realistic estimate would be useful. I think we can deploy the feature and then decide what to do if our measurements turn out to be too big. For example, one possible solution would be to drop any logs after X years or something like that.

OTOH, from an auditing point of view, adding a lot of IP reveals defeats the point of logging in the first place. But this is more of a product issue.

In terms of impact on the databases, I think having a realistic estimate would be useful. I think we can deploy the feature and then decide what to do if our measurements turn out to be too big. For example, one possible solution would be to drop any logs after X years or something like that.

I'm not sure it's possible to have a realistic estimate until we see how users make use of this feature. We will definitely know more as part of the pilot wiki deployment stages.

OTOH, from an auditing point of view, adding a lot of IP reveals defeats the point of logging in the first place. But this is more of a product issue.

Yeah, this is a tough one. I suspect for auditing, you'd probably want to query an aggregate of all the individual reveals. It's also difficult to foresee what misuse of this feature might look like. But a single log entry per reveal seems safe in terms of preserving the ability to audit whatever pattern of abuse/misuse we might identify later on.

Some thoughts on how to approach the implementation.

How to model auto-reveal mode?

  • The user needs to have the mode with an expiry.
    • We could use a user group, since they have expiry times.

How to log entering and leaving auto-reveal mode

  • We can log when the user joins the user group and include the expiry in the log.
  • We can log when the expiry is extended or shortened.
  • Logging when the expiry time elapses:
    • Is it actually possible to make a log entry at the moment when the expiry time is reached?
    • Or would we log on purging the expired row?
    • Do we even need to log when the group membership is removed, if we are already logging when the group is joined and for how long, and when the expiry time is changed?
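
As a rough illustration of the last question, the following Python sketch models the mode as an expiry timestamp and logs only on entry and on expiry changes. `AutoRevealMode` and its methods are hypothetical names, not MediaWiki APIs; the sketch assumes expiry is checked lazily when the mode is read, so nothing runs at the moment the expiry time is reached.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AutoRevealMode:
    """Hypothetical per-user state, standing in for an expiring group membership."""
    expiry: Optional[datetime] = None
    log: list = field(default_factory=list)  # (action, logged_at, expiry)

    def enable(self, now, duration):
        # Log on entry and include the expiry in the log entry.
        self.expiry = now + duration
        self.log.append(("enable", now, self.expiry))

    def change_expiry(self, now, new_expiry):
        # Log when the expiry is extended or shortened.
        if self.expiry is None:
            raise ValueError("auto-reveal mode has not been enabled")
        action = "extend" if new_expiry > self.expiry else "shorten"
        self.expiry = new_expiry
        self.log.append((action, now, new_expiry))

    def is_active(self, now):
        # Evaluated lazily at read time; nothing is logged when the expiry lapses.
        return self.expiry is not None and now < self.expiry


mode = AutoRevealMode()
start = datetime(2025, 2, 12, 9, 0)
mode.enable(start, timedelta(hours=1))
mode.change_expiry(start + timedelta(minutes=10), start + timedelta(minutes=30))

print(mode.is_active(start + timedelta(minutes=20)))
print(mode.is_active(start + timedelta(minutes=45)))
print([action for action, _, _ in mode.log])
```

In this model the end of the mode can be reconstructed from the most recent log entry's expiry, so a separate "expired" entry would carry no new information. Whether that is acceptable for auditing is a product question the sketch does not answer.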

How to display IPs instead of temp accounts automatically

  • How does normal IP reveal work?
    • The IP reveal buttons are currently added by JS once the HTML is available.
    • They are added to any user link with the class mw-tempuserlink.
    • Clicking on a button does an API lookup to the CheckUser tables.
    • The reveal IP button is replaced by the found IP.
  • Building on this to do auto-reveal:
    • We could have a JS script that looks for the user links with mw-tempuserlink and does the API call instead of adding a button.
    • If a user is in auto-reveal mode, we would load this script instead of the one that adds IP reveal buttons.
  • Doing auto-reveal on the server instead:
    • We should keep the scope the same for auto-reveal as for normal IP reveal, to avoid confusion (i.e. they both work for links with mw-tempuserlink).
    • Could we for example add an IP address next to the temporary user name using a hook that CheckUser handles, from the Linker?
    • Would we still persist revealed IPs for 24 hours (or CheckUserTemporaryAccountMaxAge), and could we still use localStorage for this?
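
Independent of whether this runs in the browser or on the server, the core of an auto-reveal pass can be sketched as below. This is Python rather than the JS the feature would actually use, and every name is hypothetical: `lookup` stands in for the CheckUser API call, `cache` for localStorage, and `MAX_AGE_SECONDS` for the 24 hours (or CheckUserTemporaryAccountMaxAge) mentioned above.

```python
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # assumed value for the persistence window


def auto_reveal(link_names, lookup, cache, now=None):
    """Resolve every temp account on a page, looking each one up at most once.

    link_names: user names taken from links with class mw-tempuserlink
    lookup:     callable(name) -> ip, standing in for the API call
    cache:      dict name -> (ip, stored_at), standing in for localStorage
    """
    now = time.time() if now is None else now
    resolved = {}
    for name in link_names:
        if name in resolved:
            continue  # the same account appears several times on the page
        cached = cache.get(name)
        if cached is not None and now - cached[1] < MAX_AGE_SECONDS:
            resolved[name] = cached[0]  # still fresh: reuse without a lookup
            continue
        ip = lookup(name)
        cache[name] = (ip, now)
        resolved[name] = ip
    return resolved


calls = []


def fake_lookup(name):
    # Test double using documentation-range addresses (192.0.2.0/24).
    calls.append(name)
    return {"~2024-1": "192.0.2.10", "~2024-2": "192.0.2.77"}[name]


cache = {}
page = ["~2024-1", "~2024-2", "~2024-1"]
first = auto_reveal(page, fake_lookup, cache, now=0)
second = auto_reveal(page, fake_lookup, cache, now=3600)
print(first)
print(len(calls))
```

The second pass is served entirely from the cache, so the number of lookups depends on distinct accounts per window rather than on page views.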

@Niharika Have we written down anywhere who would get the auto-reveal permission?

@Niharika Have we written down anywhere who would get the auto-reveal permission?

It's in the task description:

Access
Since this is more sensitive than one-by-one reveal, we need to limit this permission to sysops and above.

Niharika changed the task status from Stalled to Open. Feb 12 2025, 9:38 AM
Niharika updated the task description.

@Tchanders and I were discussing the current design and we think it could be useful to allow users to skip setting a duration in the dialog, in order to enable the feature with one click from the Tools menu.

In current design:

  1. User clicks tools menu link
  2. Dialog displays
  3. User selects duration (30 minutes or 1 hour)
  4. User clicks button to enable
  5. Feature enables with bottom-right panel open.

In alternative design:

  1. User clicks tools menu link
  2. Dialog displays
  3. User clicks button to enable and can select "don't show again" checkbox
  4. Feature enables with bottom-right panel open.

If the user selects the "don't show again" checkbox, then the next time they turn on the feature the flow is:

  1. User clicks tools menu link
  2. Feature enables with bottom-right panel open.

If we go with the alternative design, the trade-off is that we would eliminate the duration options of 30 minutes and 1 hour, and simply set the duration to 1 hour. We have heard from some Stewards that this would be preferable. I'm interested to hear from anyone who would rather keep the select menu / 30 minutes as an option and why that is. If not, I propose we try the alternative approach (assuming it's ok with Legal etc). cc @Niharika

Alternative flow.png (3×8 px, 1 MB)

Removing this from the major pilots board, since the work to be done before major pilots release is done. Still keeping the temporary accounts tag.