Analytics-Data-Problem (Tag)
Active, Public

Details

Description

Specific issues where an Analytics dataset has incorrect, missing, or malformed data or shows an anomaly which might be caused by such data. Not for general work on data quality processes or monitoring.

(Project tag requested in T362839.)

Recent Activity

Wed, Jul 9

mforns added a comment to T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots>.

Agreed, they look like bots.
We are currently working on an improvement to the bot detection pipeline.
Maybe after the update, these requests won't exist any more.
Nevertheless, it makes sense to normalize, since we are planning to backfill unique devices numbers soon.

Wed, Jul 9, 2:24 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights

Mon, Jun 30

Ahoelzl moved T395727: Sharp spike in unique devices for past month on all projects from Urgent to In progress on the Data-Engineering (Q4 2025 April 1st - June 30th) board.
Mon, Jun 30, 3:08 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats

Thu, Jun 26

JAllemandou added a comment to T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots>.

This was a known change when we migrated from Varnish to HAProxy. We decided not to normalize the hosts, to keep the data as close as possible to the source.
My assumption is that hits where the domain has a trailing dot are most probably bots. I'd be super happy to be proven wrong and asked to normalize though :)

Thu, Jun 26, 2:02 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights
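
Editorial sketch of the normalization discussed in T395963: a minimal Python example, assuming the fix amounts to stripping trailing dots from the domain value before aggregating. The function names and sample values are hypothetical and are not taken from the actual pipeline; case folding and other host cleanup are deliberately left out.

```python
from collections import Counter


def normalize_domain(domain: str) -> str:
    """Strip surrounding whitespace and any trailing dots from a hostname."""
    return domain.strip().rstrip(".")


def count_by_domain(domains):
    """Count hits per domain after normalization."""
    return Counter(normalize_domain(d) for d in domains)


# Made-up sample values, not real webrequest data.
sample = ["en.wikipedia.org.", "en.wikipedia.org", "de.wikipedia.org."]
counts = count_by_domain(sample)
print(counts["en.wikipedia.org"])  # 2
```

Without normalization, the dotted and undotted forms would be counted as separate domains, which matters for any backfill that groups by domain.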

Mon, Jun 23

mforns moved T395727: Sharp spike in unique devices for past month on all projects from Blocked/Paused to Urgent on the Data-Engineering (Q4 2025 April 1st - June 30th) board.
Mon, Jun 23, 3:28 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats
mforns moved T395727: Sharp spike in unique devices for past month on all projects from Urgent to Blocked/Paused on the Data-Engineering (Q4 2025 April 1st - June 30th) board.
Mon, Jun 23, 3:28 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats

Jun 16 2025

Ahoelzl assigned T395727: Sharp spike in unique devices for past month on all projects to mforns.
Jun 16 2025, 3:24 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats

Jun 11 2025

nshahquinn-wmf moved T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots> from Incoming to Watching on the Movement-Insights board.
Jun 11 2025, 6:41 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights

Jun 5 2025

Ahoelzl updated subscribers of T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots>.

@mforns please take a look

Jun 5 2025, 9:30 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights
Ahoelzl moved T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots> from Next Up to Urgent on the Data-Engineering (Q4 2025 April 1st - June 30th) board.
Jun 5 2025, 9:24 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights
Ahoelzl moved T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots> from Incoming (new tickets) to Q4 2025 April 1st - June 30th on the Data-Engineering board.
Jun 5 2025, 9:24 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights
VirginiaPoundstone updated subscribers of T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots>.

@BBlack and @KOfori any chance this is related to Varnish upgrade?

Jun 5 2025, 9:00 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights

Jun 4 2025

nettrom_WMF closed T396060: Rerun trust_safety_metrics queries for March 2025 as Resolved.

This has been completed by rerunning the Airflow DAGs for that snapshot. Thanks to the Data Engineering ops team!

Jun 4 2025, 8:56 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
nettrom_WMF updated the task description for T396060: Rerun trust_safety_metrics queries for March 2025.
Jun 4 2025, 8:53 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
Alien333 added a comment to T395727: Sharp spike in unique devices for past month on all projects.

(For some projects, 0 was reported, up to somewhere around June 2.)

Jun 4 2025, 7:15 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats
VirginiaPoundstone added a comment to T395727: Sharp spike in unique devices for past month on all projects.

Changed task name to "Sharp spike.."

Jun 4 2025, 6:14 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats
VirginiaPoundstone renamed T395727: Sharp spike in unique devices for past month on all projects from 0 unique devices for past month on all projects to Sharp spike in unique devices for past month on all projects.
Jun 4 2025, 6:13 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats
nettrom_WMF added a comment to T396060: Rerun trust_safety_metrics queries for March 2025.

Might be easier to do this through Airflow. Waiting for the DAG errors to go away first, though.

Jun 4 2025, 6:00 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
nettrom_WMF updated the task description for T396060: Rerun trust_safety_metrics queries for March 2025.
Jun 4 2025, 6:00 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
nettrom_WMF updated the task description for T396060: Rerun trust_safety_metrics queries for March 2025.
Jun 4 2025, 5:59 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
nettrom_WMF created T396060: Rerun trust_safety_metrics queries for March 2025.
Jun 4 2025, 5:47 PM · Product-Analytics (Kanban), Analytics-Data-Problem, Trust and Safety Product Team
Mayakp.wiki added a comment to T395727: Sharp spike in unique devices for past month on all projects.

I checked Wikistats and I am not seeing zero unique devices for any of the projects. Yes, they are inflated for May 2025 and we are investigating, as mentioned above, but could we please change the title and description of this task to reflect the issue correctly?

Jun 4 2025, 4:52 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats
nshahquinn-wmf added a project to T395727: Sharp spike in unique devices for past month on all projects: Analytics-Data-Problem.
Jun 4 2025, 4:48 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Analytics, Data-Engineering-Wikistats

Jun 3 2025

nshahquinn-wmf added a project to T395963: NEW BUG REPORT <"Domain" field issue: some domains have trailing dots>: Analytics-Data-Problem.
Jun 3 2025, 11:48 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Analytics-Data-Problem, Movement-Insights

May 28 2025

phuedx added a comment to T388825: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain.

<snip /> I think the behavior here is that when events are logged, meta.domain is set to whatever the hostname is <snip />

May 28 2025, 9:04 AM · Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), SUL3, Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform

May 27 2025

mpopov added a comment to T388825: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain.

I'll flag this to Experiment Platform, since we're the maintainers of EventLogging. I think the behavior here is that when events are logged, meta.domain is set to whatever the hostname is, but we should discuss whether that is appropriate behavior for an instrument running on auth.wikimedia.org.

May 27 2025, 9:31 PM · Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), SUL3, Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform
nshahquinn-wmf added a project to T388825: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain: Analytics-Data-Problem.
May 27 2025, 4:42 PM · Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), SUL3, Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform

May 23 2025

Gehel moved T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 from Incoming to Reported on the Discovery-Search (2025.05.02 - 2025.05.23) board.
May 23 2025, 12:13 PM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops
Gehel edited projects for T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1, added: Discovery-Search (2025.05.02 - 2025.05.23); removed Discovery-Search.
May 23 2025, 12:13 PM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops
Gehel moved T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 from Backlog - project to Reported on the Data-Platform-SRE (2025.05.02 - 2025.05.23) board.
May 23 2025, 9:00 AM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops
Gehel merged task T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 into T354853: Service mesh envoy does not treat incoming connections as local.
May 23 2025, 9:00 AM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops

May 12 2025

Ahoelzl closed T391708: Duplicate revisions and excess reverts in 2025-03 MediaWiki History snapshot as Resolved.
May 12 2025, 9:50 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th)

May 5 2025

tchin moved T391708: Duplicate revisions and excess reverts in 2025-03 MediaWiki History snapshot from In Review to Done on the Data-Engineering (Q4 2025 April 1st - June 30th) board.
May 5 2025, 3:13 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th)
Gehel edited projects for T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1, added: Data-Platform-SRE (2025.05.02 - 2025.05.23); removed Data-Platform-SRE (2025.04.12 - 2025.05.02).
May 5 2025, 12:50 PM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops

Apr 29 2025

nshahquinn-wmf added a project to T391708: Duplicate revisions and excess reverts in 2025-03 MediaWiki History snapshot: Analytics-Data-Problem.
Apr 29 2025, 6:07 PM · Analytics-Data-Problem, Data-Engineering (Q4 2025 April 1st - June 30th)
Ahoelzl moved T392624: Strange statistics for language variants for languages other than zh and sr from Incoming (new tickets) to Tag with Radar on the Data-Engineering board.
Apr 29 2025, 5:47 PM · Data-Engineering-Radar, Analytics-Data-Problem, Data-Engineering
Ahoelzl updated subscribers of T392624: Strange statistics for language variants for languages other than zh and sr.

@OSefu-WMF can you help assess the effort, impact and prioritize?

Apr 29 2025, 5:47 PM · Data-Engineering-Radar, Analytics-Data-Problem, Data-Engineering

Apr 28 2025

nshahquinn-wmf added a project to T392624: Strange statistics for language variants for languages other than zh and sr: Analytics-Data-Problem.
Apr 28 2025, 11:34 PM · Data-Engineering-Radar, Analytics-Data-Problem, Data-Engineering

Apr 19 2025

nshahquinn-wmf edited projects for T152546: de-duplicate archive records matching revision records in mediawiki_history, added: Analytics-Data-Problem; removed Analytics.
Apr 19 2025, 1:17 AM · Analytics-Data-Problem, Data-Engineering-Icebox, Data-Engineering
nshahquinn-wmf added a project to T259823: page_id is null where it shouldn't be in mediawiki history: Analytics-Data-Problem.
Apr 19 2025, 1:16 AM · Analytics-Data-Problem, Data-Engineering-Icebox, Data-Engineering

Apr 11 2025

Gehel edited projects for T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1, added: Data-Platform-SRE (2025.04.12 - 2025.05.02); removed Data-Platform-SRE (2025.03.22 - 2025.04.11).
Apr 11 2025, 1:16 PM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops

Apr 7 2025

nshahquinn-wmf added a project to T388855: Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1: Analytics-Data-Problem.
Apr 7 2025, 6:56 PM · Discovery-Search (2025.05.02 - 2025.05.23), Data-Platform-SRE (2025.05.02 - 2025.05.23), Analytics-Data-Problem, serviceops

Mar 13 2025

Ahoelzl added a comment to T364872: Unique devices per country spikes on wikifunctions.

@Mayakp.wiki weird, we did comprehensively backfill, including Druid.
Is there a way you can verify with the raw data?

Mar 13 2025, 5:52 PM · Data-Engineering, Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem

Mar 11 2025

Mayakp.wiki added a comment to T364872: Unique devices per country spikes on wikifunctions.

At first thought it's not, because the October issue focused on Wikipedia unique devices (T373630#10177892). This one affected Wikifunctions.
However, considering that it affected similar countries, the root cause could be the same, and the Jun, Jul, Aug, Sep spikes coincide with what we saw.
@Ahoelzl, was the backfill applied to Wikipedia unique devices only?
The data from the past few months looks stable.

Mar 11 2025, 10:33 PM · Data-Engineering, Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem
Ahoelzl moved T364872: Unique devices per country spikes on wikifunctions from Incoming (new tickets) to Needs Clarification on the Data-Engineering board.
Mar 11 2025, 9:15 PM · Data-Engineering, Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem
Ahoelzl updated subscribers of T364872: Unique devices per country spikes on wikifunctions.

@OSefu-WMF is this related to the unique devices issues we fixed in October?

Mar 11 2025, 9:15 PM · Data-Engineering, Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem

Mar 10 2025

nshahquinn-wmf added a project to T322545: wmf.virtualpageview_hourly's language_variant field is corrupted: Analytics-Data-Problem.
Mar 10 2025, 4:21 AM · Analytics-Data-Problem, Data-Engineering-Icebox, Data-Engineering, Data Pipelines

Mar 3 2025

Ahoelzl moved T383088: webrequest dataset sets referer_class "unknown" instead of "external (search engine)" for origin-based referer values from Incoming (new tickets) to Next Up on the Data-Engineering board.
Mar 3 2025, 9:31 PM · Analytics-Data-Problem, Movement-Insights, Product-Analytics, Data-Engineering

Feb 12 2025

nshahquinn-wmf moved T383088: webrequest dataset sets referer_class "unknown" instead of "external (search engine)" for origin-based referer values from Incoming to Watching on the Movement-Insights board.
Feb 12 2025, 7:41 PM · Analytics-Data-Problem, Movement-Insights, Product-Analytics, Data-Engineering

Feb 6 2025

daniel added a comment to T373874: Send Api-User-Agent header from MediaWiki client-side code.

Could we automatically set the gadget name in mw.Api by giving each gadget its own copy of mw? I'm not very familiar with the details of JS execution - could we just override getScript in GadgetResourceLoaderModule and wrap the entire thing in a scope that sets a local value for mw?

Feb 6 2025, 8:53 PM · User-notice, MW-1.44-notes (1.44.0-wmf.22; 2025-03-25), Patch-For-Review, Web-Team-Backlog-Archived, MW-Interfaces-Team, Design-System-Team, JavaScript

Feb 5 2025

daniel added a comment to T373874: Send Api-User-Agent header from MediaWiki client-side code.

... we could encode the user agent info in an Accept header, e.g. Accept: application/json;user-agent=my-gadget. That's a terrible hack, and would be bad for Turnilo as well, but it would work around the CORS issue.

Feb 5 2025, 6:41 AM · User-notice, MW-1.44-notes (1.44.0-wmf.22; 2025-03-25), Patch-For-Review, Web-Team-Backlog-Archived, MW-Interfaces-Team, Design-System-Team, JavaScript
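
Editorial sketch of the Accept-header workaround floated in T373874: a minimal Python example of how a receiver might read a `user-agent` parameter out of an Accept value such as `application/json;user-agent=my-gadget`. The function name is hypothetical, and nothing here reflects how MediaWiki or the analytics pipeline actually parses headers. The split on commas is naive and would mis-handle quoted parameter values that contain commas.

```python
from typing import Optional


def user_agent_from_accept(accept: str) -> Optional[str]:
    """Return the value of a 'user-agent' media-type parameter, if any."""
    for media_range in accept.split(","):
        # The first part is the media type; the rest are its parameters.
        parts = [p.strip() for p in media_range.split(";")]
        for param in parts[1:]:
            name, sep, value = param.partition("=")
            if sep and name.strip().lower() == "user-agent":
                return value.strip().strip('"')
    return None


print(user_agent_from_accept("application/json;user-agent=my-gadget"))  # my-gadget
```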