
Set up x3 replication to wikireplicas
Closed, Resolved · Public

Description

x3 also needs to have replication set up to the wiki replicas.

Steps:

  • Set up a new record for x3 and point it to s8
  • Announce the change and ask users to update their code to use x3
  • Do the rest of the x3 setup and cut replication from s8. At this point, the data starts to drift.
  • Set up a MariaDB daemon on the sanitarium and wiki replica hosts
  • Update cloudlb config to point traffic to the new MariaDB instances
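The end state of the steps above is a simple routing rule: the Wikidata term store tables move to x3 and everything else stays on s8. A toy sketch of that rule (the `wbt_` prefix is from this task; everything else is illustrative):

```python
# Toy sketch of table-to-section routing after the split: the Wikidata
# term store tables (wbt_*) are served by x3, and the remaining
# wikidatawiki tables stay on s8. Illustrative only, not production code.

def section_for_table(table: str) -> str:
    """Return the wiki-replica section expected to serve `table`."""
    if table.startswith("wbt_"):
        return "x3"  # term store, split out of s8
    return "s8"      # remaining wikidatawiki tables

print(section_for_table("wbt_item_terms"))  # prints: x3
print(section_for_table("page"))            # prints: s8
```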

Details

Repo | Branch | Lines +/-
operations/alerts | master | +5 -5
operations/puppet | production | +4 -2
operations/puppet | production | +4 -2
operations/puppet | production | +2 -1
operations/puppet | production | +2 -4
operations/puppet | production | +4 -2
operations/puppet | production | +15 -3
operations/puppet | production | +9 -9
operations/puppet | production | +21 -10
operations/puppet | production | +1 -1
operations/puppet | production | +2 -2
operations/puppet | production | +2 -1
operations/puppet | production | +2 -2
operations/puppet | production | +1 -0
operations/puppet | production | +7 -3
operations/puppet | production | +3 -2
operations/homer/public | master | +2 -1
operations/puppet | production | +8 -6
operations/puppet | production | +2 -2
operations/puppet | production | +23 -21
operations/puppet | production | +6 -0
operations/puppet | production | +2 -0
operations/puppet | production | +4 -0

Event Timeline


For Tech News: Could someone please suggest how to phrase an entry? My best guess would be to reuse parts of the summary from the page Taavi made. But I'm not sure if that is enough info, or if affected developers will want to know other details upfront (to determine if/when they need to pay attention). Or maybe we can drop the 3rd sentence (redundant? or clarifying?). Edits welcome.

  • Developers who maintain a tool that queries the Wikidata term store tables (wbt_*) need to update their code to connect to a separate database cluster. These tables are being split into a separate database cluster. Tools that query those tables via the wiki replicas must be adapted to connect to the new cluster instead. Documentation and related links are available.
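For tool authors, the required change is typically a one-line connection-host update. A hedged sketch, assuming the common `<section>.analytics.db.svc.wikimedia.cloud` naming convention for replica service addresses (whether x3 uses exactly this hostname is an assumption, not confirmed by this task):

```python
# Sketch of pointing a tool at the new cluster. The hostname pattern
# follows the usual wiki-replicas convention; the exact x3 name is an
# assumption here.

def replica_host(section: str, pool: str = "analytics") -> str:
    return f"{section}.{pool}.db.svc.wikimedia.cloud"

# Before the split, term-store queries went via the s8 address:
old_host = replica_host("s8")
# Afterwards, queries against wbt_* tables should use the x3 address:
new_host = replica_host("x3")
print(new_host)  # prints: x3.analytics.db.svc.wikimedia.cloud
```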

Change #1148310 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] hieradata: cloudlb: Start announcing x3 VIPs

https://gerrit.wikimedia.org/r/1148310

Change #1148311 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] cloudlb: Support multiple wiki replica addresses per section

https://gerrit.wikimedia.org/r/1148311

Change #1148312 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] hieradata: cloudlb: Listen on s8 on the x3 VIP

https://gerrit.wikimedia.org/r/1148312

Change #1148313 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] hieradata: Update wiki replicas x3 DNS records to new VIP

https://gerrit.wikimedia.org/r/1148313

Change #1148310 merged by Majavah:

[operations/puppet@production] hieradata: cloudlb: Start announcing x3 VIPs

https://gerrit.wikimedia.org/r/1148310

Change #1148311 merged by Majavah:

[operations/puppet@production] cloudlb: Support multiple wiki replica addresses per section

https://gerrit.wikimedia.org/r/1148311

Change #1148312 merged by Majavah:

[operations/puppet@production] hieradata: cloudlb: Listen on s8 on the x3 VIP

https://gerrit.wikimedia.org/r/1148312

Change #1148313 merged by Majavah:

[operations/puppet@production] openstack: wikireplica_dns: Point x3 records to new VIP

https://gerrit.wikimedia.org/r/1148313

For what it's worth, x3 should go to clouddb1020 and clouddb1016 (s5 and s8), as they have plenty of disk space available. It is also easier because we don't have to transfer the data; we can just cp the s8 directory within the same host.

We shouldn't let s8 and x3 run with each other's tables for long. We should try to drop the unneeded ones relatively soon after setting them up, to avoid disk space issues on the sanitarium host (db1154).
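After cutting replication, each section temporarily carries both sets of tables, so the cleanup amounts to dropping the term-store tables from s8 and the non-term-store tables from x3. A toy sketch of selecting what to drop on each side (table names illustrative, not a schema inventory):

```python
# Each section temporarily holds both table sets; this picks the ones
# to drop per section. Illustrative names only.

def tables_to_drop(section: str, tables: list[str]) -> list[str]:
    keep_term_store = (section == "x3")
    # Drop term-store tables everywhere except x3, and vice versa.
    return [t for t in tables if t.startswith("wbt_") != keep_term_store]

tables = ["page", "revision", "wbt_item_terms", "wbt_text"]
print(tables_to_drop("s8", tables))  # term-store tables to drop on s8
print(tables_to_drop("x3", tables))  # leftover core tables to drop on x3
```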

Change #1149603 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] conftool-data: Add x3 wiki replica backend services

https://gerrit.wikimedia.org/r/1149603

Change #1149604 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:wmcs::cloudlb: Add x3 wiki replica backend service

https://gerrit.wikimedia.org/r/1149604

Change #1149605 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] hieradata: cloudlb: Move x3 VIP to new x3 backend

https://gerrit.wikimedia.org/r/1149605

Change #1149606 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/homer/public@master] definitions: Add port for x3 wiki replica backend

https://gerrit.wikimedia.org/r/1149606

Change #1149606 merged by jenkins-bot:

[operations/homer/public@master] definitions: Add port for x3 wiki replica backend

https://gerrit.wikimedia.org/r/1149606

I've dropped the non-term store tables from db1211 and db2243. I think db1211 can become the sanitarium master.


That was supposed to be a candidate master, but no worries, I will just pick another one. They are all running SBR (statement-based replication).

Change #1152688 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1211: Make it sanitarium master for x3

https://gerrit.wikimedia.org/r/1152688

Change #1152688 merged by Marostegui:

[operations/puppet@production] db1211: Make it sanitarium master for x3

https://gerrit.wikimedia.org/r/1152688


That was supposed to be candidate master, but no worries, I will just pick another one. They are all running SBR

oh sorry 😅

Change #1152700 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] site.pp: Add db1211 to sanitarium master role

https://gerrit.wikimedia.org/r/1152700

Change #1152700 merged by Marostegui:

[operations/puppet@production] site.pp: Add db1211 to sanitarium master role

https://gerrit.wikimedia.org/r/1152700

Change #1152743 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1154: Add x3

https://gerrit.wikimedia.org/r/1152743

Change #1152743 merged by Marostegui:

[operations/puppet@production] db1154: Add x3

https://gerrit.wikimedia.org/r/1152743

Change #1152760 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] check_private_data_report: Add x3

https://gerrit.wikimedia.org/r/1152760

The sanitarium host has been cloned.

Change #1152760 merged by Marostegui:

[operations/puppet@production] check_private_data_report: Add x3

https://gerrit.wikimedia.org/r/1152760

Change #1153018 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] clouddb1016: Add x3

https://gerrit.wikimedia.org/r/1153018

Change #1153018 merged by Marostegui:

[operations/puppet@production] clouddb1016: Add x3

https://gerrit.wikimedia.org/r/1153018

Mentioned in SAL (#wikimedia-operations) [2025-06-03T06:29:35Z] <marostegui@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954

Mentioned in SAL (#wikimedia-operations) [2025-06-03T06:30:27Z] <marostegui@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Setting up x3 T390954

Mentioned in SAL (#wikimedia-operations) [2025-06-03T06:37:18Z] <marostegui> Decrease buffer size on clouddb1016:s8 T390954

@taavi for now I've set up clouddb1016:3363 - I'd need someone from cloud-services-team to deploy views, add it to the LB, etc., to make sure it is working fine and we are not missing any step. If that's fine, then I can add the instance to a second host.

The data is clean:

root@clouddb1016:~# check_private_data.py -S /run/mysqld/mysqld.x3.sock
-- Non-public databases that are present:
-- Non-public tables that are present:
-- Unfiltered columns that are present:
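Conceptually, check_private_data.py compares what is present on the instance against the public allow-lists; an empty report like the one above means nothing private is exposed. A toy version of that comparison (the sets are illustrative, not the real allow-lists):

```python
# Minimal sketch of the private-data check: anything present on the
# replica but absent from the public allow-list would be a leak.
# Illustrative sets only.

def private_leaks(present: set[str], allowed: set[str]) -> set[str]:
    """Objects on the replica that are not in the public allow-list."""
    return present - allowed

present_tables = {"page", "wbt_item_terms"}
public_tables = {"page", "wbt_item_terms"}
# An empty result corresponds to the clean report above.
print(sorted(private_leaks(present_tables, public_tables)))  # prints: []
```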

Thank you! I will have a look at that.

Change #1153123 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] wikireplicas: maintain-views: Support running on x3

https://gerrit.wikimedia.org/r/1153123

Change #1153124 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] wikireplicas: maintain-views: Allow running on a specific section only

https://gerrit.wikimedia.org/r/1153124

Change #1153125 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] wikireplicas: maintain-views: Log which section is being acted on

https://gerrit.wikimedia.org/r/1153125

Change #1149603 merged by Majavah:

[operations/puppet@production] conftool-data: Add x3 wiki replica backend services

https://gerrit.wikimedia.org/r/1149603

Change #1149604 merged by Majavah:

[operations/puppet@production] P:wmcs::cloudlb: Add x3 wiki replica backend service

https://gerrit.wikimedia.org/r/1149604

maintain-views needed adjusting, as extension sections don't have dblists with their list of databases; the patches for that are above. With the updated script manually applied on clouddb1016, it created the views as expected.

I added the load balancer backend definition to check that it is working as expected, and HAProxy sees the backend as healthy. Traffic will not move to the new servers until https://gerrit.wikimedia.org/r/c/operations/puppet/+/1149605 is applied.

maintain-dbusers appears to be working as expected with no additional setup required.
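The dblist gap can be pictured as a fallback lookup: core sections resolve their databases from dblist files, while extension sections such as x3 need an explicit list. A hedged sketch of the shape of that fix (function names and config layout are hypothetical, not the actual maintain-views code):

```python
# Hypothetical sketch of resolving a section's database list: core
# sections (s1-s8) come from dblist files, extension sections (x3)
# fall back to explicit configuration. Not the real script.

def databases_for_section(section: str, config: dict) -> list[str]:
    if section in config.get("dblists", {}):
        return config["dblists"][section]  # core section: via dblist
    # extension section: explicit list from configuration
    return config.get("extra_sections", {}).get(section, [])

config = {
    "dblists": {"s8": ["wikidatawiki"]},
    "extra_sections": {"x3": ["wikidatawiki"]},
}
print(databases_for_section("x3", config))  # prints: ['wikidatawiki']
```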


Do you want me to set up the other server then?


Yes please!

Change #1153123 merged by Majavah:

[operations/puppet@production] wikireplicas: maintain-views: Support running on x3

https://gerrit.wikimedia.org/r/1153123

Change #1153124 merged by Majavah:

[operations/puppet@production] wikireplicas: maintain-views: Allow running on a specific section only

https://gerrit.wikimedia.org/r/1153124

Change #1153125 merged by Majavah:

[operations/puppet@production] wikireplicas: maintain-views: Log which section is being acted on

https://gerrit.wikimedia.org/r/1153125

Change #1149605 merged by Majavah:

[operations/puppet@production] hieradata: cloudlb: Move x3 VIP to new x3 backend

https://gerrit.wikimedia.org/r/1149605

Mentioned in SAL (#wikimedia-operations) [2025-06-03T13:02:45Z] <marostegui@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954

Mentioned in SAL (#wikimedia-operations) [2025-06-03T13:04:14Z] <marostegui> Shutdown clouddb1016:x3 T390954

Change #1153144 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Revert "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153144

Change #1153144 merged by Majavah:

[operations/puppet@production] Revert "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153144

Change #1153145 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] clouddb1020.yaml: Add x3

https://gerrit.wikimedia.org/r/1153145

Change #1153146 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Reapply "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153146

Change #1153145 merged by Marostegui:

[operations/puppet@production] clouddb1020.yaml: Add x3

https://gerrit.wikimedia.org/r/1153145

@BTullis I assume the an-redacteddb host also needs x3 (a set of tables split from s8)?

@Marostegui - Thanks. I'll double-check with the Data-Engineering team, but my current understanding is that we do not need a copy of this new x3 section on an-redacteddb1001.
There is a related ticket about it here: T391006: Remove sqoop code for wikibase term storage and a Slack thread here.


I see! If you can double-confirm, that would be appreciated.

Change #1153146 merged by Majavah:

[operations/puppet@production] Reapply "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153146

Change #1153564 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Reapply "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153564


Confirmed. We do not need this new section on an-redacteddb1001. Thanks for checking with us.

Change #1153564 merged by Majavah:

[operations/puppet@production] Reapply "hieradata: cloudlb: Move x3 VIP to new x3 backend"

https://gerrit.wikimedia.org/r/1153564


Thank you!

So I think this is all done then. Replication is running on clouddb1016 and clouddb1020.

Change #1154163 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/alerts@master] team-wmcs: Adapt HAProxy alerts for x3 on the replicas

https://gerrit.wikimedia.org/r/1154163

Change #1154163 merged by jenkins-bot:

[operations/alerts@master] team-wmcs: Adapt HAProxy alerts for x3 on the replicas

https://gerrit.wikimedia.org/r/1154163