User Details
- User Since
- Jun 29 2021, 9:56 AM (211 w, 2 d)
- Availability
- Available
- IRC Nick
- btullis
- LDAP User
- Btullis
- MediaWiki User
- BTullis (WMF) [ Global Accounts ]
Today
I have applied another patch that updates some of the Java options related to garbage collection (GC).
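For illustration only, GC-related JVM options of this kind are usually adjusted via flags such as the following (the specific flags and values here are assumptions, not the contents of the patch):
# Hypothetical GC tuning flags; the actual option names and values in the patch may differ.
export JAVA_OPTS="${JAVA_OPTS} -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+ParallelRefProcEnabled"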
There is one very clear front-runner among Helm charts for the opensearch-operator: the official version.
https://github.com/opensearch-project/opensearch-k8s-operator/tree/main/charts/opensearch-operator
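If we go with it, installation would presumably follow the standard Helm workflow, roughly like this (the release and namespace names below are my assumptions):
# Add the official chart repository and install the operator (names are illustrative).
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
helm install opensearch-operator opensearch-operator/opensearch-operator \
  --namespace opensearch-operator --create-namespace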
I'll start the work on this now. It might be nice to disable the systemd timers that run the SQL/XML dumps on the snapshot hosts before they start a dump run on July 20th.
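Disabling them would presumably just be a matter of something like the following on each snapshot host (the timer name here is a placeholder, not the real unit name):
# Stop and disable a dump timer ahead of the July 20th run; the unit name is hypothetical.
sudo systemctl disable --now xmldumps.timer
# Confirm that no dump-related timers remain scheduled.
sudo systemctl list-timers | grep -i dump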
@Stevemunene - You'll see that I provisioned these three VMs for you, to save a bit of time. I was able to use the --storage_type plain option when creating them, so they are ready to go.
I think that you can follow these guidelines to build the cluster itself: https://wikitech.wikimedia.org/wiki/Etcd with reference to https://github.com/wikimedia/operations-puppet/blob/production/hieradata/role/common/etcd/v3/dse_k8s_etcd.yaml
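Once the three members are up, a quick sanity check might look like this (the endpoint host name is a placeholder for one of the new VMs, and TLS client options are omitted):
# List members and check endpoint health; the host name is illustrative.
ETCDCTL_API=3 etcdctl --endpoints=https://dse-k8s-etcd2001.codfw.wmnet:2379 member list
ETCDCTL_API=3 etcdctl --endpoints=https://dse-k8s-etcd2001.codfw.wmnet:2379 endpoint health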
We are down to 54 million under-replicated blocks.
One thing to bear in mind is that Sqoop was removed from Bigtop recently: https://issues.apache.org/jira/browse/BIGTOP-3770
So we may need to keep using our Bigtop 1.5 version, or find an alternative.
This is now done.
I have created a new GitLab repository here: https://gitlab.wikimedia.org/repos/data-engineering/bigtop-build
Yesterday
Notice: /Stage[main]/Bigtop::Hive/File[/etc/hive/conf.analytics-test-hadoop/hive-env.sh]/content:
--- /etc/hive/conf.analytics-test-hadoop/hive-env.sh	2023-08-10 12:02:47.190163979 +0000
+++ /tmp/puppet-file20250716-405022-11hcgyi	2025-07-16 13:41:11.608993039 +0000
@@ -8,7 +8,7 @@
 export HIVE_SKIP_SPARK_ASSEMBLY=true
We can see some interesting performance characteristics here, including this:
I thought that I would look at the performance and caching optimizations in different patches, since caching could change the behaviour.
I have prepared the two disks as per the instructions at: https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Hadoop/Administration#Swapping_broken_disk
btullis@an-worker1189:~$ sudo parted /dev/sdn --script mklabel gpt
btullis@an-worker1189:~$ sudo parted /dev/sdn --script mkpart primary ext4 0% 100%
btullis@an-worker1189:~$ sudo mkfs.ext4 -L hadoop-n /dev/sdn1
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 1953365504 4k blocks and 244170752 inodes
Filesystem UUID: 737d7e51-1f10-4257-babf-2126028c5387
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
	2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
	78675968, 102400000, 214990848, 512000000, 550731776, 644972544,
	1934917632
Thanks @Jclark-ctr and apologies for the delay.
We're OK with deleting the preserved cache on these an-worker data drives, because they are all individual raid0 volumes. Data loss is therefore unavoidable when these drives fail, but recovery is managed by Hadoop itself.
Tue, Jul 15
This is now done, so all of the spark images are now based on golang1.19 and have been rebuilt and published.
root@build2001:/srv/images/production-images# /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*spark*'
== Step 0: scanning /srv/images/production-images/images/ ==
Will build the following images:
* docker-registry.discovery.wmnet/spark3.4-build:3.4.1-5
* docker-registry.discovery.wmnet/spark3.1-build:3.1.2-4
* docker-registry.discovery.wmnet/spark3.1:3.1.2-5
* docker-registry.discovery.wmnet/spark3.1-operator:1.3.7-3.1.2-4
* docker-registry.discovery.wmnet/spark3.3-build:3.3.2-4
* docker-registry.discovery.wmnet/spark3.3:3.3.2-5
* docker-registry.discovery.wmnet/spark3.3-operator:1.3.8-3.3.2-4
* docker-registry.discovery.wmnet/spark3.4:3.4.1-5
* docker-registry.discovery.wmnet/spark3.4-operator:1.3.8-3.4.1-5
== Step 1: building images ==
* Built image docker-registry.discovery.wmnet/spark3.4-build:3.4.1-5
* Built image docker-registry.discovery.wmnet/spark3.1-build:3.1.2-4
* Built image docker-registry.discovery.wmnet/spark3.1:3.1.2-5
* Built image docker-registry.discovery.wmnet/spark3.1-operator:1.3.7-3.1.2-4
* Built image docker-registry.discovery.wmnet/spark3.3-build:3.3.2-4
* Built image docker-registry.discovery.wmnet/spark3.3:3.3.2-5
* Built image docker-registry.discovery.wmnet/spark3.3-operator:1.3.8-3.3.2-4
* Built image docker-registry.discovery.wmnet/spark3.4:3.4.1-5
* Built image docker-registry.discovery.wmnet/spark3.4-operator:1.3.8-3.4.1-5
== Step 2: publishing ==
Successfully published image docker-registry.discovery.wmnet/spark3.4-build:3.4.1-5
Successfully published image docker-registry.discovery.wmnet/spark3.1-operator:1.3.7-3.1.2-4
Successfully published image docker-registry.discovery.wmnet/spark3.1:3.1.2-5
Successfully published image docker-registry.discovery.wmnet/spark3.1-build:3.1.2-4
Successfully published image docker-registry.discovery.wmnet/spark3.3-operator:1.3.8-3.3.2-4
Successfully published image docker-registry.discovery.wmnet/spark3.3:3.3.2-5
Successfully published image docker-registry.discovery.wmnet/spark3.4-operator:1.3.8-3.4.1-5
Successfully published image docker-registry.discovery.wmnet/spark3.3-build:3.3.2-4
Successfully published image docker-registry.discovery.wmnet/spark3.4:3.4.1-5
== Build done! ==
You can see the logs at ./docker-pkg-build.log
Mon, Jul 14
I am checking that they build without the workaround by using this command:
root@build2001:/srv/images/production-images# /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*spark*'
I checked, and according to this changelog: https://metadata.ftp-master.debian.org/changelogs//main/o/openjdk-8/openjdk-8_8u452-ga-1_changelog
...the fix mentioned was included in version 8u402-ga-3.
These patches are all deployed, so I think that we can resolve this ticket now.
I checked that https://yarn.wikimedia.org/spark-history (without the trailing slash) now works.
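For a quick check of this kind, something like the following shows the response for the URL without the trailing slash (an illustrative request, not the exact one I used):
# Fetch just the response headers for the non-trailing-slash URL.
curl -sI https://yarn.wikimedia.org/spark-history | head -n 5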
It has dropped from 82 million to 65 million in about 90 hours.
That's not too bad.
Fri, Jul 11
There is an issue at present with reimaging these cloudcephosd nodes back to bullseye.
The problem arises because of this remove_os_md() function that is executed.
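For context, removing the OS software-RAID (md) devices generally boils down to steps like these (a rough, generic sketch only; the real remove_os_md() lives in the reimage tooling and the device names below are placeholders):
# Inspect, stop, and wipe an OS md array (illustrative device names).
cat /proc/mdstat
sudo mdadm --stop /dev/md0
sudo mdadm --zero-superblock /dev/sda2 /dev/sdb2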
Hopefully these patches will address the first three asks:
- Increased timeout to the proxy backend (300 seconds)
- Doubled the CPU limit (with 50% more RAM, too)
- Fixed the trailing slash issue.
Thu, Jul 10
That graph levelled off at 82.6 million and has now started to drop slowly.
I silenced the hadoop-yarn-nodemanager services that were failing on the affected hosts.
We are seeing the under-replicated blocks climbing, as we expect. https://grafana.wikimedia.org/goto/6AhgKjsNg?orgId=1
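The same figure can also be pulled from the NameNode on the command line, for example (an illustrative invocation, using the same kerberos-run-command wrapper as elsewhere):
# Report the current count of under-replicated blocks.
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -report | grep -i 'under replicated'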
I looked at trying to silence all of the alerts before merging the patch, but it seemed too tricky.
Wed, Jul 9
Hi @CDanis - Just to let you know, the cephosd cluster in codfw is now up and running, so if you want to deploy your ceph-csi-rbd plugin there, you are welcome to do so.
T374923: Bring cephosd200[1-3] into service as a new cluster in codfw is complete.
It's there now. I just uploaded a bigger file with s3cmd and it created the bucket.
Then I set the correct crush rule.
btullis@cephosd2001:~$ sudo ceph osd pool ls
.rgw.root
.mgr
dse-k8s-csi-ssd
aux-k8s-csi-rbd-ssd
cephfs.dpe.meta
cephfs.dpe.data-ssd
cephfs.dpe.data-hdd
codfw.rgw.log
codfw.rgw.control
codfw.rgw.meta
codfw.rgw.buckets.index
codfw.rgw.buckets.data
codfw.rgw.buckets.non-ec
btullis@cephosd2001:~$ sudo ceph osd pool set codfw.rgw.buckets.non-ec crush_rule hdd
set pool 16 crush_rule to hdd
Checking all of the rules.
btullis@cephosd2001:~$ for p in $(sudo ceph osd pool ls); do echo -n "$p :" ; sudo ceph osd pool get $p crush_rule; done
.rgw.root :crush_rule: ssd
.mgr :crush_rule: ssd
dse-k8s-csi-ssd :crush_rule: ssd
aux-k8s-csi-rbd-ssd :crush_rule: ssd
cephfs.dpe.meta :crush_rule: ssd
cephfs.dpe.data-ssd :crush_rule: ssd
cephfs.dpe.data-hdd :crush_rule: hdd
codfw.rgw.log :crush_rule: ssd
codfw.rgw.control :crush_rule: ssd
codfw.rgw.meta :crush_rule: ssd
codfw.rgw.buckets.index :crush_rule: ssd
codfw.rgw.buckets.data :crush_rule: hdd
codfw.rgw.buckets.non-ec :crush_rule: hdd
This looks OK. I think that we can call this ticket done, for now. There will be some more work when we get around to connecting up the dse-k8s-codfw cluster to it, but I think we can say that this is ready for use.
I had to rename the zonegroup from dpe_zg to dpe as I had done here: T374447#10144310
I have created the zone.
btullis@cephosd2001:~$ sudo radosgw-admin zone create --rgw-zonegroup=dpe_zg --rgw-zone=codfw --master --default --endpoints=https://rgw.codfw.dpe.anycast.wmnet
{
    "id": "19d9bb4a-2a8b-41be-92fa-53eed71f5254",
    "name": "codfw",
    "domain_root": "codfw.rgw.meta:root",
    "control_pool": "codfw.rgw.control",
    "gc_pool": "codfw.rgw.log:gc",
    "lc_pool": "codfw.rgw.log:lc",
    "log_pool": "codfw.rgw.log",
    "intent_log_pool": "codfw.rgw.log:intent",
    "usage_log_pool": "codfw.rgw.log:usage",
    "roles_pool": "codfw.rgw.meta:roles",
    "reshard_pool": "codfw.rgw.log:reshard",
    "user_keys_pool": "codfw.rgw.meta:users.keys",
    "user_email_pool": "codfw.rgw.meta:users.email",
    "user_swift_pool": "codfw.rgw.meta:users.swift",
    "user_uid_pool": "codfw.rgw.meta:users.uid",
    "otp_pool": "codfw.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "codfw.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "codfw.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "codfw.rgw.buckets.non-ec",
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    "realm_id": "72b51936-86d4-4656-8065-c8ed942ddf47",
    "notif_pool": "codfw.rgw.log:notif"
}
Now moving on to setting up the radosgw side of things. Referring back to here: T330152#10077357
Following what was done here for the cephfs file system: T376405#10206339
I have created the two pools for use with RBD and the CSI interfaces.
btullis@cephosd2001:~$ sudo ceph osd pool create dse-k8s-csi-ssd 800 800 replicated ssd --autoscale-mode=on
pool 'dse-k8s-csi-ssd' created
btullis@cephosd2001:~$ sudo ceph osd pool create aux-k8s-csi-rbd-ssd 800 800 replicated ssd --autoscale-mode=on
pool 'aux-k8s-csi-rbd-ssd' created
Enabled these pools for use with the rbd application.
btullis@cephosd2001:~$ sudo ceph osd pool application enable dse-k8s-csi-ssd rbd
enabled application 'rbd' on pool 'dse-k8s-csi-ssd'
btullis@cephosd2001:~$ sudo ceph osd pool application enable aux-k8s-csi-rbd-ssd rbd
enabled application 'rbd' on pool 'aux-k8s-csi-rbd-ssd'
Configured them to use the ssd crush rule.
btullis@cephosd2001:~$ sudo ceph osd pool set dse-k8s-csi-ssd crush_rule ssd
set pool 6 crush_rule to ssd
btullis@cephosd2001:~$ sudo ceph osd pool set aux-k8s-csi-rbd-ssd crush_rule ssd
set pool 7 crush_rule to ssd
Creating the required crush rules.
btullis@cephosd2001:~$ sudo ceph osd crush rule create-replicated hdd default host hdd
btullis@cephosd2001:~$ sudo ceph osd crush rule create-replicated ssd default host ssd
btullis@cephosd2001:~$ sudo ceph osd crush rule ls
replicated_rule
hdd
ssd
I can't yet delete the replicated_rule because it is still in use.
Now creating the crush maps, so that we get row and rack awareness, as well as host awareness. This is similar to the way it was done in T326945#9074454
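For reference, adding row/rack buckets and moving hosts under them looks roughly like this (the bucket names and host placements below are placeholders, not the actual codfw layout):
# Create a row and a rack bucket, attach them to the hierarchy, and move a host in (illustrative names).
sudo ceph osd crush add-bucket b2 row
sudo ceph osd crush move b2 root=default
sudo ceph osd crush add-bucket b2-rack1 rack
sudo ceph osd crush move b2-rack1 row=b2
sudo ceph osd crush move cephosd2001 rack=b2-rack1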
The cluster is up and running now.
Metrics are available in Grafana.
In order to bootstrap the cluster, I used the same technique that was used in eqiad. T330149#8705054
Namely, I created a monmap file like this, on each of the three servers:
monmaptool --create --fsid 8e69717a-518b-4c00-9f96-0635d9b913c6 --add cephosd2001 10.192.9.17 --add cephosd2002 10.192.26.19 --add cephosd2003 10.192.37.16 --enable-all-features --set-min-mon-release reef monmap
I created a temporary keyring by concatenating the keyrings for mon. and client.admin into a temp file:
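In outline, the keyring and monmap are then used to initialise each monitor, roughly as follows (a generic sketch of the bootstrap steps rather than the exact commands run; the keyring paths are placeholders):
# Combine the mon. and client.admin keyrings, then create the monitor data directory (illustrative).
cat /etc/ceph/ceph.mon.keyring /etc/ceph/ceph.client.admin.keyring > /tmp/ceph.mon.keyring
sudo -u ceph ceph-mon --mkfs -i cephosd2001 --monmap monmap --keyring /tmp/ceph.mon.keyring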
Making good progress on this now.
Tue, Jul 8
I have back-filled to clouddumps100[1-2] with commands like this.
dumpsgen@dumpsdata1003:/data/otherdumps/wikitech$ rsync -av ./ clouddumps1001.wikimedia.org::data/xmldatadumps/public/other/wikitech/
sending incremental file list
./
labswiki-20250701.xml.gz
labswiki-20250702.xml.gz
labswiki-20250703.xml.gz
labswiki-20250704.xml.gz
labswiki-20250705.xml.gz
labswiki-20250706.xml.gz
labswiki-20250707.xml.gz
labswiki-20250708.xml.gz
Apologies, this was an oversight on my part, as I had assumed that the standard XML/SQL dumps for labswiki would be sufficient.
I didn't know that these additional dumps were being consumed in order to keep https://wikitech-static.wikimedia.org up-to-date.
The dump of enwiki on snapshot1012 has completed successfully.
btullis@snapshot1012:~$ tail -f /mnt/dumpsdata/xmldatadumps/private/enwiki/20250701/dumplog.txt
2025-07-08 03:58:43: enwiki Reading enwiki-20250701-pages-articles-multistream.xml.bz2 checksum for md5 from file /mnt/dumpsdata/xmldatadumps/public/enwiki/20250701/md5sums-enwiki-20250701-pages-articles-multistream.xml.bz2.txt
2025-07-08 03:58:43: enwiki Reading enwiki-20250701-pages-articles-multistream.xml.bz2 checksum for sha1 from file /mnt/dumpsdata/xmldatadumps/public/enwiki/20250701/sha1sums-enwiki-20250701-pages-articles-multistream.xml.bz2.txt
2025-07-08 03:58:43: enwiki Checkdir dir /mnt/dumpsdata/xmldatadumps/public/enwiki/latest ...
2025-07-08 03:58:43: enwiki Checkdir dir /mnt/dumpsdata/xmldatadumps/public/enwiki/latest ...
2025-07-08 03:58:43: enwiki adding rss feed file /mnt/dumpsdata/xmldatadumps/public/enwiki/latest/enwiki-latest-pages-articles-multistream-index.txt.bz2-rss.xml
2025-07-08 03:58:43: enwiki Reading enwiki-20250701-pages-articles-multistream-index.txt.bz2 checksum for md5 from file /mnt/dumpsdata/xmldatadumps/public/enwiki/20250701/md5sums-enwiki-20250701-pages-articles-multistream-index.txt.bz2.txt
2025-07-08 03:58:43: enwiki Reading enwiki-20250701-pages-articles-multistream-index.txt.bz2 checksum for sha1 from file /mnt/dumpsdata/xmldatadumps/public/enwiki/20250701/sha1sums-enwiki-20250701-pages-articles-multistream-index.txt.bz2.txt
2025-07-08 03:59:19: enwiki Checkdir dir /mnt/dumpsdata/xmldatadumps/public/enwiki/latest ...
2025-07-08 03:59:37: enwiki Checkdir dir /mnt/dumpsdata/xmldatadumps/public/enwiki/latest ...
2025-07-08 04:02:34: enwiki SUCCESS: done.
I'll carry out another manual sync from dumpsdata1006 to clouddumps100[1-2] to get it published.
Mon, Jul 7
Removed the user's local files.
btullis@cumin1003:~$ sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/xiaoxiao'
13 hosts will be targeted:
an-coord[1003-1004].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-master[1003-1004].eqiad.wmnet,an-test-client1002.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-master[1001-1002].eqiad.wmnet,stat[1008-1011].eqiad.wmnet
OK to proceed on 13 hosts? Enter the number of affected hosts to confirm or "q" to quit: 13
===== NO OUTPUT =====
PASS |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (13/13) [00:11<00:00, 1.12hosts/s]
FAIL |                                                                                                                                                                     | 0% (0/13) [00:11<?, ?hosts/s]
100.0% (13/13) success ratio (>= 100.0% threshold) for command: 'rm -rf /home/xiaoxiao'.
100.0% (13/13) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
btullis@cumin1003:~$
Removed the HDFS home directory.
btullis@an-launcher1002:~$ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/xiaoxiao
25/07/07 15:51:03 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/user/xiaoxiao' to trash at: hdfs://analytics-hadoop/user/hdfs/.Trash/Current/user/xiaoxiao
btullis@an-launcher1002:~$
The manual sync run has now completed.
This has now finished, so I think that means the whole sqoop process has finished successfully.
analytics@an-launcher1002:/home/btullis$ /usr/local/bin/refinery-sqoop-mediawiki-production-not-history
analytics@an-launcher1002:/home/btullis$ echo $?
0
Resetting the failed systemd service.
btullis@an-launcher1002:~$ systemctl --failed
  UNIT                                   LOAD   ACTIVE SUB    DESCRIPTION
● refinery-sqoop-whole-mediawiki.service loaded failed failed Schedules sqoop to import whole MediaWiki databases into Hadoop monthly.
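Clearing the failed state is then just the following (the unit name is taken from the output above):
# Clear the failed state so the unit no longer appears in systemctl --failed.
sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service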