
SRE (Group) · Active · Public

Recent Activity

Today

Maintenance_bot added a project to T399916: Inbound errors on interface cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}): SRE.
Fri, Jul 18, 2:29 AM · SRE, DC-Ops, ops-codfw

Yesterday

Stashbot added a comment to T399221: eqsin purged consumers lag.

Mentioned in SAL (#wikimedia-operations) [2025-07-17T22:01:29Z] <cmooney@cumin1003> END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repool eqsin to test backhaul cct packet loss, T399221]

Thu, Jul 17, 10:01 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
Stashbot added a comment to T399221: eqsin purged consumers lag.

Mentioned in SAL (#wikimedia-operations) [2025-07-17T22:01:25Z] <cmooney@cumin1003> START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repool eqsin to test backhaul cct packet loss, T399221]

Thu, Jul 17, 10:01 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
aranyap added a comment to T398650: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum.

@cmooney @ssingh I just requested access through the online system. Thank you!

Thu, Jul 17, 9:31 PM · SRE, SRE-Access-Requests
ssingh claimed T399899: Requesting access to analytics-privatedata-users for resquito.
Thu, Jul 17, 9:16 PM · SRE, SRE-Access-Requests
ssingh added a comment to T398650: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum.

[Claiming this as the clinic duty person this week]

Thu, Jul 17, 9:13 PM · SRE, SRE-Access-Requests
cmooney added a comment to T398650: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum.

Hi @aranyap, yeah, you are not in that group:

cmooney@ldap-maint1001:~$ ldapsearch -x cn=wmf | grep aprum 
cmooney@ldap-maint1001:~$
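For context, an empty result from that grep means the directory has no member entry for the user. A positive match would look roughly like the following; this is a sketch only, exampleuser is a placeholder and the member-DN format is an assumption about the directory layout:

ldapsearch -x cn=wmf | grep exampleuser
member: uid=exampleuser,ou=people,dc=wikimedia,dc=org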
Thu, Jul 17, 8:58 PM · SRE, SRE-Access-Requests
aranyap reopened T398650: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum as "Open".

Hi @cmooney! I'm having some trouble trying to access JupyterHub, and after some poking around with @dr0ptp4kt and @BTullis we think it's because I don't have wmf LDAP access, but we aren't 100% sure.

Thu, Jul 17, 8:53 PM · SRE, SRE-Access-Requests
Jclark-ctr moved T399847: Degraded RAID on backup1007 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Jul 17, 8:05 PM · DC-Ops, SRE, ops-eqiad
HShaikh added a comment to T399899: Requesting access to analytics-privatedata-users for resquito.

Approved. Thank you

Thu, Jul 17, 7:58 PM · SRE, SRE-Access-Requests
REsquito-WMF added a comment to T399899: Requesting access to analytics-privatedata-users for resquito.

This ticket is a prerequisite for https://phabricator.wikimedia.org/T396672. @dr0ptp4kt is also readying a patch for additional access in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1165605, to be taken out of WIP once your initial SSH access is established.

Thu, Jul 17, 7:52 PM · SRE, SRE-Access-Requests
REsquito-WMF created T399899: Requesting access to analytics-privatedata-users for resquito.
Thu, Jul 17, 7:51 PM · SRE, SRE-Access-Requests
Maintenance_bot added a project to T399878: Archive affiliates-l: SRE.
Thu, Jul 17, 7:29 PM · SRE, Wikimedia-Mailing-lists
VRiley-WMF closed T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet as Resolved.

These have been imaged.

Thu, Jul 17, 7:05 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
VRiley-WMF updated the task description for T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.
Thu, Jul 17, 7:04 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
ops-monitoring-bot added a comment to T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ganeti1054.eqiad.wmnet with OS bookworm completed:

  • ganeti1054 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507171848_vriley_1152194_ganeti1054.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
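For reference, a reimage like this is started from a cumin host by running the spicerack cookbook against the target; a minimal sketch of the invocation (the exact flag names and host form are assumptions, check the cookbook's --help for the real interface):

sudo cookbook sre.hosts.reimage --os bookworm -t T381576 ganeti1054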
Thu, Jul 17, 7:03 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
Eevans added a comment to T215183: Redundant bootloaders for software RAID.

[ ... ]

I also never spent much time looking at or thinking about RAID10 hosts, as you said. Honestly I don't remember what debian-installer does in the first place for RAID10 and bootloaders.

Thu, Jul 17, 6:45 PM · Infrastructure-Foundations, SRE
Eevans added a comment to T215183: Redundant bootloaders for software RAID.

[ ... ]

For context: We replaced sda in aqs1012 recently (T396970) and were (I believe) bit by this issue. It would seem to have been reimaged since the partman recipe was fixed, and it does not appear in the April 2020 list posted in T215183#6086396, so I'm wondering if a prior replacement didn't get the bootloader installed.

@Eevans, can you check in the BIOS settings of aqs1012 to see if a setting like "Hard drive failover" exists, per T215183#6718961?

Thu, Jul 17, 6:30 PM · Infrastructure-Foundations, SRE
ops-monitoring-bot added a comment to T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host ganeti1054.eqiad.wmnet with OS bookworm

Thu, Jul 17, 6:30 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
CDanis updated subscribers of T215183: Redundant bootloaders for software RAID.

Has there been any progress toward goal #2? I didn't see where anything had been added to the mentioned runbook.

Thu, Jul 17, 6:02 PM · Infrastructure-Foundations, SRE
Eevans added a comment to T215183: Redundant bootloaders for software RAID.

As a follow-up, I did find a device with a missing bootloader: aqs1014, which went up after its partman recipe was fixed (it has had SSDs replaced in the years since, though).
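On BIOS/MBR hosts, one way to spot an array member with no bootloader is to look for GRUB's boot code in the first sector of each disk; a sketch only, with illustrative device names:

# GRUB's MBR boot code embeds the string "GRUB"; no match suggests a missing bootloader
for d in /dev/sda /dev/sdb; do
  printf '%s: ' "$d"
  dd if="$d" bs=512 count=1 2>/dev/null | strings | grep -q GRUB && echo present || echo missing
done

The usual fix on such a host is grub-install against each member (grub-install /dev/sdX) followed by update-grub, so that either disk can boot if the other fails.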

Thu, Jul 17, 6:00 PM · Infrastructure-Foundations, SRE
VRiley-WMF updated the task description for T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.
Thu, Jul 17, 5:09 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
VRiley-WMF added a comment to T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.

ganeti1054 has moved into A4 U38

Thu, Jul 17, 5:09 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
cmooney added a comment to T395910: cloudcephosd10[48-51] service implementation.

Regarding the jumbo-frame complication with the plan to move to one link: we are arranging to connect a second 25G link on each of these new hosts for the storage VLAN. See the tasks below:
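To illustrate the MTU side of that complication, the storage interface carries jumbo frames, i.e. something like the following on the host; a sketch only, with a made-up interface name (the real configuration is managed via Puppet):

ip link set dev ens3f1 mtu 9000     # raise MTU on the dedicated storage link
ip link show dev ens3f1 | grep mtu  # verify the new MTU took effect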

Thu, Jul 17, 4:58 PM · Patch-For-Review, cloud-services-team, SRE, ops-eqiad, DC-Ops
Eevans added a comment to T215183: Redundant bootloaders for software RAID.

Has there been any progress toward goal #2? I didn't see where anything had been added to the mentioned runbook.

Thu, Jul 17, 4:49 PM · Infrastructure-Foundations, SRE
Eevans added a subtask for T215183: Redundant bootloaders for software RAID: T399875: Cassandra clusters: redundant bootloaders for software RAID followup.
Thu, Jul 17, 4:43 PM · Infrastructure-Foundations, SRE
Jhancock.wm added a comment to T396365: Q4:rack/setup/install sretest2009.

Checked the physical cables and everything lines up right. Couldn't get into the BMC. Re-ran the regular provisioning script and can access the BMC now, but it won't let me set the root password in the script. I can log in to the BMC with the one printed on the luggage tag. I'll DM it to you. I don't wanna add the root user if you still need to test on that.

Thu, Jul 17, 4:31 PM · SRE, DC-Ops, ops-codfw
jcrespo added a comment to T399847: Degraded RAID on backup1007.

I've stopped it anyway; if you could start it up again after finishing, it would help me a lot. Thank you.

Thu, Jul 17, 4:29 PM · DC-Ops, SRE, ops-eqiad
ops-monitoring-bot added a comment to T381576: Q2:rack/setup/install ganeti105[34].eqiad.wmnet.

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ganeti1053.eqiad.wmnet with OS bookworm completed:

  • ganeti1053 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507162234_vriley_643342_ganeti1053.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Thu, Jul 17, 4:26 PM · SRE, ops-eqiad, Infrastructure-Foundations, DC-Ops
ops-monitoring-bot added a comment to T399847: Degraded RAID on backup1007.

Icinga downtime and Alertmanager silence (ID=ce8f4e27-d454-43c0-b1b5-892d46c710a6) set by jynus@cumin1003 for 1 day, 0:00:00 on 1 host(s) and their services with reason: failed disk

backup1007.eqiad.wmnet
Thu, Jul 17, 4:25 PM · DC-Ops, SRE, ops-eqiad
cmooney added a comment to T399097: Arelion IC-374549 100G Transport outage (cr1-codfw -> cr1-eqiad) July 2025.

Mostly crickets from Arelion; just one update earlier:

2025-07-17 14:08
Thu, Jul 17, 4:21 PM · SRE, DC-Ops, ops-codfw
jcrespo added a comment to T399847: Degraded RAID on backup1007.

This time it fully failed, so please change it. Do I stop the server first?

Thu, Jul 17, 4:18 PM · DC-Ops, SRE, ops-eqiad
Jclark-ctr updated subscribers of T399847: Degraded RAID on backup1007.

@jcrespo just FYI, an automated ticket was opened again for this host.

Thu, Jul 17, 4:16 PM · DC-Ops, SRE, ops-eqiad
Jclark-ctr claimed T399847: Degraded RAID on backup1007.
Thu, Jul 17, 4:15 PM · DC-Ops, SRE, ops-eqiad
Eevans closed T396970: Degraded RAID on aqs1012 as Resolved.

This is now complete.

Thu, Jul 17, 4:14 PM · DC-Ops, SRE, ops-eqiad
cmooney added a comment to T399221: eqsin purged consumers lag.

My vote is to leave eqsin depooled, given that it is off-peak there, and to observe this for a few hours.

Thu, Jul 17, 3:56 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
ssingh added a comment to T399221: eqsin purged consumers lag.

Arelion want to close the ticket as they see no issue. I asked that they don't. Perhaps for now we just leave eqsin depooled and the circuit in the active traffic path? If the reported RTT remains steady we can re-pool after a reasonable period has elapsed? And if it jumps up we can re-do the iperf tests at the time to try to confirm the packet loss has returned?

Thu, Jul 17, 3:45 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
cmooney added a comment to T399221: eqsin purged consumers lag.

Not sure how to progress this one. Still see zero packet loss over the link, even running for a longer period (5 mins this time):

cmooney@cp5017:~$ iperf -s -i10 -u -w512k
------------------------------------------------------------
Server listening on UDP port 5001
UDP buffer size: 1000 KByte (WARNING: requested  500 KByte)
------------------------------------------------------------
[  3] local 10.132.0.17 port 5001 connected with 10.192.48.35 port 39207
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3] 0.0000-10.0000 sec   125 MBytes   105 Mbits/sec   0.023 ms    0/89169 (0%)
[  3] 10.0000-20.0000 sec   125 MBytes   105 Mbits/sec   0.011 ms    0/89165 (0%)
[  3] 20.0000-30.0000 sec   125 MBytes   105 Mbits/sec   0.009 ms    0/89164 (0%)
[  3] 30.0000-40.0000 sec   125 MBytes   105 Mbits/sec   0.025 ms    0/89165 (0%)
[  3] 40.0000-50.0000 sec   125 MBytes   105 Mbits/sec   0.010 ms    0/89164 (0%)
[  3] 50.0000-60.0000 sec   125 MBytes   105 Mbits/sec   0.014 ms    0/89165 (0%)
[  3] 60.0000-70.0000 sec   125 MBytes   105 Mbits/sec   0.022 ms    0/89165 (0%)
[  3] 70.0000-80.0000 sec   125 MBytes   105 Mbits/sec   0.022 ms    0/89164 (0%)
[  3] 80.0000-90.0000 sec   125 MBytes   105 Mbits/sec   0.015 ms    0/89164 (0%)
[  3] 90.0000-100.0000 sec   125 MBytes   105 Mbits/sec   0.038 ms    0/89161 (0%)
[  3] 100.0000-110.0000 sec   125 MBytes   105 Mbits/sec   0.023 ms    0/89169 (0%)
[  3] 110.0000-120.0000 sec   125 MBytes   105 Mbits/sec   0.031 ms    0/89164 (0%)
[  3] 120.0000-130.0000 sec   125 MBytes   105 Mbits/sec   0.040 ms    0/89166 (0%)
[  3] 130.0000-140.0000 sec   125 MBytes   105 Mbits/sec   0.016 ms    0/89164 (0%)
[  3] 140.0000-150.0000 sec   125 MBytes   105 Mbits/sec   0.018 ms    0/89164 (0%)
[  3] 150.0000-160.0000 sec   125 MBytes   105 Mbits/sec   0.054 ms    0/89166 (0%)
[  3] 160.0000-170.0000 sec   125 MBytes   105 Mbits/sec   0.017 ms    0/89163 (0%)
[  3] 170.0000-180.0000 sec   125 MBytes   105 Mbits/sec   0.032 ms    0/89165 (0%)
[  3] 180.0000-190.0000 sec   125 MBytes   105 Mbits/sec   0.033 ms    0/89165 (0%)
[  3] 190.0000-200.0000 sec   125 MBytes   105 Mbits/sec   0.023 ms    0/89164 (0%)
[  3] 200.0000-210.0000 sec   125 MBytes   105 Mbits/sec   0.016 ms    0/89165 (0%)
[  3] 210.0000-220.0000 sec   125 MBytes   105 Mbits/sec   0.040 ms    0/89165 (0%)
[  3] 220.0000-230.0000 sec   125 MBytes   105 Mbits/sec   0.014 ms    0/89165 (0%)
[  3] 230.0000-240.0000 sec   125 MBytes   105 Mbits/sec   0.021 ms    0/89164 (0%)
[  3] 240.0000-250.0000 sec   125 MBytes   105 Mbits/sec   0.017 ms    0/89165 (0%)
[  3] 250.0000-260.0000 sec   125 MBytes   105 Mbits/sec   0.014 ms    0/89165 (0%)
[  3] 260.0000-270.0000 sec   125 MBytes   105 Mbits/sec   0.020 ms    0/89165 (0%)
[  3] 270.0000-280.0000 sec   125 MBytes   105 Mbits/sec   0.017 ms    0/89163 (0%)
[  3] 280.0000-290.0000 sec   125 MBytes   105 Mbits/sec   0.022 ms    0/89165 (0%)
[  3] 290.0000-300.0000 sec   125 MBytes   105 Mbits/sec   0.021 ms    0/89162 (0%)
[  3] 300.0000-300.0001 sec  2.87 KBytes   367 Mbits/sec   0.035 ms    0/    2 (0%)
[  3] 0.0000-300.0001 sec  3.66 GBytes   105 Mbits/sec   0.035 ms    0/2674942 (0%)
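For completeness, the sending side of a test like this is a matching iperf UDP client run from the codfw end; a minimal sketch, assuming iperf 2 and the ~105 Mbit/s rate seen above:

iperf -c 10.132.0.17 -u -b 105M -t 300 -i 10 -w 512k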
Thu, Jul 17, 3:42 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
cmooney reopened T394333: Q4:rack/setup/install cloudcephosd10[48-51] as "Open".

@Jclark-ctr, as discussed in our call on Tuesday, we will be connecting the second SFP port on these hosts to the switches too, as we need to solve the MTU issue before proceeding with T399180: Cloudcephosd: migrate to single network uplink.

Thu, Jul 17, 3:39 PM · Patch-For-Review, SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
Stashbot added a comment to T399221: eqsin purged consumers lag.

Mentioned in SAL (#wikimedia-operations) [2025-07-17T15:28:12Z] <topranks> un-drain Arelion transport circuit from codfw -> eqsin to test performance T399221

Thu, Jul 17, 3:28 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
Stashbot added a comment to T399221: eqsin purged consumers lag.

Mentioned in SAL (#wikimedia-operations) [2025-07-17T14:38:23Z] <cmooney@cumin1003> END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: depool eqsin to test backhaul cct packet loss, T399221]

Thu, Jul 17, 2:38 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
Stashbot added a comment to T399221: eqsin purged consumers lag.

Mentioned in SAL (#wikimedia-operations) [2025-07-17T14:38:19Z] <cmooney@cumin1003> START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: depool eqsin to test backhaul cct packet loss, T399221]

Thu, Jul 17, 2:38 PM · SRE, DC-Ops, ops-codfw, Infrastructure-Foundations, netops, Traffic
brouberol moved T399355: Degraded RAID on an-worker1175 from Blocked/Waiting to Done on the Data-Platform-SRE (2025.07.05 - 2025.07.25) board.
Thu, Jul 17, 2:17 PM · Data-Platform-SRE (2025.07.05 - 2025.07.25), SRE, DC-Ops, ops-eqiad
ops-monitoring-bot created T399847: Degraded RAID on backup1007.
Thu, Jul 17, 1:53 PM · DC-Ops, SRE, ops-eqiad
elukey added a comment to T393948: Q4:rack/setup/install ml-serve101[2345].

OK, so I have a provision-script change that seems to work, but it doesn't touch anything in the network PXE / FixedBootOrder config (except ensuring that UEFI Hdd is first).

Thu, Jul 17, 1:43 PM · SRE, Machine-Learning-Team, ops-eqiad, DC-Ops
gerritbot added a comment to T398613: Investigate dead an-worker host an-worker1176.

Change #1170301 merged by Stevemunene:

[operations/puppet@production] hdfs: Add an-worker 1176|1179|1186 to analytics cluster

https://gerrit.wikimedia.org/r/1170301

Thu, Jul 17, 12:30 PM · Patch-For-Review, Data-Platform-SRE (2025.06.13 - 2025.07.04), SRE, ops-eqiad, DC-Ops
Jclark-ctr closed T399671: Degraded RAID on backup1007 as Resolved.

Updated firmware on the iDRAC while logged in. Thanks for the assistance, @jcrespo.

Thu, Jul 17, 12:21 PM · SRE, DC-Ops, ops-eqiad
jcrespo added a comment to T399671: Degraded RAID on backup1007.

Note my prediction is that we will need 3 new disks replaced, not only 1 (but this can be resolved for now).

Thu, Jul 17, 12:20 PM · SRE, DC-Ops, ops-eqiad
jcrespo added a comment to T399671: Degraded RAID on backup1007.

I told @Jclark-ctr not to replace the 13th disk yet, as I was more worried about the jbod ones than the RAID:

root@backup1007:~$ megacli -PDList -aall | grep rro
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 6
Other Error Count: 1
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 1339
Media Error Count: 0
Other Error Count: 1339
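To tie those counters back to physical slots, the same output can be filtered a little more broadly; a sketch, assuming the standard megacli -PDList field names:

megacli -PDList -aall | grep -E 'Slot Number|Firmware state|Media Error Count|Other Error Count'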
Thu, Jul 17, 12:18 PM · SRE, DC-Ops, ops-eqiad
Jclark-ctr closed T399355: Degraded RAID on an-worker1175 as Resolved.

Replaced the failed drive. Thanks for the assistance with this, @BTullis.

Thu, Jul 17, 11:50 AM · Data-Platform-SRE (2025.07.05 - 2025.07.25), SRE, DC-Ops, ops-eqiad