Description
- Which image of the operator are you using? e.g. ghcr.io/zalando/postgres-operator:v1.12.2
- Where do you run it - cloud or metal? Kubernetes
- Are you running Postgres Operator in production? yes
- Type of issue? question
Hi, we are using S3 over SSL for backups and for storing WALs. We have added WALG_S3_CA_CERT_FILE to the environment (WALG_S3_CA_CERT_FILE=/tlsca/ca.crt).
This works fine for base backups and WAL push, but on the clone/standby cluster the S3 connection fails. I suspect the variable is not present during bootstrap, but I am not sure; the backup-list command fails whenever WALG_S3_CA_CERT_FILE is not set in the environment.
Any idea how to fix this, or how to skip SSL verification? Thanks in advance. On the existing cluster, where the variable is set, backup-list works:
$ envdir "/run/etc/wal-e.d/env" wal-g backup-list
name modified wal_segment_backup_start
base_00000003000000000000000A 2025-07-14T04:39:33Z 00000003000000000000000A
base_00000004000000000000000D 2025-07-14T05:00:46Z 00000004000000000000000D
base_000000040000000000000011 2025-07-14T05:07:50Z 000000040000000000000011
base_00000006000000000000005E 2025-07-14T05:36:20Z 00000006000000000000005E
base_000000070000000000000061 2025-07-14T06:01:41Z 000000070000000000000061
$ env |grep -i WALG_S3_CA_CERT_FILE
WALG_S3_CA_CERT_FILE=/tlsca/ca.crt
$
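For comparison, the clone bootstrap runs through a separate envdir (the path below is taken from the Patroni log further down). The same checks could be repeated through that directory to confirm whether the variable is visible to wal-g there; this is only a diagnostic sketch, output not captured:

# list the per-variable files in the clone envdir
$ ls /run/etc/wal-e.d/env-clone-pg-pgjul14testpg15
# check whether the CA cert variable is exported when running through it
$ envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" env | grep -i WALG_S3_CA_CERT_FILE
# reproduce the bootstrap's backup-list call against the clone source
$ envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" wal-g backup-list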
Clone cluster logs:
2025-07-16 09:35:58,858 INFO: not healthy enough for leader race
2025-07-16 09:35:58,863 INFO: bootstrap in progress
2025-07-16 09:36:08,858 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:08,858 INFO: not healthy enough for leader race
2025-07-16 09:36:08,865 INFO: bootstrap in progress
ERROR: 2025/07/16 09:36:18.722623 failed to list s3 folder: 'spilo/pg-pgjul14testpg15/e23c4f04-279b-4c36-b8f1-4ffe40f5f89c/wal/15/basebackups_005/': RequestError: send request failed
caused by: Get "https://bkpcohprdsasysctrl.lowes.com:3000/postgres_dev_sadc?delimiter=%2F&list-type=2&prefix=spilo%2Fpg-pgjul14testpg15%2Fe23c4f04-279b-4c36-b8f1-4ffe40f5f89c%2Fwal%2F15%2Fbasebackups_005%2F": tls: failed to verify certificate: x509: certificate signed by unknown authority
2025-07-16 09:36:18,726 ERROR: Clone failed
Traceback (most recent call last):
File "/scripts/clone_with_wale.py", line 185, in main
run_clone_from_s3(options)
File "/scripts/clone_with_wale.py", line 166, in run_clone_from_s3
backup_name, update_envdir = find_backup(options.recovery_target_time, env)
File "/scripts/clone_with_wale.py", line 150, in find_backup
backup_list = list_backups(env)
File "/scripts/clone_with_wale.py", line 84, in list_backups
output = subprocess.check_output(backup_list_cmd, env=env)
File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wal-g', 'backup-list']' returned non-zero exit status 1.
2025-07-16 09:36:18,747 INFO: removing initialize key after failed attempt to bootstrap the cluster
Traceback (most recent call last):
File "/usr/local/bin/patroni", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 144, in main
return patroni_main()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 136, in patroni_main
abstract_main(Patroni, schema)
File "/usr/local/lib/python3.10/dist-packages/patroni/daemon.py", line 108, in abstract_main
controller.run()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 106, in run
super(Patroni, self).run()
File "/usr/local/lib/python3.10/dist-packages/patroni/daemon.py", line 65, in run
self._run_cycle()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 109, in _run_cycle
logger.info(self.ha.run_cycle())
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1771, in run_cycle
info = self._run_cycle()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1593, in _run_cycle
return self.post_bootstrap()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1484, in post_bootstrap
self.cancel_initialization()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1477, in cancel_initialization
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
/etc/runit/runsvdir/default/patroni: finished with code=1 signal=0
/etc/runit/runsvdir/default/patroni: sleeping 30 seconds
2025-07-16 09:36:49,805 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-07-16 09:36:49,825 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:49,836 INFO: trying to bootstrap a new cluster
2025-07-16 09:36:49,836 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" python3 /scripts/clone_with_wale.py --recovery-target-time="2025-07-14T08:30:00+00:00"
2025-07-16 09:36:50,132 INFO: Trying s3://postgres_dev_sadc/spilo/pg-pgjul14testpg15/e23c4f04-279b-4c36-b8f1-4ffe40f5f89c/wal/15/ for clone
2025-07-16 09:36:59,818 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:59,818 INFO: not healthy enough for leader race
2025-07-16 09:36:59,824 INFO: bootstrap in progress
2025-07-16 09:37:09,819 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:37:09,819 INFO: not healthy enough for leader race
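For completeness, this is the workaround we are considering. It is only a sketch based on the envdir convention (one file per variable, the file content is the value); it assumes /tlsca/ca.crt is also mounted on the clone pod and that the clone envdir path matches the one in the log above:

# write the CA cert path into the clone envdir so the custom bootstrap
# command (envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" ...) exports it
$ echo "/tlsca/ca.crt" > /run/etc/wal-e.d/env-clone-pg-pgjul14testpg15/WALG_S3_CA_CERT_FILE
# verify that wal-g now sees the variable through the clone envdir
$ envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" env | grep WALG_S3_CA_CERT_FILE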