
SSL S3 connection failing for clone and standby cluster #2935

@Ajith61

  • Which image of the operator are you using? e.g. ghcr.io/zalando/postgres-operator:v1.12.2
  • Where do you run it - cloud or metal? Kubernetes
  • Are you running Postgres Operator in production? yes
  • Type of issue? question

Hi, we are using an SSL-enabled S3 endpoint to store backups and WAL files. We have added WALG_S3_CA_CERT_FILE to the environment (WALG_S3_CA_CERT_FILE=/tlsca/ca.crt).

Backups and WAL pushes work fine, but in the clone/standby cluster the S3 connection fails. I suspect the variable is not present during bootstrap, but I am not sure. The backup-list command fails whenever WALG_S3_CA_CERT_FILE is missing from the environment.

Any idea how to fix this issue, or how to skip SSL verification? Thanks in advance.

$ envdir "/run/etc/wal-e.d/env" wal-g backup-list
name modified wal_segment_backup_start
base_00000003000000000000000A 2025-07-14T04:39:33Z 00000003000000000000000A
base_00000004000000000000000D 2025-07-14T05:00:46Z 00000004000000000000000D
base_000000040000000000000011 2025-07-14T05:07:50Z 000000040000000000000011
base_00000006000000000000005E 2025-07-14T05:36:20Z 00000006000000000000005E
base_000000070000000000000061 2025-07-14T06:01:41Z 000000070000000000000061
$ env |grep -i WALG_S3_CA_CERT_FILE
WALG_S3_CA_CERT_FILE=/tlsca/ca.crt
$
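One way to test the suspicion that the variable is missing during bootstrap: the clone bootstrap in the logs below runs under envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15", a different directory from the "/run/etc/wal-e.d/env" used in the working command above. The Python sketch below compares two envdirs and lists the variables the first one sets but the second one does not. It approximates daemontools envdir semantics (one file per variable, first line is the value); the function names and the WALG_/AWS_ filter are assumptions for illustration. It only inspects the envdir files, not the container environment, so a variable that reaches the working command through the pod env (as the `env | grep` output suggests here) will not show up.

```python
import sys
import tempfile
from pathlib import Path


def read_envdir(path):
    """Approximate daemontools `envdir`: one file per variable, value = first line.

    Trailing spaces/tabs are stripped and NUL bytes become newlines. An empty
    file tells envdir to unset the variable, so it is left out of the result.
    """
    env = {}
    for entry in sorted(Path(path).iterdir()):
        if not entry.is_file() or entry.name.startswith("."):
            continue  # skip subdirectories and hidden files
        data = entry.read_bytes()
        if not data:
            continue
        value = data.split(b"\n", 1)[0].rstrip(b" \t").replace(b"\0", b"\n")
        env[entry.name] = value.decode()
    return env


def missing_in(reference_dir, candidate_dir, prefixes=("WALG_", "AWS_")):
    """Variables matching `prefixes` that reference_dir sets but candidate_dir does not."""
    reference = read_envdir(reference_dir)
    candidate = read_envdir(candidate_dir)
    return sorted(
        name for name in reference
        if name.startswith(prefixes) and name not in candidate
    )


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # e.g. /run/etc/wal-e.d/env /run/etc/wal-e.d/env-clone-pg-pgjul14testpg15
        print(missing_in(sys.argv[1], sys.argv[2]))
    else:
        # Self-contained demo with throwaway directories.
        with tempfile.TemporaryDirectory() as main_dir, tempfile.TemporaryDirectory() as clone_dir:
            Path(main_dir, "WALG_S3_CA_CERT_FILE").write_text("/tlsca/ca.crt\n")
            Path(main_dir, "WALG_S3_PREFIX").write_text("s3://example/main\n")
            Path(clone_dir, "WALG_S3_PREFIX").write_text("s3://example/clone\n")
            print(missing_in(main_dir, clone_dir))  # ['WALG_S3_CA_CERT_FILE']
```

Run it inside the clone pod with the two envdir paths as arguments. An empty list means the envdirs agree, and the difference more likely lies in how the bootstrap script builds the `env` it passes to `wal-g backup-list` (the `subprocess.check_output(backup_list_cmd, env=env)` call in the traceback).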

Clone cluster logs:

2025-07-16 09:35:58,858 INFO: not healthy enough for leader race
2025-07-16 09:35:58,863 INFO: bootstrap in progress
2025-07-16 09:36:08,858 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:08,858 INFO: not healthy enough for leader race
2025-07-16 09:36:08,865 INFO: bootstrap in progress
ERROR: 2025/07/16 09:36:18.722623 failed to list s3 folder: 'spilo/pg-pgjul14testpg15/e23c4f04-279b-4c36-b8f1-4ffe40f5f89c/wal/15/basebackups_005/': RequestError: send request failed
caused by: Get "https://bkpcohprdsasysctrl.lowes.com:3000/postgres_dev_sadc?delimiter=%2F&list-type=2&prefix=spilo%2Fpg-pgjul14testpg15%2Fe23c4f04-279b-4c36-b8f1-4ffe40f5f89c%2Fwal%2F15%2Fbasebackups_005%2F": tls: failed to verify certificate: x509: certificate signed by unknown authority

2025-07-16 09:36:18,726 ERROR: Clone failed
Traceback (most recent call last):
File "/scripts/clone_with_wale.py", line 185, in main
run_clone_from_s3(options)
File "/scripts/clone_with_wale.py", line 166, in run_clone_from_s3
backup_name, update_envdir = find_backup(options.recovery_target_time, env)
File "/scripts/clone_with_wale.py", line 150, in find_backup
backup_list = list_backups(env)
File "/scripts/clone_with_wale.py", line 84, in list_backups
output = subprocess.check_output(backup_list_cmd, env=env)
File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wal-g', 'backup-list']' returned non-zero exit status 1.
2025-07-16 09:36:18,747 INFO: removing initialize key after failed attempt to bootstrap the cluster
Traceback (most recent call last):
File "/usr/local/bin/patroni", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 144, in main
return patroni_main()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 136, in patroni_main
abstract_main(Patroni, schema)
File "/usr/local/lib/python3.10/dist-packages/patroni/daemon.py", line 108, in abstract_main
controller.run()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 106, in run
super(Patroni, self).run()
File "/usr/local/lib/python3.10/dist-packages/patroni/daemon.py", line 65, in run
self._run_cycle()
File "/usr/local/lib/python3.10/dist-packages/patroni/main.py", line 109, in _run_cycle
logger.info(self.ha.run_cycle())
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1771, in run_cycle
info = self._run_cycle()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1593, in _run_cycle
return self.post_bootstrap()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1484, in post_bootstrap
self.cancel_initialization()
File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 1477, in cancel_initialization
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
/etc/runit/runsvdir/default/patroni: finished with code=1 signal=0
/etc/runit/runsvdir/default/patroni: sleeping 30 seconds
2025-07-16 09:36:49,805 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-07-16 09:36:49,825 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:49,836 INFO: trying to bootstrap a new cluster
2025-07-16 09:36:49,836 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-pg-pgjul14testpg15" python3 /scripts/clone_with_wale.py --recovery-target-time="2025-07-14T08:30:00+00:00"
2025-07-16 09:36:50,132 INFO: Trying s3://postgres_dev_sadc/spilo/pg-pgjul14testpg15/e23c4f04-279b-4c36-b8f1-4ffe40f5f89c/wal/15/ for clone
2025-07-16 09:36:59,818 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:36:59,818 INFO: not healthy enough for leader race
2025-07-16 09:36:59,824 INFO: bootstrap in progress
2025-07-16 09:37:09,819 INFO: Lock owner: None; I am pg-pgjul14testpg15pitr-0
2025-07-16 09:37:09,819 INFO: not healthy enough for leader race
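The failing request in the logs is a path-style ListObjectsV2 call (list-type=2) to port 3000 of the S3 endpoint, with the bucket postgres_dev_sadc as the first path segment. To check whether /tlsca/ca.crt actually validates that endpoint from inside the clone pod, independently of WAL-G, the sketch below splits the URL into host, port, bucket and prefix and opens a TLS connection that trusts only the given CA file. It uses only the Python standard library; the function names are hypothetical.

```python
import socket
import ssl
from urllib.parse import parse_qs, urlsplit


def describe_request(url):
    """Split a path-style S3 ListObjectsV2 URL into (host, port, bucket, prefix)."""
    parts = urlsplit(url)
    bucket = parts.path.lstrip("/").split("/", 1)[0]
    prefix = parse_qs(parts.query).get("prefix", [""])[0]  # %2F is decoded to "/"
    return parts.hostname, parts.port or 443, bucket, prefix


def check_tls(host, port, cafile, timeout=5.0):
    """Handshake with host:port trusting only `cafile`; return the TLS version on success.

    A chain that does not verify against cafile raises ssl.SSLCertVerificationError,
    roughly the Python counterpart of WAL-G's "x509: certificate signed by unknown
    authority". A missing cafile raises FileNotFoundError before any connection.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    failed_url = (
        "https://bkpcohprdsasysctrl.lowes.com:3000/postgres_dev_sadc"
        "?delimiter=%2F&list-type=2&prefix=spilo%2Fpg-pgjul14testpg15%2F"
        "e23c4f04-279b-4c36-b8f1-4ffe40f5f89c%2Fwal%2F15%2Fbasebackups_005%2F"
    )
    host, port, bucket, prefix = describe_request(failed_url)
    print(host, port, bucket, prefix)
    # Inside the clone pod, where the endpoint is reachable:
    #   print(check_tls(host, port, "/tlsca/ca.crt"))
```

If `check_tls(host, port, "/tlsca/ca.crt")` succeeds in the clone pod but wal-g still fails, the CA file itself is fine and the variable is most likely not reaching wal-g during bootstrap. A FileNotFoundError means the CA file is not mounted in the clone pod, and an SSLCertVerificationError means it does not cover the endpoint's certificate chain.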
