K8SPG-804: Fix internal.percona.com/delete-backup finalizer #1182

Merged
merged 15 commits into from
Jun 30, 2025

Conversation

pooknull
Contributor

@pooknull pooknull commented Jun 24, 2025

https://perconadev.atlassian.net/browse/K8SPG-804

DESCRIPTION

Problem:
On rare occasions, after the internal.percona.com/delete-backup finalizer runs, the operator creates a second backup job for a single pg-backup resource. After that, it is not possible to start new backups.

Cause:
The internal.percona.com/keep-job finalizer tries to delete the postgres-operator.crunchydata.com/pgbackrest-backup annotation from the postgrescluster. Afterwards, it deletes the labels from the backup job so that the job is no longer included in repoResources.manualBackupJobs, which the reconcileManualBackup method uses.

If the job is included in reconcileManualBackup, it gets stuck at https://github.com/percona/percona-postgresql-operator/blob/02cb95c196a93ff03ce044[...]af1f1fe9f0505/internal/controller/postgrescluster/pgbackrest.go, because the finalizer internal.percona.com/keep-job is present in the backup job.
The reconcileManualBackup method creates a new backup job if there is no backup job with these labels and the postgres-operator.crunchydata.com/pgbackrest-backup annotation is specified.
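
The rule in the previous sentence can be sketched in Go as follows. This is a simplified illustration, not the operator's code: the `Job` type, the `backupLabel` key, and the `shouldCreateManualBackupJob` helper are hypothetical names introduced here, and only the annotation key comes from the description above.

```go
package main

import "fmt"

// Annotation key taken from the PR description.
const backupAnnotation = "postgres-operator.crunchydata.com/pgbackrest-backup"

// backupLabel is a hypothetical label key used only for this illustration.
const backupLabel = "example.com/manual-backup"

// Job is a hypothetical, stripped-down stand-in for a backup job.
type Job struct {
	Name   string
	Labels map[string]string
}

// shouldCreateManualBackupJob returns true only when the cluster carries the
// backup annotation and no existing job still carries the backup label.
func shouldCreateManualBackupJob(clusterAnnotations map[string]string, jobs []Job) bool {
	if _, requested := clusterAnnotations[backupAnnotation]; !requested {
		return false
	}
	for _, job := range jobs {
		if _, labelled := job.Labels[backupLabel]; labelled {
			return false // a labelled job already exists for this request
		}
	}
	return true
}

func main() {
	annotations := map[string]string{backupAnnotation: "backup-1"}
	labelled := []Job{{Name: "backup-1-job", Labels: map[string]string{backupLabel: "true"}}}
	unlabelled := []Job{{Name: "backup-1-job", Labels: map[string]string{}}}

	fmt.Println(shouldCreateManualBackupJob(annotations, labelled))   // labelled job exists
	fmt.Println(shouldCreateManualBackupJob(annotations, unlabelled)) // labels were removed
}
```

Under these assumptions, a job whose labels were removed no longer blocks job creation, so the annotation alone decides the outcome.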

However, on rare occasions, Crunchy's reconcile process can work with an outdated copy of the postgrescluster in which this annotation is still set, even though it has already been removed, while seeing the latest state of the jobs. In this case, the method finds no backup jobs with the labels, sees the annotation, and creates a new backup job that the operator doesn't control. Backup labels are expected to be deleted from the backup job after each backup.

Solution:
Fetch the latest state of the postgrescluster in the reconcileManualBackup method.
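
A minimal Go simulation of the race and of the effect of re-fetching is shown below. It is a sketch under stated assumptions: the `Cluster` type, the `fetchLatest` callback, the `ResourceVersion` counter, and the `backup-1` annotation value are hypothetical, and the function only shares its name with the real method. It does not reproduce the actual change in this PR.

```go
package main

import "fmt"

// Annotation key taken from the PR description.
const backupAnnotation = "postgres-operator.crunchydata.com/pgbackrest-backup"

// Cluster is a hypothetical, stripped-down stand-in for the postgrescluster object.
type Cluster struct {
	ResourceVersion int
	Annotations     map[string]string
}

// reconcileManualBackup reports whether a new backup job would be created.
// labelledJobs is the number of jobs that still carry the backup labels.
// fetchLatest, when non-nil, replaces the possibly stale snapshot with the
// latest stored state before the decision is made.
func reconcileManualBackup(snapshot Cluster, labelledJobs int, fetchLatest func() Cluster) bool {
	cluster := snapshot
	if fetchLatest != nil {
		cluster = fetchLatest()
	}
	_, requested := cluster.Annotations[backupAnnotation]
	return requested && labelledJobs == 0
}

func main() {
	// Outdated copy: the annotation is still present.
	stale := Cluster{ResourceVersion: 1, Annotations: map[string]string{backupAnnotation: "backup-1"}}
	// Latest state: the finalizer has already removed the annotation.
	latest := Cluster{ResourceVersion: 2, Annotations: map[string]string{}}
	fetch := func() Cluster { return latest }

	// The finalizer has already removed the labels, so no labelled jobs remain.
	fmt.Println("stale snapshot creates job:", reconcileManualBackup(stale, 0, nil))
	fmt.Println("re-fetched state creates job:", reconcileManualBackup(stale, 0, fetch))
}
```

In this model, the stale snapshot leads to an extra job, while the re-fetched state does not, because the decision is then based on the annotation's current value.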

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

@pooknull pooknull changed the title from "Fix internal.percona.com/delete-backup finalizer" to "K8SPG-804: Fix internal.percona.com/delete-backup finalizer" Jun 27, 2025
@pooknull pooknull marked this pull request as ready for review June 27, 2025 08:23
egegunes previously approved these changes Jun 27, 2025
gkech previously approved these changes Jun 27, 2025
@pooknull pooknull dismissed stale reviews from egegunes and gkech via 362c676 June 30, 2025 16:20
@JNKPercona
Collaborator

| Test Name | Result | Time |
|---|---|---|
| backup-enable-disable | passed | 00:06:42 |
| custom-extensions | passed | 00:08:14 |
| custom-tls | passed | 00:05:16 |
| demand-backup | passed | 00:25:24 |
| finalizers | passed | 00:03:36 |
| init-deploy | passed | 00:02:59 |
| monitoring | passed | 00:07:33 |
| monitoring-pmm3 | passed | 00:07:22 |
| one-pod | passed | 00:05:28 |
| operator-self-healing | passed | 00:07:43 |
| pitr | passed | 00:11:15 |
| scaling | passed | 00:04:43 |
| scheduled-backup | passed | 00:26:16 |
| self-healing | passed | 00:08:30 |
| sidecars | passed | 00:02:25 |
| start-from-backup | passed | 00:12:45 |
| tablespaces | passed | 00:07:18 |
| telemetry-transfer | passed | 00:03:11 |
| upgrade-consistency | passed | 00:05:21 |
| upgrade-minor | passed | 00:06:25 |
| users | passed | 00:03:34 |
| We run 21 out of 21 | | 02:52:10 |

commit: 6e5dec0
image: perconalab/percona-postgresql-operator:PR-1182-6e5dec01f

@hors hors merged commit d29085b into main Jun 30, 2025
18 checks passed
@hors hors deleted the backup-in-progress-fix branch June 30, 2025 17:57
5 participants