Skip to content

feat: Add configurable liveness probe #1972

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

davidspek
Copy link

@davidspek davidspek commented Jul 25, 2022

Closes: #58, #1703

This PR adds a new field to the Operator configuration parameters and the Postgres spec allowing users too specify a liveness probe for the Spilo container. This is necessary too resolve errors with Postgres going down if communication with the K8s API temporarily fails, as described in this issue.

The Postgres CRD managed by the operator doesn't seem to have the new field added, trying to fix that still. Also, it doesn't seem like the statefulset is updated when the liveness probe is changed which still needs fixing.

The CRD managed by the operator now includes the new field and changes to the liveness and readiness probes now properly update the statefulset.

@davidspek davidspek changed the title Add configurable liveness probe feat: Add configurable liveness probe Aug 1, 2022
@FxKu
Copy link
Member

FxKu commented Aug 23, 2022

LivenessProbes are dangerous. 🔥 🔥 🔥
I'm fine with a toggle so that anybody, who wants to risk it, can go with them. But, I'd prefer just one toggle and the operator creating a probe with a setup we think works for most cases. Like I proposed it for the ReadinessProbe here #2004 . I don't want to have a full spec option in the CRD manifests. Chances of misuse would be even higher then.

@FxKu FxKu added this to the 1.9 milestone Aug 23, 2022
@monotek
Copy link

monotek commented Aug 24, 2022

Imho the point of the article is not that Liveness probes are dangerous per se. You just have to use them right.

I even would go so far to say that a deployment which can't properly crash and restart automatically because of a missing liveness probe is just broken, because that's one of the essential things using a container in kubernetes.

Therefore it should be configurable at least even if not used by default.

Currently we have to use an ugly workaround to get our postgres instances running reliably: https://github.com/monotek/patroni-selfheal

@FxKu
Copy link
Member

FxKu commented Aug 29, 2022

I would still favor one pre-configured liveness probe that works for the operator and have only one toggle for users to switch.

@davidspek
Copy link
Author

We actually developed this as we didn’t want to use patroni-selfheal, and have been using it in prod on multiple cluster for 2 months now without issues and solving the problem we had. The main problem with a single liveness probe toggle is that it will likely not work in many scenarios, similarly to the readiness probe. One example is if a replica needs to clone a very large DB from the master, the probes would need to be configurable for this use case specifically.
We were actually planning on making the readiness probe configurable as well because of this.

In terms of errors configuring the probes, they have the same type and validation as Kubernetes so that should take care of part of it. We can also add docs, examples, etc to inform people how to configure the probe and give a good starting point they can use.

@davidspek
Copy link
Author

@FxKu friendly bump since I believe this is a pretty essential feature for the operator.

@FxKu FxKu modified the milestones: 1.9, 2.0 Dec 14, 2022
davidspek added 3 commits May 16, 2023 11:17
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
@davidspek davidspek requested a review from hughcapet as a code owner May 16, 2023 09:17
@davidspek
Copy link
Author

@FxKu Friendly bump since I've just rebased on the latest release.

@aleskandro
Copy link

Hi, having this PR and feature merged seems a very good improvement. Any luck that it will happen?

@FxKu FxKu modified the milestones: 1.11.0, 2.0 Jan 24, 2024
@theBNT
Copy link

theBNT commented Jul 18, 2025

still valid :>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Open Questions
Development

Successfully merging this pull request may close these issues.

5 participants