
Merge in changes made by NGINXaaS for Azure project #1


Open · arussellf5 wants to merge 114 commits into main

Conversation


@arussellf5 commented Jul 18, 2025

This PR adds changes made by the NGINXaaS for Azure project to this repo.

puneetsarna and others added 30 commits July 17, 2025 17:28
The primary intent behind this is to keep retrying updates that may be made
before the control plane has registered the existence of a named upstream in
the customer's NGINX configuration.
NOTE: This commit was accidentally omitted from the first iteration of this
fork and is present in
upstream: https://github.com/nginxinc/nginx-loadbalancer-kubernetes/blob/main/internal/configuration/settings.go#L189
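
A minimal sketch of the retry behavior described above, assuming a hypothetical update callback and an illustrative not-found sentinel; the real operator's interval and error check may differ:

```go
package retry

import (
	"context"
	"errors"
	"time"
)

// errUpstreamNotFound is an illustrative sentinel; the real operator checks
// the NGINX Plus API response instead.
var errUpstreamNotFound = errors.New("upstream not found")

// updateWithRetry keeps retrying an update until the control plane has
// registered the named upstream in the customer's NGINX configuration.
func updateWithRetry(ctx context.Context, update func(context.Context) error) error {
	ticker := time.NewTicker(5 * time.Second) // interval is an assumption
	defer ticker.Stop()
	for {
		err := update(ctx)
		if !errors.Is(err, errUpstreamNotFound) {
			return err // success, or an error that is not worth retrying
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C: // the upstream may not be registered yet; try again
		}
	}
}
```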

The code needs to move forward to start the
watchers and health server (side note: we should
also think about the ordering of these at some
point), and not running the informer in a goroutine
prevents the program from executing further until
the context is canceled.

In the current iteration on main, the controller
is stuck on the informer, and then k8s kills the
service and restarts it since the health server is
not up.
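
A sketch of the fix, assuming a client-go SharedIndexInformer and a placeholder health endpoint (the operator's real handler path and port may differ):

```go
package operator

import (
	"context"
	"net/http"

	"k8s.io/client-go/tools/cache"
)

// run starts the informer on its own goroutine so that execution can move on
// to the health server. Calling informer.Run on the main goroutine would
// block until the context is canceled, the liveness probe would never come
// up, and k8s would keep killing and restarting the service.
func run(ctx context.Context, informer cache.SharedIndexInformer) error {
	go informer.Run(ctx.Done())

	// Placeholder health server; path and port are assumptions.
	http.HandleFunc("/livez", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return http.ListenAndServe(":8080", nil)
}
```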
The user specifies the ingress service whose events the application should watch by setting the "service-annotation-match" value in the application's config map. Only events with a matching service annotation will be passed by the watcher to the handlers. The informer now listens to events from all namespaces, which frees the end user from the restriction of only using the nginx ingress controller.
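
A sketch of the filtering step; the annotation key used here is illustrative, not necessarily the one the operator matches against:

```go
package watcher

import corev1 "k8s.io/api/core/v1"

// matches reports whether a Service event should be handed to the handlers:
// its annotation value must equal the "service-annotation-match" setting
// from the operator's config map.
func matches(svc *corev1.Service, serviceAnnotationMatch string) bool {
	return svc.GetAnnotations()["nginx.com/nginxaas"] == serviceAnnotationMatch
}
```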
…eam name

The port name should now be formatted like this: "http-tea", where the first part
of the string is the context type (either "http" or "stream") and the second part
of the string after the hyphen is the name of the upstream.
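
A sketch of a parser for this format; strings.SplitN with a limit of 2 keeps any further hyphens inside the upstream name intact, which also matters for the hyphen fix in a later commit. This is illustrative, not the operator's exact parser:

```go
package ports

import (
	"fmt"
	"strings"
)

// parsePortName splits a port name such as "http-tea" into the NGINX context
// ("http" or "stream") and the upstream name, e.g.
// "http-tea-svc" -> ("http", "tea-svc").
func parsePortName(name string) (nginxContext, upstream string, err error) {
	parts := strings.SplitN(name, "-", 2)
	if len(parts) != 2 || (parts[0] != "http" && parts[0] != "stream") {
		return "", "", fmt.Errorf("port name %q is not of the form <context>-<upstream>", name)
	}
	return parts[0], parts[1], nil
}
```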
We need to be able to publish the operator to
Docker Hub so that it is publicly available to
customers.

This follows the release strategy we have in ARP,
where a release-tag action publishes the image to
Docker Hub.
go test does not have a good way to capture
unit-test coverage as part of test runs. This
commit captures the exit code of the unit-test
run, generates the coverage report, and then exits
based on the test status.
Helm will be used as part of the user story to
deploy the operator, but it is also a good tool
for deploying the operator during development.

This commit adds the ability to publish helm
charts:
- to the dev registry for local iteration.
- to the regular devops registry for CI iteration
  and testing.

This will also help us test the helm chart itself.
0.0.1 is okay but it's not really a bug fix.
We should keep a single release script that will
publish docker images and helm charts for the
official release. This commit just updates the
current release script to handle both artifact
types. Helm logic will follow.
Previously, owing to a bug, upstream names containing hyphens were rejected by the operator.
We are going to release with `0.x.y`, and keeping
the internal version in line with what gets
published will reduce confusion.
This repository is going to produce artifacts that
will be publicly available, and end users will
care about semantic versioning.

We need to be able to map a public facing version
to internally produced artifacts easily and having
semver internally eases that work.

This commit does not enforce the versioning but
adds a version file that has the semver,
which will be used to version the product.

We can follow a workflow where, at release time,
we cut a release, which creates a tag, and we
retag existing dev artifacts to ship them as
official artifacts.
While a cosmetic change, it does impact how the
Docker Hub repo needs to be set up to publish the
helm chart. In addition, it impacts what the user
sees on k8s itself, and that should not be
nginx-loadbalancer-kubernetes, as that name is
confusing.
While the chart name (which was renamed in the
prior commit) is used for naming resources, it
makes sense to also change the name value in the
chart itself to use the new name.
Now that official images exist on Docker Hub, we
should use those images in our charts.
Now that the chart has been renamed, gitignore
needs to ignore the new chart packages.
This lets us test the operator from the private
registry. Also, if a customer does not want to
pull from Docker Hub, they can use their own
registry and specify a pull secret for the image.
With the change to make the operator read a config
file (to reduce code complexity), we need to make
sure that the file exists for the service to
start.
Once the operator reads from the configmap mounted
as a volume, it does not need access to the k8s
API to read the config map itself.
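
A minimal sketch of reading the mounted file with viper (the config library a later commit message mentions); the mount path is an assumption:

```go
package config

import "github.com/spf13/viper"

// load reads settings from the config map mounted as a file inside the pod,
// so the operator needs no k8s API access to fetch its configuration.
func load() (*viper.Viper, error) {
	v := viper.New()
	v.SetConfigFile("/etc/nginx-loadbalancer-kubernetes/config.yaml")
	if err := v.ReadInConfig(); err != nil {
		return nil, err
	}
	return v, nil
}
```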
This is needed temporarily while nlk still needs
to read the configmap and the code for swapping it
out with a config file has not landed. The merge
order I am thinking of:
- Get this MR landed.
- Land testenv changes to use helm.
- Land code to read settings from the configfile.
- Remove configmap permissions.
This means that we do not have to grant the operator app any permissions to access kubernetes secrets.

Reading from the config file changes the application's behavior: settings are no longer updated whenever a config map changes at run time, as they used to be. Settings are now concurrency-safe, because they pass new values, instead of a shared pointer, to their consumers in separate goroutines. Settings also no longer pass the application context to consumers, as this is a well-documented Go anti-pattern: context should always be passed as a function parameter. Any operator modules that need access to a kubernetes client are constructed with a reference to the client, instead of gaining access to the client through settings.
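
A sketch of the concurrency-safe hand-off just described, with illustrative field names: consumers receive a copy of the settings value rather than a shared pointer, and the context stays a function parameter:

```go
package settings

import "context"

// Settings is a trimmed, illustrative version of the operator's settings.
type Settings struct {
	NginxHosts    []string
	SkipVerifyTLS bool
}

// publish hands each consumer its own copy of the settings value. Because no
// shared pointer crosses a goroutine boundary, consumers can read their copy
// without further synchronization, and the context is a plain function
// parameter rather than a struct field.
func publish(ctx context.Context, s Settings, consumers []chan<- Settings) {
	for _, c := range consumers {
		select {
		case c <- s: // a copy of the value, not a shared pointer
		case <-ctx.Done():
			return
		}
	}
}
```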
arussellf5 and others added 27 commits July 18, 2025 10:39
The latest versions of the kubernetes libraries recommend using a typed workqueue. This reduces a bit of boilerplate and error handling, because we no longer have to cast the work items returned by the queue into the desired types.
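
A sketch with client-go's typed work queue (available from roughly client-go v0.30; the string key type is an assumption): Get returns the typed key directly, so no type assertion is needed:

```go
package queue

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// newQueue builds a queue typed on string keys.
func newQueue() workqueue.TypedRateLimitingInterface[string] {
	return workqueue.NewTypedRateLimitingQueue(
		workqueue.DefaultTypedControllerRateLimiter[string](),
	)
}

func worker(q workqueue.TypedRateLimitingInterface[string]) {
	for {
		key, shutdown := q.Get() // key is already a string; no cast
		if shutdown {
			return
		}
		fmt.Println("processing", key)
		q.Done(key)
	}
}
```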
Multiple parallel tests were all accessing the same pointer to the single DefaultTransport variable in the http package, which was leading to data races in unit tests.
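
A sketch of the usual remedy: clone the default transport so each test (or client) gets its own copy instead of sharing the package-level pointer:

```go
package httpclient

import "net/http"

// newClient gives each caller its own transport instead of sharing (and
// possibly mutating) http.DefaultTransport, the pattern that produced the
// data races between parallel tests.
func newClient() *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	return &http.Client{Transport: transport}
}
```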
There's a syntax issue in the "listen" directive. Should be "listen 9113;", not "listen 9113:". Using the current file, I got an error upon reloading the NGINX service ("nginx: [emerg] invalid port in "9113:" of the "listen" directive in /etc/nginx/conf.d/prometheus.conf:11")
There was a change in the API for the NGINX Plus Client that was missed when updating to the latest version. This corrects that.
These functions were being used to create IDs that were not really
necessary for the business logic and which were generating security
alerts because of weak cryptography techniques.
The http client processes requests created by the nginx plus
client library, and that library should always include a sensible number
of headers. But the absence of a bound on the number of headers was causing
security vulnerability flags to be raised over denial-of-service
resource-exhaustion attacks.
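
The exact mitigation is not shown in this message; one plausible fix caps the response-header bytes the client will read, as sketched below. The 64 KiB limit is an assumption:

```go
package httpclient

import "net/http"

// newBoundedClient caps how many response-header bytes the client will read,
// bounding the resource-exhaustion surface the scanner flagged.
func newBoundedClient() *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.MaxResponseHeaderBytes = 64 << 10 // assumed limit
	return &http.Client{Transport: transport}
}
```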
The Go cache in CI is seeded in the project
working directory. We should exclude the mod cache
from lint/formatting, since it is upstream code
and there is a high chance of linting failing
because upstream lint rules differ from ours.
…inx hosts

In order for the nginx-hosts yaml field to be parsed correctly by viper, the template needs to:

1. not put double quotes around the value (quotes cause viper to interpret it as a single string), and
2. render it as a JSON array rather than a Go representation of a slice.
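
A small demonstration of the parsing behavior just described; the host URLs are placeholders:

```go
package config

import (
	"strings"

	"github.com/spf13/viper"
)

// nginxHosts shows that a value rendered as an unquoted JSON array comes
// back as a slice; wrapped in double quotes it would come back as one string.
func nginxHosts() ([]string, error) {
	v := viper.New()
	v.SetConfigType("yaml")
	rendered := `nginx-hosts: ["https://nginx-one.example.com:8080/api", "https://nginx-two.example.com:8080/api"]`
	if err := v.ReadConfig(strings.NewReader(rendered)); err != nil {
		return nil, err
	}
	return v.GetStringSlice("nginx-hosts"), nil
}
```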
The biggest change here is to remove most of the TLS modes that enabled mTLS and self-signed certificates. Product decided that this was too complex and there was not enough user demand for most of these options. We decided to pare down the code and remove tests that were no longer well maintained. The remaining configuration allows users to toggle a single switch: whether the http client verifies the NGINX host's certificate chain and host name when https is being used. If the user wishes to enable https with self-signed certs, they can use the "skip-verify-tls" setting to allow this. The default behavior is to perform the verification.

We are maintaining the deprecated "no-tls" and "ca-tls" inputs for NGINXaaS backwards-compatibility reasons. The "no-tls" setting name was highly misleading, because all it did was disable TLS verification: it DID NOT disable TLS altogether in https mode. Similarly, the "ca-tls" setting did not enable TLS itself. TLS is enabled by default when the URL of the NGINX host includes the https protocol; the user setting merely enforced the verification of the certificate chain and host as described above.
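
A sketch of how the remaining switch maps onto Go's standard library; the operator's actual wiring may differ:

```go
package httpclient

import (
	"crypto/tls"
	"net/http"
)

// newTLSClient maps the single remaining switch onto the http client: when
// skipVerifyTLS is true, the client accepts self-signed certificates by
// skipping verification of the certificate chain and host name.
func newTLSClient(skipVerifyTLS bool) *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: skipVerifyTLS}
	return &http.Client{Transport: transport}
}
```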
Now that the plus go client allows users to check the http status code of the error, handle the upstream not found case by doing nothing.
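
A sketch of the no-op handling; the statusError interface below is a hypothetical stand-in for however the plus go client exposes the status code, not its real API:

```go
package handler

import (
	"errors"
	"net/http"
)

// statusError is a hypothetical stand-in for the error type the plus go
// client returns with an http status code attached.
type statusError interface {
	error
	Status() int
}

// removeServer treats a 404 from the NGINX Plus API as a no-op: if the
// upstream is not found, there is nothing to remove.
func removeServer(remove func() error) error {
	err := remove()
	var se statusError
	if errors.As(err, &se) && se.Status() == http.StatusNotFound {
		return nil // upstream not found: do nothing
	}
	return err
}
```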