This is a post about Nginx’s DNS resolution behavior I didn’t know about but wish I did before I started using Kubernetes (K8s).
Nginx caches statically configured domains once
I moved a backend service foo from running on a virtual machine to K8s. Foo's clients include an Nginx instance configured with this:
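The original snippet isn't reproduced here, but it was an upstream block pointing at foo's domain. A minimal reconstruction, where the upstream name, port 8080, and server_name are assumptions:

# Nginx resolves foo.example.com once, at startup or config reload,
# and caches the resulting IPs for the life of the worker processes.
upstream foo {
    server foo.example.com:8080;    # port is an assumption
}

server {
    listen 80;
    server_name proxy.example.com;  # illustrative

    location / {
        proxy_pass http://foo;
    }
}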
K8s Pods can be rescheduled at any time, so their IPs aren't stable. I'm supposed to use K8s Services to avoid caching these ephemeral Pod IPs, but in my case, for interoperability reasons, I was registering Pod IPs directly as A records for foo.example.com. I started noticing that after my Pod IPs changed, either because of rescheduling or because I updated the Deployment, Nginx started throwing 502 Bad Gateway errors.
Nginx resolves statically configured domain names only once, at startup or configuration reload time. So Nginx resolved foo.example.com. once at startup to several Pod IPs and cached them. When those Pods went away, Nginx kept proxying to the stale IPs, hence the 502s.
Using a variable for the domain name will make Nginx resolve it and cache it using the TTL value of the DNS response. So replace the upstream block with a variable. I have no idea why it has to be a variable to make Nginx resolve the domain periodically.
And replace the proxy_pass line with:
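A sketch of the replacement inside the location block, keeping the same hypothetical port:

# With a variable, Nginx defers the DNS lookup to request time and
# honors the DNS record's TTL instead of caching the IPs forever.
set $foo_backend foo.example.com:8080;   # port is an assumption
proxy_pass http://$foo_backend;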
This behavior isn't documented but has been observed empirically and discussed here, here, and here. I also learned that this setup requires me to define a resolver in the Nginx configs.
For some reason, Nginx resolves statically configured domains by querying the nameserver specified in /etc/resolv.conf, but periodically resolved domains require a completely different config setting. I would love to know why.
The VM on which Nginx was running ran a Bind DNS server locally, so I set the resolver to 127.0.0.1.
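That's the resolver directive, which Nginx accepts in the http, server, or location block:

resolver 127.0.0.1;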
I triggered the code path that made Nginx send requests to foo and saw the periodic DNS queries with sudo tcpdump -i lo -n dst port 53 | grep foo.
What if that Nginx is also running on K8s?
I had another Nginx instance that also made requests to foo. This Nginx was running on K8s too. It was created with this Deployment YAML.
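That manifest isn't reproduced here either; a minimal reconstruction, where the image tag, labels, and mount path are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25               # image tag is an assumption
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d  # proxy config mounted over conf.d
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config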
The nginx-config ConfigMap was:
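Again a reconstruction, with the port and file name being assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  default.conf: |
    upstream foo {
      server foo.example.com:8080;   # port is an assumption
    }

    server {
      listen 80;

      location / {
        proxy_pass http://foo;
      }
    }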
I could replace the upstream with the same pattern as above, but in this case, when I needed to define a resolver, I couldn't use 127.0.0.1 because there's no Bind running locally. And I can't hardcode the resolver IP because it might change.
Solution: run Nginx and foo on the same K8s cluster and use the cluster-local Service DNS record
If Nginx and foo run on the same K8s cluster, I can use the cluster-local DNS record created by a K8s Service matching the foo Pods. A Service like this:
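A sketch, where the selector label and port are assumptions and namespace bar matches the DNS name below:

apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: bar
spec:
  selector:
    app: foo       # label on the foo Pods is an assumption
  ports:
  - port: 8080     # port is an assumption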
Such a Service creates a DNS A record foo.bar.svc.cluster.local. pointing to the K8s Service's IP.
Since this Service's IP is stable and it load balances requests to the underlying Pods, there's no need for Nginx to periodically look up the Pod IPs. I can keep the upstream block, like so:
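Something like this, with the port again an assumption:

upstream foo {
    server foo.bar.svc.cluster.local:8080;   # resolves to the Service's stable ClusterIP
}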
As its name implies, foo.bar.svc.cluster.local. is only resolvable within the cluster. So Nginx has to be running on the same cluster as foo.
Solution: dynamically set the Nginx resolver equal to the system's when the Pod starts
What if Nginx is on another K8s cluster? Then I can set the resolver to the IP of one of the nameservers in /etc/resolv.conf. After a bunch of tinkering, I came up with this way to dynamically set the Nginx resolver when the Pod starts: a placeholder for resolver goes in the Nginx ConfigMap, and a command at Pod startup copies over the templated config, replacing the placeholder with a nameserver IP from /etc/resolv.conf. The nginx-config ConfigMap becomes:
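A reconstruction; the NAMESERVER placeholder, the template file name, and the port are assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  default.conf.template: |
    server {
      listen 80;

      # NAMESERVER is replaced at Pod startup with an IP from /etc/resolv.conf
      resolver NAMESERVER;

      location / {
        set $foo_backend foo.example.com:8080;   # port is an assumption
        proxy_pass http://$foo_backend;
      }
    }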
The Deployment YAML then becomes (note the added args and the new emptyDir volume):
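A reconstruction; the image tag, mount paths, and the awk/sed substitution are my guess at how the startup command looked:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # image tag is an assumption
        # At startup: pull a nameserver IP out of /etc/resolv.conf, substitute
        # it into the templated config, then run Nginx in the foreground.
        command: ["/bin/sh", "-c"]
        args:
          - |
            NS=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
            sed "s/NAMESERVER/$NS/" /nginx-template/default.conf.template \
              > /etc/nginx/conf.d/default.conf
            exec nginx -g 'daemon off;'
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /nginx-template       # read-only ConfigMap mount (the template)
        - name: nginx-conf-d
          mountPath: /etc/nginx/conf.d     # writable emptyDir for the rendered config
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
      - name: nginx-conf-d
        emptyDir: {}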
A volume of type emptyDir is needed because recent versions of K8s made configMap volumes read-only; emptyDir volumes are writable.
Hopefully this helps some people out there who don’t want to spend as much time as I did Googling obscure Nginx behavior.