More About Nginx DNS Resolution Than You Ever Wanted to Know

This post covers Nginx DNS resolution behavior that I didn’t know about but wish I had before I started using Kubernetes (K8s).

Nginx resolves statically configured domains only once

Symptoms

I moved a backend service foo from running on a virtual machine to K8s. Foo’s clients include an Nginx instance configured with this upstream block.

upstream foo {
  server foo.example.com.;
}

server {
  ...

  location ~* /_foo/(.*) {
    proxy_pass https://foo/$1;
    ...
  }
}

K8s Pods can be rescheduled at any time, so their IPs aren’t stable. I’m supposed to use K8s Services to avoid caching these ephemeral Pod IPs, but for interoperability reasons I was registering Pod IPs directly as A records for foo.example.com. instead. I noticed that whenever the Pod IPs changed, either because of rescheduling or because I updated the Deployment, Nginx started returning 502 Bad Gateway errors.

Root Problem

Nginx resolves statically configured domain names only once, at startup or when the configuration is reloaded. So Nginx resolved foo.example.com. once at startup to several Pod IPs and kept using them even after the Pods behind them were gone.

Solution

Using a variable for the domain name in proxy_pass makes Nginx resolve it at request time and cache the result for the TTL of the DNS response. So replace the upstream block with a variable. I have no idea why it has to be a variable to make Nginx resolve the domain periodically.

set $foo_url foo.example.com.;

And replace the proxy_pass line with

  location ~* /_foo/(.*) {
    proxy_pass https://$foo_url/$1;
    ...
  }

This behavior is easy to miss in the documentation, but it has been observed empirically and discussed here, here, and here. I also learned that this setup requires a resolver directive in the Nginx config. For some reason Nginx resolves statically configured domains by querying the nameservers specified in /etc/resolv.conf, but periodically resolved domains require a completely separate config setting. I would love to know why.

The VM on which Nginx was running also ran a Bind DNS server locally, so I set resolver 127.0.0.1. To verify, I triggered the code path that made Nginx send requests to foo and watched periodic DNS queries go out with sudo tcpdump -i lo -n dst port 53 | grep foo.
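Since the re-resolution interval follows the TTL of the DNS response, it can help to check what TTL the resolver actually hands out. The following is a minimal sketch, not part of my original setup: min_a_ttl is a hypothetical helper name, and it assumes the usual five-column answer format printed by dig +noall +answer (name, TTL, class, type, address).

```shell
#!/bin/sh
# Hypothetical helper: read dig answer lines on stdin and print the
# smallest TTL among the A records. Prints nothing if there are none.
min_a_ttl() {
  awk '$3 == "IN" && $4 == "A" {
         ttl = $2 + 0                      # force numeric comparison
         if (!seen || ttl < min) { min = ttl; seen = 1 }
       }
       END { if (seen) print min }'
}

# Example usage against the local resolver (requires dig and network access):
#   dig +noall +answer @127.0.0.1 foo.example.com. A | min_a_ttl
```

If the TTL turns out to be longer than the Pod churn can tolerate, the resolver directive also accepts a valid= parameter that overrides the TTL from the response.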

What if that Nginx is also running on K8s?

Problem

I had another Nginx instance that also made requests to foo. This Nginx was running on K8s too. It was created with this Deployment YAML.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: openresty/openresty:trusty
        ports:
          - name: https
            containerPort: 443
            protocol: TCP
        volumeMounts:
          - name: nginx-config
            mountPath: /etc/nginx/conf.d
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-config

The nginx-config ConfigMap was

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    upstream foo {
      server foo.example.com.:443;
    }

    server {
      ...

      # use regex capture to preserve url path and query params
      location ~* /_foo/(.*) {
        proxy_pass https://foo/$1;
        ...
      }
    }

I replaced the upstream block using the same variable pattern as above, but this time I couldn’t set resolver to 127.0.0.1 because there was no Bind running locally in the Pod. I also couldn’t hardcode the resolver IP because it might change.

Solution: run Nginx and foo on the same K8s cluster and use the cluster-local Service DNS record

If Nginx and foo run on the same K8s cluster, I can use the cluster-local DNS record created by a K8s Service matching the foo Pods. A Service like this

apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: bar
...

will create a DNS A record foo.bar.svc.cluster.local. pointing to the K8s Service’s IP. Since this Service’s IP is stable and the Service load balances requests to the underlying Pods, there’s no need for Nginx to periodically look up the Pod IPs. I can keep the upstream block like so.

upstream foo {
  server foo.bar.svc.cluster.local.:443;
}

As its name implies, foo.bar.svc.cluster.local. is only resolvable within the cluster. So Nginx has to be running on the same cluster as foo.
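The record name follows the pattern service name, then namespace, then svc, then the cluster domain. As a small illustration, the sketch below builds that name. service_fqdn is a hypothetical helper, and cluster.local is assumed as the cluster domain because it is the common default; a cluster can be configured with a different one.

```shell
#!/bin/sh
# Hypothetical helper: build the cluster-local DNS name of a Service.
# Arguments: service name, namespace, optional cluster domain
# (assumed to default to cluster.local).
service_fqdn() {
  name="$1"
  namespace="$2"
  cluster_domain="${3:-cluster.local}"
  # The trailing dot marks the name as fully qualified, so the search
  # domains in /etc/resolv.conf are not appended to it.
  printf '%s.%s.svc.%s.\n' "$name" "$namespace" "$cluster_domain"
}

service_fqdn foo bar
```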

Solution: dynamically set the Nginx resolver equal to the system’s when the Pod starts

What if Nginx is on another K8s cluster? Then I can set resolver to the nameservers listed in /etc/resolv.conf. After a bunch of tinkering I came up with this way to set the Nginx resolver dynamically when the Pod starts. The Nginx ConfigMap holds a placeholder for resolver, and a command at Pod startup copies the templated config into place and replaces the placeholder with the nameserver IPs from /etc/resolv.conf.
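The nameserver extraction can be tried on its own before wiring it into a Pod. This is a slightly more defensive sketch than what I actually run, under a few assumptions: resolver_addrs is a hypothetical helper name, it matches only lines whose first field is exactly nameserver (so commented-out entries are skipped), and it wraps IPv6 addresses in brackets because that is how the Nginx documentation writes them in resolver.

```shell
#!/bin/sh
# Hypothetical helper: print the nameserver addresses from a
# resolv.conf-style file on a single line, separated by spaces.
# Argument: path to the file (defaults to /etc/resolv.conf).
resolver_addrs() {
  conf="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" {
         addr = $2
         if (addr ~ /:/) addr = "[" addr "]"   # bracket IPv6 addresses
         printf "%s%s", sep, addr
         sep = " "
       }
       END { if (sep != "") print "" }' "$conf"
}

# Example usage inside the container:
#   NAMESERVER=$(resolver_addrs /etc/resolv.conf)
```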

Change nginx-config ConfigMap to

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf.template: |
    server {
      ...

      # This directive is dynamic because we set it to the
      # kube-dns Service IP which is different for each cluster.
      resolver $NAMESERVER;

      set $foo_url foo.example.com.;

      # use regex capture to preserve url path and query params
      location ~* /_foo/(.*) {
        proxy_pass https://$foo_url/$1;
        ...
      }
    }

Deployment YAML then becomes (note the added command, args, and new volume and volumeMount).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: openresty/openresty:trusty
        command: ['/bin/bash', '-c']
        args:
        - |
          export NAMESERVER=$(grep 'nameserver' /etc/resolv.conf | awk '{print $2}' | tr '\n' ' ')
          echo "Nameserver is: $NAMESERVER"
          echo 'Copying nginx config'
          envsubst '$NAMESERVER' < /etc/nginx/conf.d.template/nginx.conf.template > /etc/nginx/conf.d/nginx.conf
          echo 'Using nginx config:'
          cat /etc/nginx/conf.d/nginx.conf
          echo 'Starting nginx'
          nginx -g 'daemon off;'
        ports:
          - name: https
            containerPort: 443
            protocol: TCP
        volumeMounts:
          - name: nginx-config-template
            mountPath: /etc/nginx/conf.d.template
          - name: nginx-config
            mountPath: /etc/nginx/conf.d
      volumes:
        - name: nginx-config
          emptyDir: {}
        - name: nginx-config-template
          configMap:
            name: nginx-config

An emptyDir volume is needed because recent versions of K8s mount configMap volumes read-only, so the startup command can’t write the rendered config there. emptyDir volumes are writable.
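One failure mode worth guarding against is an empty nameserver list, which would render an invalid resolver line and make Nginx fail at startup with a less obvious error. The sketch below shows one way to fail fast. It is an illustration rather than what the Deployment above does: render_resolver is a hypothetical helper, and it uses sed in place of envsubst only so the sketch has no dependency on gettext.

```shell
#!/bin/sh
# Hypothetical helper: render a config template by replacing the literal
# text $NAMESERVER with the given nameserver list.
# Arguments: template path, output path, space-separated nameserver list.
render_resolver() {
  template="$1"
  output="$2"
  nameservers="$3"
  if [ -z "$nameservers" ]; then
    echo "no nameserver found, refusing to render $output" >&2
    return 1
  fi
  # Only $NAMESERVER is substituted, so Nginx variables such as $foo_url
  # pass through untouched. IP addresses contain no characters that are
  # special in a sed replacement (/, & or backslash).
  sed "s/\\\$NAMESERVER/$nameservers/g" "$template" > "$output"
}

# Example usage with the paths from the Deployment above:
#   render_resolver /etc/nginx/conf.d.template/nginx.conf.template \
#                   /etc/nginx/conf.d/nginx.conf "$NAMESERVER"
```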

Hopefully this helps some people out there who don’t want to spend as much time as I did Googling obscure Nginx behavior.
