What I Recently Learned About Docker Networking and Debugging Networking Issues in General


This is a story about how I debugged a confounding local development environment issue, what I learned about Docker in the process, and the generally applicable debugging strategies and techniques that ultimately helped me solve it. Skip to the end if you only want the strategies and techniques; the full story illustrates how they applied in this specific case.

Problem Statement and Use Case

A data infrastructure team at work provides a tool for starting a data pipeline job from a local development environment. Let’s call this tool foo. This tool depends on gcloud and docker. It creates a user-defined Docker network, runs a utility container called bar connected to that network, and then runs another container called qux that talks to bar to retrieve OAuth tokens from Google Cloud Platform (GCP).

Most developers run foo on their local workstations, e.g. Macbooks. But I have the newer Macbook with the Apple M1 ARM-based chip. Docker Desktop support for M1 Macs was relatively recent, and I didn’t want to deal with Docker weirdness. I also didn’t have much free disk space on my 256GB Macbook and didn’t feel like clogging up my drive with lots of Java, Scala, and Docker gunk.

So I tried running foo on a GCE VM configured by our Puppet configuration files. When I ran foo, I got this error.

Error #1: inter-container networking failed between containers attached to user-defined Docker networks

dxia@my-host$ foo --verbose run -f data-info.yaml -w DumpKubernetesContainerImagesJob -p 2021-04-26 -r my-project/target/image-name
DEBUG:verify: Docker network `foo-network` already exists
DEBUG:verify: bar container found.
INFO:run: starting workflow DumpKubernetesContainerImagesJob ...
ERROR: (gcloud.auth.activate-service-account) [Errno 110] Connection timed out
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.
Traceback (most recent call last):
  File "/usr/local/bin/activate-google-application-credentials", line 19, in <module>
    'auth', 'activate-service-account', '--key-file', json_path])
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'auth', 'activate-service-account', '--key-file', '/etc/_foo/gcp-sa-key.json']' returned non-zero exit status 1.
ERROR:foo:

  RAN: /usr/bin/docker run -it -v /home/dxia/my-project/_foo:/etc/_foo:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/site-packages/oauth2client/__init__.py:ro --net foo-network -e FOO_COMPONENT_ID=my-project -e FOO_WORKFLOW_ID=DumpKubernetesContainerImagesJob -e FOO_PARAMETER=2021-04-26 -e FOO_DOCKER_IMAGE=my-project:20210426T211411-2b5452d -e 'FOO_DOCKER_ARGS=wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-26' -e FOO_EXECUTION_ID=foorun-2e30c385-2f89-494c-bc0e-97b3eff316d5 -e FOO_TRIGGER_ID=foo-942f155b-49eb-4af8-a6e4-3adf6f72577b -e FOO_TRIGGER_TYPE=foo -e FOO_ENVIRONMENT=foo -e FOO_LOGGING=text -e GOOGLE_APPLICATION_CREDENTIALS=/etc/_foo/gcp-sa-key.json -e FOO_SERVICE_ACCOUNT=dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com gcr.io/xpn-1/my-project:20210426T211411-2b5452d wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-26

  STDOUT:


  STDERR:
Traceback (most recent call last):
  File "/home/dxia/my-project//lib/python3.6/site-packages/foo/foo.py", line 304, in main
    args.func(args)
  File "/home/dxia/my-project//lib/python3.6/site-packages/foo/foo.py", line 269, in _run
    args.declarative_infra,
  File "/home/dxia/my-project//lib/python3.6/site-packages/foo/run.py", line 641, in run_workflow
    declarative_infra,
  File "/home/dxia/my-project//lib/python3.6/site-packages/foo/run.py", line 161, in _run_workflow
    p.wait()
  File "/home/dxia/my-project//lib/python3.6/site-packages/sh.py", line 841, in wait
    self.handle_command_exit_code(exit_code)
  File "/home/dxia/my-project//lib/python3.6/site-packages/sh.py", line 865, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1:

  RAN: /usr/bin/docker run -it -v /home/dxia/my-project/_foo:/etc/_foo:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/site-packages/oauth2client/__init__.py:ro --net foo-network -e FOO_COMPONENT_ID=my-project -e FOO_WORKFLOW_ID=DumpKubernetesContainerImagesJob -e FOO_PARAMETER=2021-04-26 -e FOO_DOCKER_IMAGE=my-project:20210426T211411-2b5452d -e 'FOO_DOCKER_ARGS=wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-26' -e FOO_EXECUTION_ID=foorun-2e30c385-2f89-494c-bc0e-97b3eff316d5 -e FOO_TRIGGER_ID=foo-942f155b-49eb-4af8-a6e4-3adf6f72577b -e FOO_TRIGGER_TYPE=foo -e FOO_ENVIRONMENT=foo -e FOO_LOGGING=text -e GOOGLE_APPLICATION_CREDENTIALS=/etc/_foo/gcp-sa-key.json -e FOO_SERVICE_ACCOUNT=dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com gcr.io/xpn-1/my-project:20210426T211411-2b5452d wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-26

  STDOUT:


  STDERR:

The HTTP connection timed out. First I checked whether the container started by foo could make a TCP connection to the bar container. I ran foo --verbose run -f data-info.yaml -w DumpKubernetesContainerImagesJob -p 2021-04-26 -r my-project/target/image-name again and ran the following in another terminal window.

nsenter is a cool tool that allows you to run programs in different Linux namespaces. It’s very useful when you can’t get an interactive shell into a container with commands like docker exec -it ... bash. This can happen, for instance, when the container image doesn’t include a shell at all and only contains the binary executable.

dxia@my-host:~$ docker ps
CONTAINER ID        IMAGE                                                                    COMMAND                  CREATED             STATUS              PORTS                     NAMES
a0e872188831        my-project:20210426T211411-2b5452d         "/usr/local/bin/edge…"   2 seconds ago       Up 1 second                                   relaxed_pike
4dda670a2ee1        foo/bar:latest                                   "./bar"               2 hours ago         Up 2 hours          0.0.0.0:80->80/tcp        bar

dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' a0e872188831) nc 172.20.0.127 80 -nvz -w 5
(UNKNOWN) [172.20.0.127] 80 (http) : Connection timed out

So the HTTP connection timeout was caused by an error lower down on the networking stack: an inability to establish a TCP connection. A TCP connection from the host to bar worked though.

dxia@my-host:~$ nc 172.20.0.127 80 -nvz -w 5
(UNKNOWN) [172.20.0.127] 80 (http) open
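
These nc -nvz probes boil down to one question: can I complete a TCP handshake within a timeout? For reference, here is a minimal Python sketch of the same probe (the host and port are whatever you are testing, not anything specific to foo):

```python
import socket

# Rough equivalent of `nc -nvz -w 5 <host> <port>`: attempt a TCP
# connection and report whether it completed within the timeout.
def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and unreachable
        return False
```

A timeout here, like nc’s “Connection timed out”, means the SYN packet got no answer at all — which is exactly the symptom of a firewall silently dropping packets.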

When I see a networking issue like this, I know there might be some misconfigured firewall rule blocking IP packets. I listed all the firewall rules. The ones in the filter table’s FORWARD chain caught my attention.

dxia@my-host:~$ sudo iptables --list FORWARD --verbose --numeric --line-numbers --table filter
Chain FORWARD (policy DROP 38 packets, 2280 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1     6204  492K DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
2     6204  492K DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3     3080  323K ACCEPT     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
4        1    60 DOCKER     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0
5     3085  167K ACCEPT     all  --  corp0 !corp0  0.0.0.0/0            0.0.0.0/0
6        0     0 ACCEPT     all  --  corp0 corp0  0.0.0.0/0            0.0.0.0/0
7      264 17722 DOCKER     all  --  *      br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0
8     7382   17M ACCEPT     all  --  br-8ce7e363e4f9 !br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0

I disabled the GCE VM’s cronned Puppet run and then ran sudo systemctl restart docker. Then I ran bar and a test nginx container named nginx1, both connected to foo-network.

dxia@my-host:~$ docker run --rm -d -v ~/.config/gcloud/:/.config/gcloud --name bar --net foo-network --ip 172.20.0.127 -p 80:80 foo/bar:latest
3f9cc17b3f71e7056fd8072449afa78eb9a6a166ac091d751b69545ead0438b1

dxia@my-host:~$ docker run --net foo-network --name nginx1 -d -p 8080:80 nginx:latest
1b0b2b981f9389a989aa8f60a141b5e9a18ba5582141b6668c9078b6312dcfaf

dxia@my-host:~$ docker ps
CONTAINER ID        IMAGE                                                                    COMMAND                  CREATED              STATUS              PORTS                     NAMES
1b0b2b981f93        nginx:latest                                                             "/docker-entrypoint.…"   5 seconds ago        Up 3 seconds        0.0.0.0:8080->80/tcp      nginx1
3f9cc17b3f71        foo/bar:latest                                   "./bar"               About a minute ago   Up 59 seconds       0.0.0.0:80->80/tcp        bar

Now a TCP connection from the nginx container to bar succeeded.

dxia@my-host:~$ sudo nsenter --net=$(docker inspect --format '{{.NetworkSettings.SandboxKey}}' nginx1) nc 172.20.0.127 80 -nvz -w 5
(UNKNOWN) [172.20.0.127] 80 (http) open

I checked the iptables rules again and saw two additional rules (7 and 8) in the filter table’s FORWARD chain. Rule 8 allowed IP packets coming in from the br-8ce7e363e4f9 network interface (in this case a Linux bridge) and leaving through the same interface.

dxia@my-host:~$ sudo iptables --list FORWARD --verbose --numeric --line-numbers --table filter
Chain FORWARD (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
2        0     0 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3        0     0 ACCEPT     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
4        0     0 DOCKER     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0
5        0     0 ACCEPT     all  --  corp0 !corp0  0.0.0.0/0            0.0.0.0/0
6        0     0 ACCEPT     all  --  corp0 corp0  0.0.0.0/0            0.0.0.0/0
7        0     0 ACCEPT     all  --  *      br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
8        0     0 ACCEPT     all  --  br-8ce7e363e4f9 br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0
9      264 17722 DOCKER     all  --  *      br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0
10    7382   17M ACCEPT     all  --  br-8ce7e363e4f9 !br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0

When I re-ran Puppet, rules 7 and 8 were deleted, and containers on foo-network were again unable to establish TCP connections to each other. I added the bridge-to-bridge rule back manually (it appears as rule 9 below) and confirmed its absence was what caused my error above.

dxia@my-host:~$ sudo iptables --table filter --append FORWARD --in-interface br-8ce7e363e4f9 --out-interface br-8ce7e363e4f9 --source 0.0.0.0/0 --destination 0.0.0.0/0 --jump ACCEPT

dxia@my-host:~$ sudo iptables --list FORWARD --verbose --numeric --line-numbers --table filter
Chain FORWARD (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    23526 1377K DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
2    23526 1377K DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3    11728  755K ACCEPT     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
4        1    60 DOCKER     all  --  *      corp0  0.0.0.0/0            0.0.0.0/0
5    11737  617K ACCEPT     all  --  corp0 !corp0  0.0.0.0/0            0.0.0.0/0
6        0     0 ACCEPT     all  --  corp0 corp0  0.0.0.0/0            0.0.0.0/0
7      182 12970 DOCKER     all  --  *      br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0
8       39  2748 ACCEPT     all  --  br-8ce7e363e4f9 !br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0
9        0     0 ACCEPT     all  --  br-8ce7e363e4f9 br-8ce7e363e4f9  0.0.0.0/0            0.0.0.0/0

dxia@my-host:~$ sudo nsenter --net=$(docker inspect --format '{{.NetworkSettings.SandboxKey}}' nginx1) nc 172.20.0.127 80 -nvz -w 5
(UNKNOWN) [172.20.0.127] 80 (http) open

Now running foo gave a different error.

Error #2: DNS queries for external records from bar failed

dxia@my-host$ foo run -f data-info.yaml -w DumpKubernetesContainerImagesJob -p 2021-04-26 -r my-project/target/image-name
INFO:run: starting workflow DumpKubernetesContainerImagesJob ...
ERROR: (gcloud.auth.activate-service-account) There was a problem refreshing your current auth tokens: Invalid response 500.
Please run:

  $ gcloud auth login

to obtain new credentials.

If you have already logged in with a different account:

    $ gcloud config set account ACCOUNT

to select an already authenticated account to use.
Traceback (most recent call last):
  File "/usr/local/bin/activate-google-application-credentials", line 19, in <module>
    'auth', 'activate-service-account', '--key-file', json_path])
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'auth', 'activate-service-account', '--key-file', '/etc/_foo/gcp-sa-key.json']' returned non-zero exit status 1.
ERROR:foo: non-zero exit code (1) from `/usr/bin/docker run -it -v /home/dxia/my-project/_foo:/etc/_foo:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/dist-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python2.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.6/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.7/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.8/site-packages/oauth2client/__init__.py:ro -v /home/dxia/my-project/_foo/__init__.py:/usr/local/lib/python3.9/site-packages/oauth2client/__init__.py:ro --net foo-network -e FOO_COMPONENT_ID=my-project -e FOO_WORKFLOW_ID=DumpKubernetesContainerImagesJob -e FOO_PARAMETER=2021-04-02 -e FOO_DOCKER_IMAGE=my-project:20210422T065801-2b5452d -e 'FOO_DOCKER_ARGS=wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-02' -e FOO_EXECUTION_ID=foorun-3feaee45-35e4-4c01-9430-86de52eb2db1 -e FOO_TRIGGER_ID=foo-f721de7f-edf9-4bb3-8cdf-1e9bbcec5035 -e FOO_TRIGGER_TYPE=foo -e FOO_ENVIRONMENT=foo -e FOO_LOGGING=text -e GOOGLE_APPLICATION_CREDENTIALS=/etc/_foo/gcp-sa-key.json -e FOO_SERVICE_ACCOUNT=dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com gcr.io/xpn-1/my-project:20210422T065801-2b5452d wrap-luigi --module luigi_tasks DumpKubernetesContainerImagesJob --when 2021-04-02`

The only background knowledge needed here is that the qux container sends a Google Service Account (GSA) JSON credential whose "token_uri" is set to "http://172.20.0.127:80/token", i.e. bar’s address. Bar then uses that credential to obtain tokens for further GCP API requests, so bar needs to query DNS for accounts.google.com. Bar’s container logs show that it cannot look up the DNS A record for accounts.google.com by querying 127.0.0.11:53.
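
To illustrate the mechanism, the credential presumably looks like an ordinary service-account key with its token_uri rewritten to point at bar. The field values below are placeholders reconstructed from the logs above, not the real key:

```python
# Hypothetical shape of the GSA credential qux sends: a standard
# service-account key whose token_uri points at bar (172.20.0.127)
# instead of Google's OAuth endpoint, so token requests are routed
# through the utility container.
credential = {
    "type": "service_account",
    "client_email": "dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",  # redacted placeholder
    "token_uri": "http://172.20.0.127:80/token",        # bar, not Google
}
```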

dxia@my-host:~$ docker ps
CONTAINER ID        IMAGE                                                                    COMMAND                  CREATED             STATUS              PORTS                     NAMES
1b0b2b981f93        nginx:latest                                                             "/docker-entrypoint.…"   9 hours ago         Up 9 hours          0.0.0.0:8080->80/tcp      nginx1
3f9cc17b3f71        foo/bar:latest                                   "./bar"               9 hours ago         Up 9 hours          0.0.0.0:80->80/tcp        bar

dxia@my-host:~$ docker logs --follow bar
2021/04/22 05:54:19 bar started
2021/04/22 06:59:13 Received JWT assertion: [REDACTED base-64 string]
2021/04/22 06:59:13 Servive account name:  projects/-/serviceAccounts/dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com
2021/04/22 06:59:28 Post https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com:generateAccessToken?alt=json&prettyPrint=false: Post https://accounts.google.com/o/oauth2/token: dial tcp: lookup accounts.google.com on 127.0.0.11:53: read udp 127.0.0.1:46920->127.0.0.11:53: i/o timeout
2021/04/22 06:59:28 Failed to create new token for dump-k8-deployment-info-pipeli@my-project.iam.gserviceaccount.com

I wondered why bar was querying 127.0.0.11 for DNS. It turns out this is another loopback address: all of 127.0.0.0/8 is loopback space according to RFC 6890. My guess was that Docker containers attached to user-defined Docker networks are configured by default to use 127.0.0.11 in their /etc/resolv.conf.
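
Python’s standard library confirms the loopback claim quickly:

```python
import ipaddress

# 127.0.0.11 is not the familiar 127.0.0.1, but it still falls inside
# 127.0.0.0/8, which RFC 6890 reserves entirely for loopback.
addr = ipaddress.ip_address("127.0.0.11")
print(addr.is_loopback)                             # True
print(addr in ipaddress.ip_network("127.0.0.0/8"))  # True
```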

dxia@my-host$ docker ps
CONTAINER ID        IMAGE                                                                    COMMAND                  CREATED             STATUS              PORTS                     NAMES
1b0b2b981f93        nginx:latest                                                             "/docker-entrypoint.…"   9 hours ago         Up 9 hours          0.0.0.0:8080->80/tcp      nginx1
3f9cc17b3f71        foo/bar:latest                                   "./bar"               9 hours ago         Up 9 hours          0.0.0.0:80->80/tcp        bar

dxia@my-host$ docker exec -it nginx1 /bin/sh -c "cat /etc/resolv.conf"

search corp.net
nameserver 127.0.0.11
options attempts:1 timeout:5 ndots:0

Why were these Docker containers configured to query 127.0.0.11 for DNS records? After some Googling, it turned out that

By default, a container inherits the DNS settings of the host, as defined in the /etc/resolv.conf configuration file. Containers that use the default bridge network get a copy of this file, whereas containers that use a custom network use Docker’s embedded DNS server, which forwards external DNS lookups to the DNS servers configured on the host.

https://docs.docker.com/config/containers/container-networking/

Now I wondered whether Docker’s embedded DNS server was actually running. After some more Googling, I realized that each container’s network namespace also has its own set of firewall rules. So I listed the rules in the DOCKER_OUTPUT chain of bar’s nat table. These two rules showed that TCP packets bound for 127.0.0.11:53 have their destination port rewritten to 37619, and UDP packets to 58552.

dxia@my-host$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) iptables --list DOCKER_OUTPUT --verbose --numeric --line-numbers --table nat

Chain DOCKER_OUTPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            127.0.0.11           tcp dpt:53 to:127.0.0.11:37619
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            127.0.0.11           udp dpt:53 to:127.0.0.11:58552

Whatever was listening on those ports accepted TCP and UDP connections.

dxia@my-host$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) nc 127.0.0.11 58552 -nvzu -w 5
(UNKNOWN) [127.0.0.11] 58552 (?) open
dxia@my-host$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) nc 127.0.0.11 37619 -nvz -w 5
(UNKNOWN) [127.0.0.11] 37619 (?) open

But there was no DNS reply from either.

dxia@my-host$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) dig @127.0.0.11 -p 58552 accounts.google.com

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @127.0.0.11 -p 58552 accounts.google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

dxia@my-host$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) dig @127.0.0.11 -p 37619 accounts.google.com +tcp

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @127.0.0.11 -p 37619 accounts.google.com +tcp
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

The Docker daemon was indeed listening for DNS queries on those addresses and ports inside bar’s network namespace.

dxia@my-host$ sudo nsenter -n -p -t $(docker inspect --format '{{.State.Pid}}' bar) ss -utnlp
Netid        State          Recv-Q         Send-Q                    Local Address:Port                    Peer Address:Port
udp          UNCONN         0              0                            127.0.0.11:58552                        0.0.0.0:*             users:(("dockerd",pid=10984,fd=38))
tcp          LISTEN         0              128                          127.0.0.11:37619                        0.0.0.0:*             users:(("dockerd",pid=10984,fd=40))
tcp          LISTEN         0              128                                   *:80                                 *:*             users:(("bar",pid=12150,fd=3))

After setting "log-level": "debug" in /etc/docker/daemon.json and reloading the configuration file, I saw that the daemon was trying to forward the DNS query to 10.99.0.1. This was the IP of the corp0 bridge network interface, which we create instead of the default docker0 bridge. I saw an I/O timeout while the daemon was waiting for the DNS reply.
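
For reference, the daemon.json change was just one key (a sketch; merge it with whatever keys your file already has, then reload dockerd’s configuration — with SIGHUP if your version picks the change up that way, or by restarting the daemon):

```json
{
  "log-level": "debug"
}
```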

dxia@my-host$ sudo journalctl --follow -u docker
-- Logs begin at Tue 2019-11-05 18:17:27 UTC. --
Apr 22 15:43:12 my-host.corp.net dockerd[10984]: time="2021-04-22T15:43:12.496979903Z" level=debug msg="[resolver] read from DNS server failed, read udp 172.20.0.127:37928->10.99.0.1:53: i/o timeout"
Apr 22 15:43:13 my-host.corp.net dockerd[10984]: time="2021-04-22T15:43:13.496539033Z" level=debug msg="Name To resolve: accounts.google.com."
Apr 22 15:43:13 my-host.corp.net dockerd[10984]: time="2021-04-22T15:43:13.496958664Z" level=debug msg="[resolver] query accounts.google.com. (A) from 172.20.0.127:51642, forwarding to udp:10.99.0.1"

We set dockerd’s upstream DNS server to 10.99.0.1 because we run unbound as a DNS proxy/cache on the host. We configured it to bind on the bridge interface so that Docker containers can reach the host-local unbound instance by routing DNS requests to corp0.
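
The relevant part of the unbound configuration would look roughly like this (a hypothetical sketch mirroring the setup described above; the real values live in our Puppet manifests):

```
server:
    # Bind to the corp0 bridge IP so containers can reach unbound.
    interface: 10.99.0.1
    # Allow queries from the private ranges the bridges use.
    access-control: 10.0.0.0/8 allow
    access-control: 172.16.0.0/12 allow
```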

So why couldn’t the daemon forward IP packets from 172.20.0.127:37928 to 10.99.0.1:53? It seemed like UDP packets sent from bar were able to reach 10.99.0.1:53, yet DNS requests failed. I also knew DNS requests from the host to 10.99.0.1:53 worked.

dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) nc 10.99.0.1 53 -nvzu -w 5
(UNKNOWN) [10.99.0.1] 53 (domain) open

dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) dig @10.99.0.1 accounts.google.com

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @10.99.0.1 accounts.google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

dxia@my-host:~$ dig @10.99.0.1 accounts.google.com

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @10.99.0.1 accounts.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39308
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;accounts.google.com.     IN  A

;; ANSWER SECTION:
accounts.google.com.  108 IN  A   142.250.31.84

;; Query time: 1 msec
;; SERVER: 10.99.0.1#53(10.99.0.1)
;; WHEN: Mon Apr 26 23:58:28 UTC 2021
;; MSG SIZE  rcvd: 64

My hypothesis at this point was that Docker’s embedded DNS server wasn’t working in some way. After exploring this for a while with no luck, I questioned my assumption that UDP packets from 172.20.0.127:37928 were able to reach 10.99.0.1:53. I realized TCP packets from 172.20.0.127:37928 were not able to reach 10.99.0.1:53.

dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) nc 10.99.0.1 53 -nvz -w 5
(UNKNOWN) [10.99.0.1] 53 (domain) : Connection timed out

So why were UDP packets able to? Isn’t UDP a fire-and-forget protocol? How can nc even tell whether anything is listening for UDP packets on an IP and port at all? It was good that I backtracked and questioned my assumption: nc reports a UDP port as “open” merely because no ICMP port-unreachable message came back, so one cannot distinguish an open UDP port from packets being dropped en route to that port.
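
You can see why at the socket level. On Linux, a connected UDP socket only learns a port is closed when an ICMP port-unreachable comes back, surfaced as ConnectionRefusedError; a firewall silently dropping packets produces exactly the same silence as an open port that never replies. A sketch (port 9 is assumed to have no listener):

```python
import socket

# Probe a UDP port roughly the way `nc -u -z` does: send a datagram
# and wait. Silence means "open or filtered" -- indistinguishable. An
# ICMP port-unreachable is the only positive signal the port is closed.
def probe_udp(host: str, port: int, timeout: float = 1.0) -> str:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    s.connect((host, port))
    try:
        s.send(b"probe")
        s.recv(1024)
        return "open (got a reply)"
    except ConnectionRefusedError:
        return "closed (ICMP port-unreachable)"
    except socket.timeout:
        return "open or filtered (no reply)"
    finally:
        s.close()

print(probe_udp("127.0.0.1", 9))  # assumes nothing listens on port 9
```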

So it had to be another networking issue, which meant another firewall rule was blocking packets from the bar container to 10.99.0.1. After a while of looking, I realized that the filter table’s INPUT chain had a default policy of DROP and that no rule matched packets coming in from the br-8ce7e363e4f9 interface.

dxia@my-host:~$ sudo iptables --list INPUT --verbose --numeric --line-numbers --table filter
Chain INPUT (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1     434M  345G            all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 00 ACC-eth0 input */
2        0     0            all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            /* 00 ACC-eth1 input */
3     7080  283K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 drop invalid */ ctstate INVALID
4    1907M  568G ipthrouput  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 track forward */
5     987M  414G ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 001 accept established */ ctstate RELATED,ESTABLISHED
6     763M   95G ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0            /* 002 allow local */
7        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 003 accept ipsec */ policy match dir in pol ipsec
8        1    28 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 004 allow icmp */ icmptype 8 limit: avg 10/sec burst 5
9        0     0 DROP       icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 005 drop icmp */
10       0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 9203 /* 006 block JMX on service net */
11       0     0 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 9203 /* 020 deny direct access to JMX port on helios nodes */ reject-with icmp-port-unreachable
12       0     0 DROP       all  --  eth0   *       10.48.64.0/22        0.0.0.0/0            /* 08 drop traffic from osxenv 10.48.64.0/22 */
13       0     0 DROP       all  --  eth0   *       10.97.16.0/21        0.0.0.0/0            /* 08 drop traffic from osxenv 10.97.16.0/21 */
14       0     0 DROP       all  --  eth0   *       172.24.32.0/22       0.0.0.0/0            /* 08 drop traffic from windowsbuildagentsenv 172.24.32.0/22 */
15       0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 08 prevent access from buildagent machines */ match-set buildagent src
16       0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 08 prevent access from paymentbamboo machines */ match-set paymentbamboo src
17   18168 1094K ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow service nets */ match-set service_nets src
18     134  8496 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow tech offices */ match-set tech_offices src
19       0     0 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow testenv */ match-set testenv src
20       0     0 ACCEPT     all  --  eth0   *       130.211.0.0/22       0.0.0.0/0            /* 10 Google networks for 130.211.0.0/22 */
21       0     0 ACCEPT     all  --  eth0   *       35.191.0.0/16        0.0.0.0/0            /* 10 Google networks for 35.191.0.0/16 */
22       0     0 LOG        tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 11 inbound eth0: */ limit: avg 3/min burst 5 LOG flags 0 level 4 prefix "input eth0: "
23       0     0 LOG        tcp  --  eth1   *       0.0.0.0/0            0.0.0.0/0            /* 11 inbound eth1: */ limit: avg 3/min burst 5 LOG flags 0 level 4 prefix "input eth1: "
24       0     0 ACCEPT     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0            /* 20 allow all docker0 traffic */
25    158M   59G ACCEPT     all  --  corp0 *       0.0.0.0/0            0.0.0.0/0            /* 20 allow all corp0 traffic */
26       0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 22 /* 500 allow ssh on service net */ ctstate NEW recent: SET name: DEFAULT side: source mask: 255.255.255.255
27       0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 22 /* 501 limit ssh on service net */ ctstate NEW recent: UPDATE seconds: 180 hit_count: 20 name: DEFAULT side: source mask: 255.255.255.255

So I manually added a matching rule that accepted those packets.

sudo iptables --table filter --append INPUT --in-interface br-8ce7e363e4f9 --source 0.0.0.0/0 --destination 0.0.0.0/0 --jump ACCEPT

dxia@my-host:~$ sudo iptables --list INPUT --verbose --numeric --line-numbers --table filter
Chain INPUT (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1     434M  345G            all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 00 ACC-eth0 input */
2        0     0            all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            /* 00 ACC-eth1 input */
3     7080  283K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 drop invalid */ ctstate INVALID
4    1908M  568G ipthrouput  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 track forward */
5     987M  414G ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 001 accept established */ ctstate RELATED,ESTABLISHED
6     763M   95G ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0            /* 002 allow local */
7        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 003 accept ipsec */ policy match dir in pol ipsec
8        1    28 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 004 allow icmp */ icmptype 8 limit: avg 10/sec burst 5
9        0     0 DROP       icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 005 drop icmp */
10       0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 9203 /* 006 block JMX on service net */
11       0     0 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 9203 /* 020 deny direct access to JMX port on helios nodes */ reject-with icmp-port-unreachable
12       0     0 DROP       all  --  eth0   *       10.48.64.0/22        0.0.0.0/0            /* 08 drop traffic from osxenv 10.48.64.0/22 */
13       0     0 DROP       all  --  eth0   *       10.97.16.0/21        0.0.0.0/0            /* 08 drop traffic from osxenv 10.97.16.0/21 */
14       0     0 DROP       all  --  eth0   *       172.24.32.0/22       0.0.0.0/0            /* 08 drop traffic from windowsbuildagentsenv 172.24.32.0/22 */
15       0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 08 prevent access from buildagent machines */ match-set buildagent src
16       0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* 08 prevent access from paymentbamboo machines */ match-set paymentbamboo src
17   18168 1094K ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow service nets */ match-set service_nets src
18     134  8496 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow tech offices */ match-set tech_offices src
19       0     0 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 09 allow testenv */ match-set testenv src
20       0     0 ACCEPT     all  --  eth0   *       130.211.0.0/22       0.0.0.0/0            /* 10 Google networks for 130.211.0.0/22 */
21       0     0 ACCEPT     all  --  eth0   *       35.191.0.0/16        0.0.0.0/0            /* 10 Google networks for 35.191.0.0/16 */
22       0     0 LOG        tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 11 inbound eth0: */ limit: avg 3/min burst 5 LOG flags 0 level 4 prefix "input eth0: "
23       0     0 LOG        tcp  --  eth1   *       0.0.0.0/0            0.0.0.0/0            /* 11 inbound eth1: */ limit: avg 3/min burst 5 LOG flags 0 level 4 prefix "input eth1: "
24       0     0 ACCEPT     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0            /* 20 allow all docker0 traffic */
25    158M   59G ACCEPT     all  --  corp0 *       0.0.0.0/0            0.0.0.0/0            /* 20 allow all corp0 traffic */
26       0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 22 /* 500 allow ssh on service net */ ctstate NEW recent: SET name: DEFAULT side: source mask: 255.255.255.255
27       0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            multiport dports 22 /* 501 limit ssh on service net */ ctstate NEW recent: UPDATE seconds: 180 hit_count: 20 name: DEFAULT side: source mask: 255.255.255.255
28       0     0 ACCEPT     all  --  br-8ce7e363e4f9 *       0.0.0.0/0            0.0.0.0/0

I retried querying for accounts.google.com, and I got a DNS reply!

dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) dig @127.0.0.11 accounts.google.com

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @127.0.0.11 accounts.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 9592
;; flags: qr rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Tue Apr 27 00:07:13 UTC 2021
;; MSG SIZE  rcvd: 12

But… there are no A records? The Docker daemon logs confirmed that the upstream local unbound DNS server did not return any A records.

Error #3: unbound refused to reply to DNS queries from the private IP range used by our Docker network

Apr 27 00:07:13 my-host.corp.net dockerd[32659]: time="2021-04-27T00:07:13.864160829Z" level=debug msg="Name To resolve: accounts.google.com."
Apr 27 00:07:13 my-host.corp.net dockerd[32659]: time="2021-04-27T00:07:13.864325564Z" level=debug msg="[resolver] query accounts.google.com. (A) from 172.20.0.127:57576, forwarding to udp:10.99.0.1"
Apr 27 00:07:13 my-host.corp.net dockerd[32659]: time="2021-04-27T00:07:13.864537556Z" level=debug msg="[resolver] external DNS udp:10.99.0.1 did not return any A records for \"accounts.google.com.\""

Hm, I noticed the status in the empty DNS reply is REFUSED. I recalled that unbound supports configuring which DNS queries it will reply to based on originating interface and IP.

dxia@my-host:~$ grep access-control /etc/unbound/unbound.conf
    access-control: 127.0.0.1 allow
    access-control: 10.99.0.0/24 allow
    access-control: 10.174.18.90 allow

Bingo! There was no access-control entry allowing DNS queries from 172.20.0.127. I added access-control: 172.16.0.0/12 allow (since all of 172.16.0.0/12 is private IPv4 address space according to RFC 1918) and reloaded unbound. Now it worked!
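After the change, the access-control section of /etc/unbound/unbound.conf looked roughly like this (a sketch based on the grep output above; the surrounding server: clause and exact file layout may differ on your host):

```
server:
    # existing entries
    access-control: 127.0.0.1 allow
    access-control: 10.99.0.0/24 allow
    access-control: 10.174.18.90 allow
    # new: allow queries from Docker's user-defined bridge networks (RFC 1918)
    access-control: 172.16.0.0/12 allow
```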

dxia@my-host:~$ sudo systemctl reload unbound
dxia@my-host:~$ sudo nsenter -n -t $(docker inspect --format '{{.State.Pid}}' bar) dig @127.0.0.11 accounts.google.com

; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> @127.0.0.11 accounts.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36432
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;accounts.google.com.     IN  A

;; ANSWER SECTION:
accounts.google.com.  143 IN  A   74.125.202.84

;; Query time: 2 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Tue Apr 27 00:11:41 UTC 2021
;; MSG SIZE  rcvd: 64

Docker daemon logs showed the following.

Apr 27 00:11:41 my-host.corp.net dockerd[32659]: time="2021-04-27T00:11:41.957934764Z" level=debug msg="Name To resolve: accounts.google.com."
Apr 27 00:11:41 my-host.corp.net dockerd[32659]: time="2021-04-27T00:11:41.958087666Z" level=debug msg="[resolver] query accounts.google.com. (A) from 172.20.0.127:33346, forwarding to udp:10.99.0.1"
Apr 27 00:11:41 my-host.corp.net dockerd[32659]: time="2021-04-27T00:11:41.960007990Z" level=debug msg="[resolver] received A record \"74.125.202.84\" for \"accounts.google.com.\" from udp:10.99.0.1"

General Debugging Strategies and Techniques I Used

Here are the general debugging strategies I used and reinforced for myself.

  • When network requests fail, go down one layer of the stack at a time to identify exactly which layer is failing. In other words, find out which protocol is responsible for the failure: HTTP, TCP, or IP?
  • Try to reproduce the error with the most direct and minimal command that simulates the actual failure. In this case, an HTTP request made by gcloud was failing, so I translated that into an nc command that simulated establishing the TCP connection between containers. Likewise, a DNS query from bar was failing, so I translated that into a dig command. In all of these cases, the origin of the IP packets mattered, so knowing how to use nsenter to enter a network namespace and create IP packets that originate from the same container was useful. nsenter is essential when debugging containers that don’t have any tools installed in them: the bar image contains a single Go-compiled executable and nothing else I could use.
  • I encountered three errors in this case. Tackle one error at a time, and don’t give up.
  • Be scientific. Have a working hypothesis at each step for which you collect evidence that either supports or refutes it.
  • If you get stuck, go back and question your previous assumptions or conclusions. Are there other tests you can run that can confirm or disprove what you thought was true?

Patches

Error #1: I created a patch that makes our Puppet installation ignore rules created by Docker networks in the filter table’s FORWARD chain.

Error #2: Unfortunately, I don’t think there’s a good solution to this other than disabling our GCE VM’s periodic Puppet runs and manually adding a rule to allow packets from the new interface. The chain’s default policy is DROP, and interface names are dynamic.
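One partial workaround worth noting (a sketch, not something I have verified against our Puppet runs): iptables treats a trailing + in an interface name as a wildcard, so a single rule can match every Docker-created bridge regardless of the hash suffix it gets:

```
sudo iptables --table filter --append INPUT --in-interface br-+ --jump ACCEPT
```

This avoids chasing dynamic names like br-8ce7e363e4f9, but the rule would still be wiped by the next Puppet run unless Puppet itself manages it.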

Error #3: I made a patch that makes unbound reply to DNS queries with source IPs in the 172.16.0.0/12 range.
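As a sanity check on that range: a /12 prefix fixes the top 12 bits, so an IPv4 address is inside 172.16.0.0/12 exactly when its first octet is 172 and its second octet is between 16 and 31. Here is a quick standalone bash sketch (not part of the patch) confirming that the Docker network IP from the daemon logs is covered:

```shell
#!/usr/bin/env bash
# Membership test for 172.16.0.0/12: the first octet must be 172 and the
# second octet must fall in 16..31 (12 fixed bits = 8 + 4).
in_172_16_slash_12() {
  local o1 o2 rest
  IFS=. read -r o1 o2 rest <<< "$1"
  [ "$o1" -eq 172 ] && [ "$o2" -ge 16 ] && [ "$o2" -le 31 ]
}

in_172_16_slash_12 172.20.0.127 && echo "172.20.0.127 is in 172.16.0.0/12"
# prints: 172.20.0.127 is in 172.16.0.0/12
```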
