Benchmarking Kafka and Google Cloud Pub/Sub Latencies

I’m helping a recently acquired team at work figure out if they can migrate from Kafka to Google Cloud Pub/Sub. Part of the exploration was figuring out the change in latencies, if any, from switching.

The team’s production setup looks like this:

They paid an external company called Confluent to run a managed Kafka cluster in AWS Oregon.
This is the same region where this team ran all their backend services. Part of their migration also involves switching their workloads from AWS Oregon to GCP us-central1. If they choose to migrate to Pub/Sub, their services will be publishing and subscribing to messages across cloud providers and regions. So my latency benchmarks took that into account.
All their services are written in Golang.
Services run as containers in AWS Elastic Container Service.

I defined latency as the time elapsed between when a message is published and when a subscriber receives it. I didn’t count the extra time it takes for a subscriber to acknowledge the message. I used Golang and the same upstream client libraries for Kafka and Pub/Sub that the team used or would use, respectively, in production.

I published messages of various sizes at various rates from AWS EC2 instances in Oregon for five minutes. At the same time, five Google Compute Engine instances in us-central1 subscribed to these messages (pull-based) as fast as possible. I didn’t measure latency during an initial burn-in period of one minute, to avoid any effects that might arise from using a new topic or subscription, or from too few messages flowing through the messaging service. This let me more closely mimic message latency in production.

For each run, I always took the percentile summary of the subscriber with the second highest p99 latency. I created a new Pub/Sub or Kafka topic for each series in the graphs below. Kafka topics always had eight partitions.

I took some inspiration from a blog post titled “Benchmarking Message Queue Latency” and also found the GCP post “Testing Cloud Pub/Sub clients to maximize streaming performance.” The latter linked to the code used to benchmark Pub/Sub. Unfortunately, after trying that tool many times and finding that it wasn’t well documented and had various issues, I gave up and wrote my own simple latency benchmarker in Golang. This was probably better anyway, since it ensured I was using the same language and client libraries as the team I was helping.

My full results are in this Google sheet. The benchmarking code is at github.com/davidxia/cloud-message-latency.

Pub/Sub Latencies

Kafka Latencies

Summary

With my specific test parameters, Kafka p99 latencies were 100-200ms, much lower than Pub/Sub latencies. In the worst-case scenarios, Pub/Sub latencies were almost an order of magnitude higher. Pub/Sub p99 latencies were approximately 0.5-1 seconds at the team’s current publisher throughput, which is relatively low at about 1KB/s. At higher throughputs the latencies dropped to 300-400ms. This is consistent with Google’s documentation and the generally accepted knowledge that Pub/Sub performs faster at higher message volumes. According to one of that team’s engineers, this latency is acceptable for all messages except one, which can be changed to a direct service-to-service request.

It was also interesting to see that message delivery was pretty evenly spread out over five subscribers with Pub/Sub. Kafka often had a few consumers that received twice as many messages as their peers.
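One plausible explanation for the Kafka skew, which I didn't verify: if the five consumers shared a single consumer group, each of the eight partitions would be owned by exactly one consumer, so the partitions can't be split evenly. The sketch below is only an illustration of that arithmetic, not the actual Kafka assignor; `partitionsPerConsumer` is a made-up helper.

```go
package main

import "fmt"

// partitionsPerConsumer spreads partitions over consumers as evenly as
// possible and returns how many partitions each consumer ends up with.
// (Hypothetical helper; real Kafka assignors differ in which consumer
// gets which partition, but not in the counts for a single topic.)
func partitionsPerConsumer(partitions, consumers int) []int {
	counts := make([]int, consumers)
	for p := 0; p < partitions; p++ {
		counts[p%consumers]++
	}
	return counts
}

func main() {
	// Eight partitions, five consumers, as in my test setup.
	fmt.Println(partitionsPerConsumer(8, 5)) // prints: [2 2 2 1 1]
}
```

If messages are spread evenly across partitions, the three consumers holding two partitions would see about twice as many messages as the two holding one, which would match what I observed.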

After I finished benchmarking, I found PerfKitBenchmarker, an open source benchmarking tool used to measure and compare cloud offerings. It looks promising, but I haven’t tried it out yet.

David Xia