Benchmarking low-level I/O: C, C++, Rust, Golang, Java, Python

Benchmarking TCP proxies is probably the simplest case: there is no data processing, only handling incoming and outgoing connections and relaying raw bytes. It is nearly impossible for a microservice to be faster than a TCP proxy, because it cannot do less than that; it can only do more. Any other functionality (parsing, validating, traversing, packing, computing, etc.) is built on top of that.
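To make "relaying raw bytes" concrete, here is a minimal sketch of a TCP proxy in Python on top of asyncio. It is not the code of any of the proxies benchmarked in this post. The names pipe, handle_client and start_proxy are made up for illustration, and error handling, timeouts and buffer tuning are deliberately left out.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:  # EOF: the peer closed its side
                break
            writer.write(data)
            await writer.drain()  # wait if the write buffer is full
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, target_host, target_port):
    """Open an upstream connection and relay bytes in both directions."""
    upstream_reader, upstream_writer = await asyncio.open_connection(
        target_host, target_port
    )
    await asyncio.gather(
        pipe(client_reader, upstream_writer),   # client -> backend
        pipe(upstream_reader, client_writer),   # backend -> client
    )


async def start_proxy(listen_host, listen_port, target_host, target_port):
    """Start listening and return the asyncio server object."""
    async def on_connect(reader, writer):
        await handle_client(reader, writer, target_host, target_port)

    return await asyncio.start_server(on_connect, listen_host, listen_port)
```

Everything a real proxy adds (connection limits, timeouts, metrics, TLS) sits on top of this loop, which is why a plain relay is a reasonable lower bound for the cost of a network service.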

The following solutions are being compared:

  • HAProxy: in TCP-proxy mode, for comparison with a mature solution written in C.

  • draft-http-tunnel: a simple C++ solution (trantor) with very basic functionality, running in TCP mode (thanks to Cesar Mello, who coded it to make this benchmark possible).

  • http-tunnel: a simple HTTP-tunnel/TCP-proxy written in Rust (tokio), running in TCP mode (you can read more about it here).

  • tcp-proxy: a Golang solution.

  • NetCrusher: a Java solution (Java NIO), benchmarked on JDK 11 with G1.

  • pproxy: a Python solution based on asyncio, running in TCP-proxy mode.

All of the solutions above use non-blocking I/O. In my previous post, I tried to convince the reader that it is the best way to handle network communication if you need highly available services with low latency and high throughput.
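As a small illustration of what "non-blocking" means at the socket level, the sketch below uses only the Python standard library (demo_nonblocking is a hypothetical name). A read on a non-blocking socket returns immediately instead of parking the thread, and a selector reports when data is actually available. The libraries named above wrap this readiness-based pattern in their own event loops; their internals are not shown here.

```python
import selectors
import socket


def demo_nonblocking():
    """Show a read that would block, then a readiness-driven read."""
    a, b = socket.socketpair()   # two connected sockets in one process
    a.setblocking(False)
    events = []

    try:
        a.recv(1024)             # nothing has been sent yet
    except BlockingIOError:
        # A blocking socket would wait here; a non-blocking one returns at once.
        events.append("would block")

    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    b.sendall(b"ping")

    # Ask the OS which registered sockets are ready, then read only from those.
    for key, _mask in sel.select(timeout=1.0):
        events.append(key.fileobj.recv(1024))

    sel.close()
    a.close()
    b.close()
    return events
```

With one thread asking "which sockets are ready?" instead of dedicating a thread to each connection, a single core can service many connections, which is what the two-core limit in this benchmark relies on.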

A quick note — I tried to pick the best solutions in Golang, Java, and Python, but if you know of better alternatives, feel free to reach out to me.

The actual backend is Nginx, which is configured to serve 10kb of data in HTTP mode.

Benchmark results are split into two groups:

  • Baseline, C, C++, Rust: high-performance languages.

  • Rust, Golang, Java, Python: memory-safe languages.

Yep, Rust belongs to both worlds.

Brief description of the methodology

  • Two cores allocated for TCP Proxies (using cpuset).

  • Two cores allocated for the backend (Nginx).

  • Request rate starts at 10k, ramping up to 25k requests per second (rps).

  • Each connection is reused for 50 requests (10kb per request).

  • Benchmarks were run on the same VM to avoid any network noise.

  • The VM instance type is compute-optimized (it exclusively owns all allocated CPUs) to avoid “noisy neighbors” issues.
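To get a feel for what these numbers imply, here is a back-of-the-envelope sketch. It assumes "10kb" means roughly 10,000 bytes of payload per response and ignores HTTP headers and TCP overhead, so treat the output as an order of magnitude rather than a measurement. The function name load_profile is made up for illustration.

```python
def load_profile(rps, requests_per_connection=50, payload_bytes=10_000):
    """Estimate connection churn and payload volume for a given request rate.

    payload_bytes=10_000 is an assumption about what "10kb" means.
    """
    new_connections_per_second = rps / requests_per_connection
    payload_bytes_per_second = rps * payload_bytes
    return new_connections_per_second, payload_bytes_per_second


for rps in (10_000, 25_000):
    conns, bps = load_profile(rps)
    print(f"{rps} rps: {conns:.0f} new connections/s, {bps / 1e6:.0f} MB/s of payload")
```

Under these assumptions the proxy accepts about 200 new connections per second at the start of the ramp and about 500 at the end, while relaying roughly 100 to 250 MB of response payload per second.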