
hwbench — Hardware Benchmark in C

You want to know what your machine actually does — not what the spec sheet says. hwbench runs three tests directly on the hardware and prints real numbers: CPU throughput, memory bandwidth, and cache latency. No installation, no dependencies, no root required.

Download

Source: hwbench-source.tar.gz | hwbench-source.zip

Build: gcc -O2 -msse2 -o hwbench hwbench.c

Requires gcc and the standard C library. Nothing else.

What Does hwbench Measure?

CPU Throughput

How fast the processor executes a serial chain of arithmetic, both integer and floating point. Each operation depends on the result of the previous one, so the CPU cannot overlap them or run several at once. The number reflects what a single core sustains on dependent work, not a theoretical peak.
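
A dependent chain of this kind can be sketched as follows. This is a minimal illustration, not the code from hwbench.c: the names `now_seconds`, `int_chain`, and `int_chain_mops` are hypothetical, the multiply-add constants are arbitrary, and counting one multiply-add as one "operation" is an assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Serial chain: each multiply-add needs the previous value of x,
 * so the steps cannot overlap. Constants are arbitrary. */
static uint64_t int_chain(uint64_t seed, uint64_t iters)
{
    uint64_t x = seed;
    for (uint64_t i = 0; i < iters; i++)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

/* Time `iters` chain steps and return millions of steps per second.
 * The seed is read from, and the result written to, a volatile so the
 * compiler can neither precompute the chain nor discard it. */
static double int_chain_mops(uint64_t iters, volatile uint64_t *sink)
{
    assert(iters > 0);
    uint64_t seed = *sink;
    double t0 = now_seconds();
    *sink = int_chain(seed, iters);   /* result stored before the clock is read */
    double t1 = now_seconds();
    return (double)iters / (t1 - t0) / 1e6;
}
```

Calling `int_chain_mops` with a few hundred million iterations and printing the return value would give a figure comparable in spirit to the integer MOPS line. A floating-point variant would use the same structure with a `double` accumulator.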

Memory Bandwidth

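How fast data moves between the processor and RAM. Three paths are measured:

Write (cached/RFO): ordinary stores, where every cache-missing store first triggers a Read-For-Ownership of the cache line.
Write (non-temporal): stores that bypass the cache.
Sequential read: reading memory in address order.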

Cache Latency

How long it takes to fetch a single piece of data from each level of the memory hierarchy. Modern CPUs have small, fast caches close to the processor (L1, L2, L3) and slower main RAM further away. The latency jumps between levels show you exactly where the speed cliff is.

The latency test uses a random access pattern designed to defeat the CPU’s hardware prefetcher, the mechanism that tries to predict what data you’ll need next. With a predictable pattern, you’d be measuring the prefetcher, not the cache.
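
One common way to build such a pattern is a pointer chase over a random single-cycle permutation. The sketch below uses Sattolo's algorithm for this. It is an assumption about the general technique, not a copy of what hwbench.c does, and `build_cycle` and `chase` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill next[0..n-1] with a random permutation that forms one single cycle
 * (Sattolo's algorithm), so a chase visits every slot before repeating.
 * rand() is good enough for a sketch; its range limits n to RAND_MAX. */
static void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i), never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads. The address of each load is the
 * value returned by the previous one, so the loads run strictly in series. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}
```

Timing `chase` over arrays sized to fit each level (for example 16 KB, 256 KB, 4 MB, 256 MB) and dividing the elapsed time by `steps` gives nanoseconds per access. The return value should be stored or printed so the compiler keeps the loop.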

Sample Output

CPU
  Integer throughput    :   1132 MOPS
  Float throughput      :    564 MOPS

Memory Bandwidth
  Write (cached/RFO)    :   20.1 GB/s
  Write (non-temporal)  :   23.2 GB/s
  Sequential read       :   16.2 GB/s

Cache Latency
  16 KB   (L1)          :    1.2 ns
  256 KB  (L2)          :    2.8 ns
  4 MB    (L3)          :   10.7 ns
  256 MB  (RAM)         :   98.6 ns

MOPS = millions of operations per second. GB/s = gigabytes per second. ns = nanoseconds per memory access.
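
As a sketch of the arithmetic behind these units, the helpers below convert a raw count and an elapsed time into each figure. The function names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption; the post does not say whether hwbench uses decimal or binary gigabytes.

```c
#include <assert.h>
#include <stdint.h>

/* Millions of operations per second. */
static double to_mops(uint64_t ops, double seconds)
{
    assert(seconds > 0.0);
    return (double)ops / seconds / 1e6;
}

/* Gigabytes per second, assuming 1 GB = 10^9 bytes. */
static double to_gb_per_s(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e9;
}

/* Nanoseconds per memory access. */
static double to_ns_per_access(uint64_t accesses, double seconds)
{
    assert(accesses > 0);
    return seconds * 1e9 / (double)accesses;
}
```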

What Normal Looks Like

On a modern desktop or laptop CPU:

Measurement            Typical range
Integer throughput     500 – 4000 MOPS
Float throughput       300 – 3000 MOPS
Write (non-temporal)   15 – 50 GB/s
Sequential read        15 – 50 GB/s
L1 latency             1 – 2 ns
L2 latency             3 – 6 ns
L3 latency             8 – 20 ns
RAM latency            60 – 120 ns

Numbers outside these ranges aren’t wrong — older hardware, single-channel memory, or a thermally throttled laptop will land lower. The value is in running it on multiple machines and comparing.

Why C

The write bandwidth test requires non-temporal store instructions (MOVNTDQ) to bypass the cache and measure true memory throughput. The cache latency test requires a serial dependent-load chain that the CPU cannot speculatively execute around. Both techniques demand direct control over what the compiler and hardware actually do. C provides that control. Most higher-level languages do not expose it, so they cannot make the same guarantees.
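
For illustration, a non-temporal fill can be written with the SSE2 intrinsic `_mm_stream_si128`, which compilers emit as MOVNTDQ. This is a minimal sketch under assumptions: `fill_nontemporal` is a hypothetical name, and the alignment and size requirements are choices made here to keep the example short, not details taken from hwbench.c.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_set1_epi8 */
#include <stddef.h>
#include <stdint.h>

/* Fill `bytes` bytes at dst with `value` using non-temporal 16-byte stores.
 * dst must be 16-byte aligned and bytes must be a multiple of 16. */
static void fill_nontemporal(void *dst, size_t bytes, uint8_t value)
{
    assert((uintptr_t)dst % 16 == 0 && bytes % 16 == 0);
    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(p + i, v);   /* MOVNTDQ: store that bypasses the cache */
    _mm_sfence();                     /* order the streamed stores before later ones */
}
```

Timing this over a buffer much larger than the last-level cache, and timing an ordinary `memset` over the same buffer, would give the two write figures to compare.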

FAQ

Q: What is hwbench?
A: A single C file that measures CPU throughput, memory bandwidth, and cache latency. It runs directly on the hardware with no external dependencies.

Q: Does hwbench require root or installation?
A: No. Download the source, compile it, run it.

Q: How long does it take?
A: About 30 seconds. Results print as they come in.

Q: What do the bandwidth numbers mean?
A: The gap between cached write and non-temporal write is the cost of the hidden Read-For-Ownership on every cache-missing store: the CPU must read each line into the cache before it can write to it. Non-temporal write skips that read, so it comes closer to the write bandwidth ceiling of your RAM.

Q: What does the cache latency test actually measure?
A: Access time per level: L1, L2, L3, and RAM. The random access pattern defeats the prefetcher, so the numbers reflect actual latency rather than prefetch performance.

Tested on Debian/CrunchBang++. Requires gcc. Linux only.