hwbench — Hardware Benchmark in C
You want to know what your machine actually does — not what the spec sheet says. hwbench runs three tests directly on the hardware and prints real numbers: CPU throughput, memory bandwidth, and cache latency. No installation, no dependencies, no root required.
Download
Source: hwbench-source.tar.gz | hwbench-source.zip
Build: gcc -O2 -msse2 -o hwbench hwbench.c
Requires gcc and the standard C library. Nothing else.
What Does hwbench Measure?
CPU Throughput
How fast the processor executes a serial chain of arithmetic, both integer and floating point. Each operation depends on the result of the previous one, so the CPU cannot overlap them. The result is the dependent-chain rate on a single core. It is set by each instruction's latency, and it sits well below the theoretical peak a core reaches on independent operations.
Memory Bandwidth
How fast data moves between the processor and RAM. Three paths are measured:
- Write (cached/RFO): the normal write path. Before the CPU can write to a cache line that is not already in its cache, it must first read that line from RAM. This is a Read-For-Ownership (RFO), and the hidden read consumes bandwidth.
- Write (non-temporal): the fast path. Non-temporal stores (MOVNTDQ) bypass the cache and stream directly to RAM, so there is no hidden read. This is the closest the test gets to the raw write bandwidth of the hardware.
- Sequential read: how fast the CPU can read a large block of data from RAM.
Cache Latency
How long it takes to fetch a single piece of data from each level of the memory hierarchy. Modern CPUs have small, fast caches close to the processor (L1, L2, L3) and slower main RAM further away. The latency jumps between levels show you exactly where the speed cliff is.
The latency test uses a random access pattern designed to defeat the CPU’s hardware prefetcher — the mechanism that tries to predict what data you’ll need next. Without that, you’d be measuring the prefetcher, not the cache.
Sample Output
```
CPU
Integer throughput : 1132 MOPS
Float throughput : 564 MOPS
Memory Bandwidth
Write (cached/RFO) : 20.1 GB/s
Write (non-temporal) : 23.2 GB/s
Sequential read : 16.2 GB/s
Cache Latency
16 KB (L1) : 1.2 ns
256 KB (L2) : 2.8 ns
4 MB (L3) : 10.7 ns
256 MB (RAM) : 98.6 ns
```
MOPS = millions of operations per second. GB/s = gigabytes per second. ns = nanoseconds per memory access.
What Normal Looks Like
On a modern desktop or laptop CPU:
| Measurement | Typical range |
|---|---|
| Integer throughput | 500 – 4000 MOPS |
| Float throughput | 300 – 3000 MOPS |
| Write (non-temporal) | 15 – 50 GB/s |
| Sequential read | 15 – 50 GB/s |
| L1 latency | 1 – 2 ns |
| L2 latency | 3 – 6 ns |
| L3 latency | 8 – 20 ns |
| RAM latency | 60 – 120 ns |
Numbers outside these ranges aren’t wrong — older hardware, single-channel memory, or a thermally throttled laptop will land lower. The value is in running it on multiple machines and comparing.
Why C
The write bandwidth test requires non-temporal store instructions (MOVNTDQ) to bypass the cache and measure true memory throughput. The cache latency test requires a serial dependent-load chain that the CPU cannot speculatively execute around. Both techniques demand direct control over what the compiler and hardware actually do. C provides that control. A higher-level language cannot make the same guarantees.
FAQ
Q: What is hwbench? A single C file that measures CPU throughput, memory bandwidth, and cache latency. It runs directly on the hardware with no external dependencies.
Q: Does hwbench require root or installation? No. Download the source, compile it, and run it.
Q: How long does it take? About 30 seconds. Results print as they come in.
Q: What do the bandwidth numbers mean? The gap between cached write and non-temporal write is the cost of the hidden Read-For-Ownership on every store that misses the cache. Non-temporal write is the closest measure of your RAM's raw write bandwidth.
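Here is a worked example using the sample output above. The cached write reaches about 87% of the non-temporal figure. By the reasoning in this answer, roughly 13% of write bandwidth on that machine goes to RFO overhead:

$$
1 - \frac{20.1\ \text{GB/s}}{23.2\ \text{GB/s}} \approx 1 - 0.866 = 0.134 \approx 13\%
$$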
Q: What does the cache latency test actually measure? Access time per level — L1, L2, L3, and RAM. The random access pattern defeats the prefetcher so the numbers reflect actual latency, not speculative fetch performance.
Tested on Debian/CrunchBang++. Requires gcc. Linux on x86 only: the build uses -msse2, and the write test uses the SSE2 instruction MOVNTDQ.