Cache
To get statistics on your program's cache usage on Linux, try this (for a program called a.out in the current directory):
valgrind --tool=cachegrind ./a.out
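You'll get output something like the following. (Recent versions of valgrind turn the cache simulation off by default; if you only see the I refs and D refs lines, add --cache-sim=yes to the command.)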
==10427== I refs:        16,236,605
==10427== I1 misses:          2,182
==10427== LLi misses:         1,767
==10427== I1 miss rate:        0.01%
==10427== LLi miss rate:       0.01%
==10427==
==10427== D refs:         6,817,725  (4,443,209 rd + 2,374,516 wr)
==10427== D1 misses:         13,180  (   10,651 rd +     2,529 wr)
==10427== LLd misses:         7,527  (    5,427 rd +     2,100 wr)
==10427== D1 miss rate:         0.1% (      0.2%   +       0.1%  )
==10427== LLd miss rate:        0.1% (      0.1%   +       0.0%  )
==10427==
==10427== LL refs:           15,362  (   12,833 rd +     2,529 wr)
==10427== LL misses:          9,294  (    7,194 rd +     2,100 wr)
==10427== LL miss rate:         0.0% (      0.0%   +       0.0%  )
where
- I1 refers to the first-level instruction cache
- D1 refers to the first-level data cache
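- LL refers to the last-level cache, which is shared by instructions and data (LLi and LLd count the LL misses caused by instruction fetches and by data accesses respectively)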
- rd refers to reads
- wr refers to writes
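The summary lines are related by simple arithmetic. LL is only consulted after a first-level miss, so LL refs = I1 misses + D1 misses (2,182 + 13,180 = 15,362), and LL misses = LLi misses + LLd misses (1,767 + 7,527 = 9,294). Each miss rate is misses divided by references, with the LL miss rate taken over all I and D references rather than over LL refs. The D1 miss rate works out at about 0.19% yet is printed as 0.1%, so cachegrind appears to truncate rather than round the percentages it displays. The C sketch below recomputes these figures; the struct and function names are hypothetical, chosen for illustration, and are not part of valgrind.

```c
/* Hypothetical container for the raw counts from a cachegrind summary. */
typedef struct {
    unsigned long long i_refs, i1_misses, lli_misses;
    unsigned long long d_rd, d_wr;      /* D refs, split into reads and writes */
    unsigned long long d1_rd, d1_wr;    /* D1 misses */
    unsigned long long lld_rd, lld_wr;  /* LLd misses */
} cg_counts;

/* Miss rate as a percentage of references (0 if there were none). */
static double miss_rate(unsigned long long misses, unsigned long long refs) {
    return refs ? 100.0 * (double)misses / (double)refs : 0.0;
}

/* LL is only accessed on a first-level miss, so its references are
   the I1 misses plus the D1 misses. */
static unsigned long long ll_refs(const cg_counts *c) {
    return c->i1_misses + c->d1_rd + c->d1_wr;
}

/* Every LL miss is either an instruction (LLi) or a data (LLd) miss. */
static unsigned long long ll_misses(const cg_counts *c) {
    return c->lli_misses + c->lld_rd + c->lld_wr;
}

/* The LL miss rate is taken over all references, not over LL refs. */
static double ll_miss_rate(const cg_counts *c) {
    return miss_rate(ll_misses(c), c->i_refs + c->d_rd + c->d_wr);
}
```

Filling a cg_counts with the numbers above gives ll_refs() == 15362 and ll_misses() == 9294, matching the LL refs and LL misses lines, and an LL miss rate of about 0.04%, which is displayed as 0.0%.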
The number between the == markers at the start of each line of the output is the process ID, which valgrind also uses as the suffix of the data file it creates. You can run cg_annotate on this file, in this case with
cg_annotate cachegrind.out.10427
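to get further information, including a per-function breakdown of the counts. The first thing printed is cachegrind's best guess at the machine's cache configuration: the total size, line size and associativity of each cache it simulated. On a standard CUED DPO machine I get: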
I1 cache: 32768 B, 64 B, 8-way associative
D1 cache: 32768 B, 64 B, 8-way associative
LL cache: 3145728 B, 64 B, 12-way associative
whereas on a CUED ts-access machine I get:
I1 cache: 32768 B, 64 B, 4-way associative
D1 cache: 32768 B, 64 B, 8-way associative
LL cache: 12582912 B, 64 B, 24-way associative
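Each set holds as many lines as the associativity, so the number of sets is size / (line size × associativity). For the DPO machine's D1 that is 32768 / (64 × 8) = 64 sets, and for its LL 3145728 / (64 × 12) = 4096 sets. The C sketch below computes this and maps an address to a set, assuming simple modulo indexing (drop the offset within the line, then take the line number modulo the number of sets); real hardware may index differently, especially in the last-level cache. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of one cache, as printed by cg_annotate. */
typedef struct {
    uint64_t size_bytes;  /* total capacity, e.g. 32768 */
    uint64_t line_bytes;  /* line (block) size, e.g. 64 */
    uint64_t ways;        /* associativity, e.g. 8 */
} cache_geom;

/* Each set holds `ways` lines of `line_bytes` bytes. */
static uint64_t num_sets(cache_geom g) {
    return g.size_bytes / (g.line_bytes * g.ways);
}

/* Assumed modulo indexing: the line number modulo the number of sets. */
static uint64_t set_index(cache_geom g, uint64_t addr) {
    return (addr / g.line_bytes) % num_sets(g);
}
```

With these definitions the DPO LL has 4096 sets, the ts-access I1 has 128 and the ts-access LL has 8192. Both machines' D1 caches have 64 sets of 64-byte lines, so addresses 4096 bytes apart fall into the same D1 set and at most 8 of them can be held in D1 at once; access patterns whose stride is a multiple of 4096 bytes are therefore prone to conflict misses.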
The cachegrind manual offers tips on how to use these statistics to reduce your program's cache misses.
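As an illustration of the kind of change such tips are about, the C sketch below sums the same array twice, once row by row and once column by column. C stores 2-D arrays row-major, so the row-wise loop walks through memory sequentially while the column-wise loop jumps a whole row (8192 bytes, assuming 4-byte ints) between accesses. The array size and function names are arbitrary choices for illustration.

```c
#include <stddef.h>

/* 2048 x 2048 ints = 16 MiB with 4-byte ints, larger than either LL cache above. */
#define N 2048

static int grid[N][N];

/* Fill with a simple pattern so both traversals have a known total. */
static void fill_grid(void) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int)(i + j);
}

/* Row-wise: consecutive accesses are adjacent in memory, so each
   64-byte line fetched serves 16 consecutive 4-byte ints. */
static long long sum_by_rows(void) {
    long long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += grid[i][j];
    return total;
}

/* Column-wise: consecutive accesses are N * sizeof(int) bytes apart,
   so almost every access touches a different cache line. */
static long long sum_by_cols(void) {
    long long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += grid[i][j];
    return total;
}
```

To compare them, call fill_grid() followed by one of the two sums from a small main(), compile with e.g. gcc -g -O1 (higher optimisation levels may interchange the loops for you), and run each version under valgrind --tool=cachegrind. The column-wise version would be expected to show many more D1 misses; the exact counts will depend on the machine and the compiler.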