Department of Engineering

IT Services

Cache

To get statistics on your program's cache usage on Linux, try this (for a program called a.out in the current directory):

 
   valgrind --tool=cachegrind ./a.out

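Recent valgrind releases (3.21 onwards, I believe) switch cache simulation off by default, so there you may need to add --cache-sim=yes to the command. You'll then get output something like the following: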

==10427== I   refs:      16,236,605
==10427== I1  misses:         2,182
==10427== LLi misses:         1,767
==10427== I1  miss rate:       0.01%
==10427== LLi miss rate:       0.01%
==10427== 
==10427== D   refs:       6,817,725  (4,443,209 rd   + 2,374,516 wr)
==10427== D1  misses:        13,180  (   10,651 rd   +     2,529 wr)
==10427== LLd misses:         7,527  (    5,427 rd   +     2,100 wr)
==10427== D1  miss rate:        0.1% (      0.2%     +       0.1%  )
==10427== LLd miss rate:        0.1% (      0.1%     +       0.0%  )
==10427== 
==10427== LL refs:           15,362  (   12,833 rd   +     2,529 wr)
==10427== LL misses:          9,294  (    7,194 rd   +     2,100 wr)
==10427== LL miss rate:         0.0% (      0.0%     +       0.0%  )

where

  • I1 refers to the first-level instruction cache
  • D1 refers to the first-level data cache
  • LL refers to the last-level cache, which is shared by instructions and data (LLi and LLd count the instruction and data accesses to it)
  • rd refers to reads
  • wr refers to writes
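
As a cross-check on how these figures relate to each other, here is a minimal Python sketch that recomputes the derived numbers from the raw counts in the sample output. The variable names are mine, not cachegrind's. It assumes that every L1 miss (I1 or D1) counts as one LL reference, and that the LL miss rate is taken relative to all memory accesses rather than to the LL refs, which is how I read the cachegrind manual.

```python
# Raw counts copied from the sample output above.
i_refs, i1_misses, lli_misses = 16_236_605, 2_182, 1_767
d_refs, d1_misses, lld_misses = 6_817_725, 13_180, 7_527


def percent(misses, refs):
    """Miss rate as a percentage of references."""
    return 100.0 * misses / refs


# Assumption: every L1 miss (instruction or data) is one LL reference.
ll_refs = i1_misses + d1_misses
ll_misses = lli_misses + lld_misses

print(f"I1 miss rate: {percent(i1_misses, i_refs):.3f}%")
print(f"D1 miss rate: {percent(d1_misses, d_refs):.3f}%")
print(f"LL refs:      {ll_refs:,}")
print(f"LL misses:    {ll_misses:,}")
# Assumption: the LL miss rate is relative to all accesses, not to LL refs.
print(f"LL miss rate: {percent(ll_misses, i_refs + d_refs):.3f}%")
```

The two totals agree with the "LL refs" and "LL misses" lines of the sample output. The rates are printed to more decimal places than cachegrind shows, which is why an LL miss rate reported as 0.0% need not be exactly zero.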

The number at the start of each line (10427 here) is the process ID of the run, and valgrind uses it as the suffix of the data file it creates. You can run cg_annotate on this file, in this case by doing

  cg_annotate cachegrind.out.10427

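to get further information. The first thing printed is the cache configuration that cachegrind assumed, which is its guess about the machine's real caches. Each line gives the total size, the line size and the associativity. On a standard CUED DPO machine I get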

I1 cache:         32768 B, 64 B, 8-way associative
D1 cache:         32768 B, 64 B, 8-way associative
LL cache:         3145728 B, 64 B, 12-way associative

whereas on a CUED ts-access machine I get

I1 cache:         32768 B, 64 B, 4-way associative
D1 cache:         32768 B, 64 B, 8-way associative
LL cache:         12582912 B, 64 B, 24-way associative
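
To make these configuration lines a little more concrete, the following Python sketch works out how many lines and sets each cache has. It reads the three fields as total size, line size and associativity, and uses the usual relation for a set-associative cache: sets = size / (line size × ways). The function name and the labels are my own, chosen for illustration.

```python
def cache_sets(size_bytes, line_bytes, ways):
    """Number of sets = total size / (line size * associativity)."""
    return size_bytes // (line_bytes * ways)


# (label, total size in bytes, line size in bytes, associativity),
# taken from the two configurations listed above.
configs = [
    ("DPO I1/D1", 32768, 64, 8),
    ("DPO LL", 3145728, 64, 12),
    ("ts-access I1", 32768, 64, 4),
    ("ts-access D1", 32768, 64, 8),
    ("ts-access LL", 12582912, 64, 24),
]

for name, size, line, ways in configs:
    print(f"{name:13s} {size // 1024:6d} KiB, {size // line:6d} lines, "
          f"{cache_sets(size, line, ways):5d} sets")
```

So the 32768 B caches are 32 KiB each, and the two LL caches are 3 MiB and 12 MiB. Addresses that map to the same set compete for only "ways" lines, which is one reason access patterns with large power-of-two strides can miss more often than the cache size alone would suggest.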

The cachegrind manual offers tips on how to take advantage of these stats.
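
If you want the totals in a script rather than on screen, they can also be read from the cachegrind.out file itself. The sketch below assumes the file has an "events:" line naming the counters and a "summary:" line with the matching totals, as the file format section of the manual describes, so check this against your own file. The sample text is reconstructed from the numbers earlier on this page rather than copied from a real data file, and read_totals is a hypothetical helper, not part of valgrind.

```python
def read_totals(text):
    """Return {event name: total} from the contents of a cachegrind.out file."""
    events, totals = [], []
    for line in text.splitlines():
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            totals = [int(n) for n in line.split()[1:]]
    return dict(zip(events, totals))


# Sample reconstructed from the figures on this page, not a real data file.
sample = """\
events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
summary: 16236605 2182 1767 4443209 10651 5427 2374516 2529 2100
"""

totals = read_totals(sample)
d1_misses = totals["D1mr"] + totals["D1mw"]  # read misses + write misses
d_refs = totals["Dr"] + totals["Dw"]         # reads + writes
print(f"D1 miss rate: {100 * d1_misses / d_refs:.2f}%")
```

For a real run, pass in the file contents instead of the sample, for example read_totals(open("cachegrind.out.10427").read()).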