Department of Engineering

IT Services

Basic optimising and profiling

Optimising

Before executing a long job, be sure you've optimised it. There are trade-offs between speed, program size, code readability and portability when you optimise the code, but you can sometimes obtain large benefits without unpleasant side-effects by

  • reducing the size of inmost nested loop bodies
  • reviewing the choice of number-crunching algorithms (for sorting, searching, matrix multiplication, etc)
  • reviewing how values are passed to functions: by value or by reference? If you're passing very large items by value, each call makes a copy, so memory requirements (and hence time requirements) increase

Note that in recent years CPU speeds have risen faster than memory speeds, so your bottlenecks may well be memory-related.

If you're using Fortran or C/C++, look through the optimisation options of the compiler carefully! Use "man g++" to read about the C++ compiler, and "man gfortran" to read about the Fortran compiler.

  • Fortran - the gfortran compiler has many optimisation-related options. Use gfortran -O to enable basic optimisation; -O2 and -O3 enable progressively more of the optimisation features.
  • C++ - Use the Standard Library routines in preference to your own - they're likely to be safer and faster. Especially if you're using templates, compiler optimising will make a lot of difference.

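    The g++ compiler has dozens of optimisation options that can be selected individually. Using the -O3 option switches on the most aggressive methods. Note however that optimisation can make a program fail that was (fortuitously) working. For example, a variable that was never explicitly initialised may happen to contain 0 in an unoptimised build but not in an optimised one: variables are only initialised to 0 automatically when the C++ specification says so. (Some compilers have an option controlling this behaviour; +Onoinitcheck is one such option, but it belongs to HP's compilers rather than to g++.)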

  • matlab - See the Using Matlab handout and the Matlab help page for some efficiency tips. Vectorisation and the use of sparse arrays can make a huge difference (orders of magnitude) to the size and speed of the resulting application. Check the Matlab Vectorisation Tricks page for details.

Profiling

If the program still runs slower than you expect (or rather, hope!) then it may be useful to profile it to give you a better idea of where you should concentrate your energy.

  • Fortran - If you have a Fortran 90 program called crunch.f90 you can find out how many times each routine is called and how long the routine takes to run by using gprof. You need to do something like
    gfortran -pg -o crunch crunch.f90
    ./crunch
    gprof crunch
    
    This produces a lot of output. The most important information is at the start. Typing "gprof crunch | more" instead of just "gprof crunch" will ease readability.
  • C++ - If you have a C++ program called crunch.cc you can find out how many times each routine is called and how long the routine takes to run by using gprof. You need to do something like
    g++ -pg -o crunch crunch.cc
    ./crunch
    gprof crunch
    
    Running the program writes the profiling data to a file called gmon.out, which gprof then reads. This produces a lot of output. The most important information is at the start.
  • matlab - Before you start spending a lot of time on optimising it's useful to find out where the main bottlenecks are. The profile command can do this for you, providing text or graphical output. For example, this is how you could profile a matlab routine called spdemo.m
      profile on 
      spdemo
      profile viewer
      profile off
    
    Within the department one particular diagnostic session led to a speedup of 3000 times (from several hours to several seconds) when it turned out that the same huge .mat file was being unintentionally loaded on every iteration of a loop.

Code coverage

It can be useful to know how many times each line of source code was run. If you put the following C++ code into foo.cc

  int main() {
    for (int i=0;i<5;i++)
      if (i>3)
        return 0;
    int j=7;
  }

and run

  g++ -fprofile-arcs -ftest-coverage  foo.cc
  ./a.out
  gcov foo.cc

the output will be

File 'foo.cc'
Lines executed:80.00% of 5
foo.cc:creating 'foo.cc.gcov'

and foo.cc.gcov will contain

        -:    0:Source:foo.cc
        -:    0:Graph:foo.gcno
        -:    0:Data:foo.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        1:    1:int main() {
        5:    2:  for (int i=0;i<5;i++)
        5:    3:    if (i>3)
        1:    4:      return 0;
    #####:    5:  int j=7;
        -:    6:}

showing that line 4 was reached once and line 5 was never reached. Such information can be useful when deciding where to concentrate your optimising effort.