Basic optimising and profiling
Optimising
Before executing a long job, be sure you've optimised it.
There are trade-offs between speed, program size, code readability and portability
when you optimise the code, but you can sometimes obtain large benefits without
unpleasant side-effects by
- reducing the size of the innermost nested loop bodies
- reviewing the choice of number-crunching algorithms (for sorting, searching, matrix multiplication, etc.)
- reviewing how values are passed to functions - by value or by reference? If you're passing very large items by value, memory requirements (and hence time requirements) increase (see the sketch just after this list)
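To illustrate the last point, here is a minimal C++ sketch (the function names
sum_by_value and sum_by_ref are invented for the example): passing a large
vector by value copies all of it on every call, while passing it by const
reference copies nothing.

#include <vector>

// Copies the whole vector on every call, so the memory (and hence time)
// cost grows with the size of v.
double sum_by_value(std::vector<double> v) {
    double total = 0;
    for (double x : v) total += x;
    return total;
}

// Passes only a reference, so no copy is made.
double sum_by_ref(const std::vector<double>& v) {
    double total = 0;
    for (double x : v) total += x;
    return total;
}

int main() {
    std::vector<double> big(10000000, 1.0);   // roughly 80 MB of doubles
    double a = sum_by_value(big);             // copies all 80 MB first
    double b = sum_by_ref(big);               // copies nothing
    return (a == b) ? 0 : 1;
}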
Note that in recent years CPU speeds have risen faster than memory speeds, so your bottlenecks may well be memory-related.
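One common memory-related effect is access order. The following sketch (an
illustrative example, not taken from any particular program) traverses the
same array twice; the second traversal does identical arithmetic but jumps
around in memory, so on typical hardware it runs noticeably slower.

#include <vector>

int main() {
    const int N = 4000;
    // An N x N matrix stored row by row in one contiguous block.
    std::vector<double> a(N * N, 1.0);
    double total = 0;

    // Cache-friendly: consecutive iterations touch consecutive addresses.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += a[i * N + j];

    // Cache-unfriendly: each access jumps N doubles ahead, so the same
    // arithmetic typically runs several times slower.
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += a[i * N + j];

    return total > 0 ? 0 : 1;   // use total so the loops aren't optimised away
}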
If you're using Fortran or C/C++, look through the optimisation options of the compiler carefully! Use "man g++" to read about the C++ compiler, and "man gfortran" to read about the Fortran compiler.
- Fortran - the gfortran compiler has many optimisation-related options. Use gfortran -O to enable most of the optimisation features.
- C++ - Use the Standard Library routines in preference to your own - they're likely
to be safer and faster. Especially if you're using templates, compiler optimisation
will make a lot of difference. The g++ compiler has dozens of optimisation options
that can be selected individually; using the -O3 option switches on the most
aggressive methods. Note however that optimisation can make a previously
(fortuitously) working program fail. For example, using the +Onoinitcheck option
will mean that variables won't be initialised to 0 automatically unless the C++
specification says so. A small example is sketched after this list.
- matlab - See the Using Matlab handout and the Matlab help page for some efficiency tips. Vectorisation and the use of sparse arrays can make a huge difference (orders of magnitude) to the size and speed of the resulting application. Check the Matlab Vectorisation Tricks page for details.
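As a sketch of the C++ advice above (the file name sortdemo.cc and the sizes
used are just for illustration), the following program relies on std::sort
from the Standard Library and is intended to be compiled with optimisation
enabled, e.g. "g++ -O3 -o sortdemo sortdemo.cc".

// sortdemo.cc
#include <algorithm>
#include <cstdlib>
#include <vector>

int main() {
    std::vector<int> v(1000000);
    for (int& x : v) x = std::rand();   // fill with arbitrary values

    // The library sort is well tested and, once the template code has been
    // inlined by the optimiser, generally far faster than a hand-written
    // quadratic sort such as bubble sort.
    std::sort(v.begin(), v.end());
    return v.front() <= v.back() ? 0 : 1;
}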
Profiling
If the program still runs slower than you expect (or rather, hope!) then it may be useful to profile it to give you a better idea of where you should concentrate your energy.
- Fortran - If you have an f90 program called crunch.f90 you can
find out how many times each routine is called and how long each
routine takes to run by using gprof. You need to do something like

gfortran -pg -o crunch crunch.f90
./crunch
gprof crunch

This produces a lot of output. The most important information is at the
start. Typing "gprof crunch | more" instead of just "gprof crunch" will
ease readability.
- C++ - If you have a C++ program called crunch.cc you can
find out how many times each routine is called and how long each
routine takes to run by using gprof. You need to do something like

g++ -pg -o crunch crunch.cc
./crunch
gprof crunch

This produces a lot of output. The most important information is at the
start. A toy program to experiment on is sketched at the end of this
section.
- matlab -
Before you start spending a lot of time on optimising it's useful to
find out where the main bottlenecks are. The profile command
can do this for you, providing text or graphical output. For example,
this is how you could profile a matlab routine called spdemo.m
profile on
spdemo
profile plot
profile off
Within the department one particular diagnostic session led to a speedup of 3000 times (from several hours to several seconds) when it turned out that the same huge .mat file was being unintentionally loaded on every iteration of a loop.
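If you want to try gprof out before attacking a real program, a toy crunch.cc
along the following lines (the function names slow_sum and quick_sum are
invented for the example) gives a profile that is easy to interpret - nearly
all of the run time should be attributed to slow_sum.

// crunch.cc - compile and profile with
//   g++ -pg -o crunch crunch.cc
//   ./crunch
//   gprof crunch | more
#include <cstdio>

// Deliberately does most of the work, so gprof's flat profile should show
// almost all of the run time in this function.
double slow_sum(int n) {
    double total = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            total += (i * j) % 7;
    return total;
}

// Cheap by comparison.
double quick_sum(int n) {
    double total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

int main() {
    std::printf("%f\n", slow_sum(20000) + quick_sum(20000));
    return 0;
}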
Code coverage
It can be useful to know how many times each line of source code was run. If you put the following C++ code into foo.cc
int main() {
  for (int i=0;i<5;i++)
    if (i>3)
      return 0;
  int j=7;
}
and run
g++ -fprofile-arcs -ftest-coverage foo.cc
./a.out
gcov foo.cc
the output will be
File 'foo.cc'
Lines executed:80.00% of 5
foo.cc:creating 'foo.cc.gcov'
and foo.cc.gcov will contain
        -:    0:Source:foo.cc
        -:    0:Graph:foo.gcno
        -:    0:Data:foo.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        1:    1:int main() {
        5:    2:  for (int i=0;i<5;i++)
        5:    3:    if (i>3)
        1:    4:      return 0;
    #####:    5:  int j=7;
        -:    6:}
showing that line 4 was reached once and line 5 was never reached. Such information can be useful when deciding where to concentrate your optimisation effort.