
Grid Engine


If you want to run programs while you aren't logged in, you can use our Grid Engine facility. This lets you use the CPU-power of CUED Teaching System Linux-based machines when they're not used for other tasks. Outside term, several machines are available all day. During term, machines are likely to run your programs only at weekends and on weekdays between 6pm and 9am, but you can submit your programs at any time. At its simplest you

You'll be mailed when your program starts and finishes. Your program will produce the files it usually produces. Text output will go into a file called forthegrid.o*. Most of the rest of this page tells you what to do if things go wrong.

Preparing your code

Some programs will work under Grid Engine with no extra work. Others may require recompiling or rewriting. Many will run much faster if a little thought is given to optimising the code. Once programs run for days, even an improvement of a few percent becomes significant. See

for ways of speeding your programs up.

Under Grid Engine your program will be run by default in your home directory, and your PATH (the list of places where programs are looked for) will be /usr/local/bin:/bin:/usr/bin. To start your main program in a different directory, use cd in the script file - see below - or use qsub -cwd ... instead of qsub .... The "-cwd" option makes your program run in the same directory that qsub was run in. You can add directories to the PATH too.

If your program requires interaction you'll have to rewrite it so that interaction isn't required. See the Command line options section for help.
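If rewriting the program itself is awkward, one common alternative is to supply the answers on standard input, so that nothing waits for the keyboard. The following is only a sketch: `ask_two_values` is a made-up shell function standing in for a program that prompts for two values.

```shell
#!/bin/sh
# Stand-in for an interactive program (hypothetical): it reads two
# answers from standard input, as a prompting program would.
ask_two_values() {
    read -r iterations
    read -r outfile
    echo "running $iterations iterations, writing to $outfile"
}

# Feed the answers through a pipe instead of typing them.
# With a real program you could equally keep the answers in a file
# and run:  myprogramname < answers.txt
result=$(printf '1000\nresults.txt\n' | ask_two_values)
echo "$result"
```

This only works for programs that read their answers from standard input; programs that open the terminal directly will still need rewriting.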

Note that your program won't run much faster than it would on your own machine, but Grid Engine lets you run many programs at once, so if you structure your work appropriately you can increase your work-rate by an order of magnitude or so.
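As an illustration of structuring work this way, the sketch below prepares one submission per parameter value. It is a dry run: the `echo` prints each command instead of running it. It assumes that arguments placed after the script name are passed on to the script (check the local qsub documentation), and the values 1, 2, 3 and the function name `submit_all` are made up.

```shell
#!/bin/sh
# Print one qsub command per parameter value.
# Assumption: the submitted script uses "$1" to choose its piece of work.
submit_all() {
    for seed in 1 2 3; do
        # "echo" makes this a dry run; remove it to submit for real
        echo qsub ./scriptname "$seed"
    done
}

plan=$(submit_all)
echo "$plan"
```

Each submitted job is queued independently, so the jobs can run on different machines as they become free.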

Setting up

You can use Grid Engine only from the Teaching System's Linux Servers, so you'll need to log into one of those before doing anything else. You'll then need to set your environment up by running

        source /usr/local/apps/gridengine/blades/common/settings.sh
if your login shell is bash (the default), or by running
        source /usr/local/apps/gridengine/blades/common/settings.csh

if your login shell is csh or tcsh.

You can cause this initialisation to always happen when you log in by editing the appropriate shell initialisation file (as documented in various places including CUED's shell script page).
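For bash users, a guarded line along the following lines is one way to do it. This is a sketch that assumes `~/.bashrc` is the file your login shell reads; the variable name `GE_SETTINGS` is made up for illustration.

```shell
# Lines to add to ~/.bashrc (assumed location of your bash start-up file)
GE_SETTINGS=/usr/local/apps/gridengine/blades/common/settings.sh

# Only source the settings file where it exists, so that logging in
# to a machine without Grid Engine doesn't produce an error.
if [ -r "$GE_SETTINGS" ]; then
    . "$GE_SETTINGS"
fi
```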

Running jobs

In order to run jobs, you will need to write a shell script (even if the program you actually want to run is a compiled program, Grid Engine insists on a script being submitted; it need do no more than call your program directly).

A typical minimal script (with comments) would be

        #!/bin/sh
        #
        # the next line is a "magic" comment that tells codine to use bash
        #$ -S /bin/bash
        #
        # if you want to start in a directory other than your home
        # directory, uncomment and change the next line
        # cd ~/folder/to/start/in
        # if you want to add directories to the PATH, uncomment and
        # change the next line
        # export PATH=$PATH:/directory/to/add:/other/directory/to/add
        # Now for some real work
        myprogramname argument1 argument2

You need to make this script executable - see the Unix groups and file permissions page, or just type

        chmod u+x scriptname

Now you can submit a job on the queue:

        qsub ./scriptname

You should get a response like

        Your job 24 ("scriptname") has been submitted.

The number (in this case 24) is the "job ID", which you'll need if you're going to cancel the job or report a bug. If there's a machine free to run your job, execution will begin straight away. Otherwise your job joins the queue of all the other jobs that are pending.
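If you submit jobs from a script of your own, you may want to record the job ID automatically. The sketch below pulls the number out of a response line, assuming the response has exactly the form shown above; the function name `extract_job_id` is made up.

```shell
#!/bin/sh
# Extract the job ID from qsub's response line.
# Assumes the response looks like:
#   Your job 24 ("scriptname") has been submitted.
extract_job_id() {
    sed -n 's/^ *Your job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration on the sample response; in real use you would pipe
# the output of "qsub ./scriptname" into extract_job_id instead.
job_id=$(echo 'Your job 24 ("scriptname") has been submitted.' | extract_job_id)
echo "$job_id"
```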

Resource limits

By default, several resources that your program can use are limited. Currently only one queue exists, which is configured to kill jobs which run for more than 168 hours.

Among the resources a job uses that can be measured and restricted are the "real time" it runs for and the CPU time it uses. The current restriction on the (default) queue is that a job will be killed if it runs for more than 168 hours of real time, or uses more than 168 hours of CPU time. This is an attempt to correctly balance allocation so as not to allow

Jobs will receive a SIGXCPU signal when they hit 168 hours of real time or 168 hours of CPU time, and a SIGKILL signal when they hit 168 hours 10 minutes of real time or 168 hours 10 minutes of CPU time.
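The gap between the two signals gives a long-running job a chance to save its state. The sketch below shows a script loop that reacts to SIGXCPU; it assumes the signal reaches the script itself, and the `kill` line is there only to simulate the warning so that the example can be tried without waiting 168 hours. The names `on_xcpu` and `stop_requested` are made up.

```shell
#!/bin/bash
# Set a flag when the time-limit warning (SIGXCPU) arrives.
stop_requested=0
on_xcpu() {
    echo "time limit warning received; finishing current step"
    stop_requested=1
}
trap on_xcpu XCPU

step=0
while [ "$stop_requested" -eq 0 ] && [ "$step" -lt 5 ]; do
    step=$((step + 1))
    # ... one unit of real work would go here ...
    if [ "$step" -eq 3 ]; then
        kill -s XCPU $$    # demonstration only: simulate the warning
    fi
done
echo "stopped after step $step"
```

The shell runs a trap only between commands, so if the script spends its whole life waiting for one long-running program, the handler will not run until that program returns. In that case the program itself would need to handle the signal.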

Matlab

If you want to run a matlab command my_routine, create a script like the following, and run it as before

        #!/bin/sh
        #
        # the next line is a "magic" comment that tells codine to use bash
        #$ -S /bin/bash
        #
        export DISPLAY=""
        matlab -nojvm -r my_routine

The DISPLAY="" line stops matlab trying (and failing) to display graphics (though you can still use graphics commands and print the results into a file). The -nojvm option turns off unneeded facilities. Note that after -r you don't put a filename - for example, something like matlab -nojvm -r project/test1.m won't work, though cd project; matlab -nojvm -r test1 should. Note also that by default matlab will be run from your home directory, so any files produced will be there too, by default.

Make sure you understand matlab's notion of "path" otherwise the routines you call might not be found. If you type "path" inside matlab you'll get a list of directories where matlab will look for routines. It will also look in the current directory. The output that normally goes to the command window can be saved in a file. See the Grid Engine from the command line page for details.

ABAQUS

See submitting ABAQUS jobs to gridengine

Graphical interface: qmon

Grid Engine provides a graphical interface for monitoring queues, submitting jobs, and so on. This can be started by running:

        qmon

The iconic representations of the different functions aren't easily interpretable, but hovering the mouse over them gives useful pop-up descriptions. You'll find options to list available machines, submit jobs in other ways, etc. For example, the "Submit Jobs" panel offers advanced options that let you choose when to be mailed about the job's progress (and control many other things too).


Command line interface

For some purposes it is far easier to use the command line to control and monitor Grid Engine tasks. For example

See the Grid Engine from the command line page for more extensive information.

Troubleshooting

Several things can go wrong when using Grid Engine. It's worth checking a simple case to confirm that the mechanism is working at all. Note that except for having more time, your jobs have no extra privileges (no extra RAM or disc space, for example), so if the script you're going to run with Grid Engine doesn't start when you run it normally from the command line, it's not going to work with Grid Engine.

The output and error messages that you'd normally see in the terminal window are by default put into files with the same name as the program you're running, but with added suffixes - .e for errors and .o for output - followed by a number. Look in these files for clues.
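When you have run many jobs, most error files will be empty, and the interesting ones are easy to miss. The sketch below prints only the non-empty ones. It assumes the error files are named after the script with .e and the job number appended, as described above; the function name `show_errors` and the sample files are made up.

```shell
#!/bin/sh
# Print the name and contents of every non-empty error file for a script.
show_errors() {
    for f in "$1".e*; do
        # -s is true only for files that exist and are not empty
        if [ -s "$f" ]; then
            echo "== $f"
            cat "$f"
        fi
    done
}

# Demonstration in a scratch directory with made-up files
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
: > scriptname.e24                              # empty: no errors from job 24
echo "myprogramname: not found" > scriptname.e25
errors=$(show_errors scriptname)
echo "$errors"
```

In real use you would run `show_errors scriptname` in the directory where the job ran, which is your home directory by default.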

Documentation on other sites

The main web site about Grid Engine is the one provided by Sun.

© Cambridge University Engineering Dept
Tim Love (tpl)
(from information provided by js138, pjb1008, and jpmg)
Last updated: February 2012
