Department of Engineering

IT Services


If you're using any of the maths routines remember that you'll need to mention the maths library on the compile line (otherwise the maths code won't be linked in) and that you'll need to include math.h (otherwise the values returned by the routines could be misinterpreted).

hp-ux has an alternative ANSI C conformant math library that can be accessed using `-lM' rather than `-lm' on the command line. Use it wherever possible.

Before you start writing much maths-related code, check to see that it hasn't all been done before. Many maths routines, including routines that offer arbitrary precision are available by ftp from Also see the CUED help/languages/C/math_routines file in the gopher for a long list of available resources.

One problem when writing numerical algorithms is obtaining machine constants. On Sun's they can be obtained in <values.h>. The ANSI C standard recommends that such constants be defined in the header file <float.h>. Sun's and standards apart, these values are not always readily available.

The NCEG (Numerical C Extensions Group) is working on proposing standard extensions to C for numerical work, but nothing's ready yet, so before you do any heavy computation, especially with real numbers, I suggest that you browse through a Numerical Analysis book. Things to avoid are

  • Finding the difference between very similar numbers (if you're summating an alternate sign series, add all the positive terms together and all the negative terms together, then combine the two).
  • Dividing by a very small number (change the order of operations so that this doesn't happen).
  • Multiplying by a very big number.

Common problems that you might face are :-

Testing for equality :-
Real numbers are handled in ways that don't guarantee expressions to yield exact results. It's risky to test for exact equality. Better is to use something like
  d = max(1.0, fabs(a), fabs(b))
and then test fabs(a - b) / d against a relative error margin. Useful constants in float.h are FLT_EPSILON, DBL_EPSILON, and LDBL_EPSILON, defined to be the smallest numbers such that
 1.0f + FLT_EPSILON  != 1.0f
 1.0  + DBL_EPSILON  != 1.0 
 1.0L + LDBL_EPSILON != 1.0L

Avoiding over- and underflow :-
You can test the operands before performing an operation in order to check whether the operation would work. You should always avoid dividing by zero. For other checks, split up the numbers into fractional and exponent part using the frexp() and ldexp() library functions and compare the resulting values against HUGE (all in <math.h>).

Floats and Doubles :-
K&R C encouraged the interchangeable use of float and double since all expressions with such data types where always evaluated using the double representation - a real nightmare for those implementing efficient numerical algorithms in C. This rule applied, in particular, to floating-point arguments and for most compilers around it does not matter whether one defines the argument as float or double.
According to the ANSI C standard such programs will continue to exhibit the same behavior as long as one does not prototype the function. Therefore, when prototyping functions make sure the prototype is included when the function definition is compiled so the compiler can check if the arguments match.
  • Keep in mind that the double representation does not necessarily increase the precision. Actually, in most implementations the worst-case precision decreases but the range increases.

  • Do not use double or long double unnecessarily since there may a large performance penalty. Furthermore, there is no point in using higher precision if the additional bits which will be computed are garbage anyway. The precision one needs depends mostly on the precision of the input data and the numerical method used.

Infinity :-
The IEEE standard for floating-point recommends a set of functions to be made available. Among these are functions to classify a value as NaN, Infinity, Zero, Denormalized, Normalized, and so on. Most implementations provide this functionality, although there are no standard names for the functions. Such implementations often provide predefined identifiers (such as _NaN, _Infinity, etc) to allow you to generate these values.
If x is a floating point variable, then (x != x) will be TRUE if and only if x has the value NaN. Many C implementations claim to be IEEE 748 conformant, but if you try the (x!=x) test above with x being a NaN, you'll find that they aren't.
In the mean time, you can write your own `standard' functions and macros, and provide versions of them for each system you use. If the system provides the functions you need, you #define your `standard' functions to be the system functions. Otherwise, you write your function as an interface to what the system provides, or write your own from scratch.

On HPs, type man ieee for a summary of functions which are required for, or recommended by, the IEEE-754 standard for floating-point arithmetic, and type man fpgetround to see how to control rounding.

See matherr(3) for details on how to cope with errors once they've happened.

If you use an HP, the following information might be of use if you want to trap exceptions. It's from Cary Coutant (Hewlett-Packard, California Language Lab). Because of the pipelined nature of the PA-RISC FPU, most exceptions are delayed. The instruction that causes the exception (the divide) does not actually trap. Instead, the next floating-point instruction causes the trap, so you usually don't want to bypass that instruction - what you need to do is fix the result of the earlier instruction. Even if you did bypass the current instruction, you'd hit another trap when the next floating-point instruction is executed.

Your signal handler will need to (1) examine the floating-point exception registers to determine what the offending instruction was, (2) supply a suitable replacement value for its result, (3) save the replacement value in the appropriate part of the signal context, (4) clear the exception register, and (5) clear the T (trap) bit in the floating-point status register.

Alternatively, you could turn off the exception enable bits so that the signal isn't generated in the first place, or you could use HP library routines to handle the exceptions for you.

Here's a skeleton floating-point exception handler for PA. For demonstration purposes, it assumes that exceptions are caused by double-precision arithmetic instructions, whose target registers are specified in the bottom 5 bits of the instruction. It also does not handle PA-RISC 1.1 floating-point instructions. Modification of the exception handler for anything useful is left as an exercise for the reader. You'll definitely need the PA-RISC Architecture and Instruction Set Reference Manual.

fpe_handler(sig, code, scp)
int sig, code;
struct sigcontext *scp;
    int i;

    printf("Trap (signal %d, code %d)\n", sig, code);
    printf("  flags  = %08x\n", scp->sc_sl.sl_ss.ss_flags);
    printf("  pcoqh  = %08x\n", scp->sc_sl.sl_ss.ss_pcoq_head);
    printf("  pcoqt  = %08x\n", scp->sc_sl.sl_ss.ss_pcoq_tail);
    printf("  iir    = %08x\n", scp->sc_sl.sl_ss.ss_iir);
    printf("  isr    = %08x\n", scp->sc_sl.sl_ss.ss_isr);
    printf("  ior    = %08x\n", scp->sc_sl.sl_ss.ss_ior);
    printf("  fpstat = %08x\n", scp->sc_sl.sl_ss.ss_fpblock.fpint.ss_fpstat);

    /* Handle pending exceptions */
    for (i = 1; i <= 7; i++)
        handle_excp(&scp->sc_sl.sl_ss.ss_fpblock.fpint, i);

    /* Clear T bit in the floating-point status register */
    scp->sc_sl.sl_ss.ss_fpblock.fpint.ss_fpstat &= ~0x40;

    /* Re-enable the trap handler */
    signal(sig, fpe_handler);

handle_excp(fpint, n)
struct fp_int_block *fpint;
int n;
    int code, t;
    unsigned int *fr;

    /* Treat the floating-point register save area as an integer array */
    /* The exception registers are fr[1] through fr[7] */
    fr = &fpint->ss_fpstat;
    if (fr[n] == 0)

    printf("  excp%d  = %08x", n, fr[n]);
    code = fr[n] >> 26;
    if (code & 0x02) printf(" inexact");
    if (code & 0x08) printf(" overflow");
    else if (code & 0x04) printf(" underflow");
    else if (code & 0x20) printf(" invalid op");
    else if (code & 0x10) printf(" div by zero");
    else if (code & 0x01) printf(" unimplemented");

    /* For example purposes only: set target register to 0.0 */
    /* Warning!!!  This assumes a PA-RISC 1.0 double-precision instruction! */
    /* You should really check the sub-opcode in the exception register, */
    /* and tailor the action here to the excepting instruction */
    t = fr[n] & 0x1f;
    fr[t*2] = 0;
    fr[t*2+1] = 0;

    /* Clear exception register */
    fr[n] = 0;