6.3. Building an Optimized Scilab

One relatively easy way to increase Scilab's performance is to recompile it with a good compiler and link it against an optimized BLAS library[1].

Our experience only covers compilation on IA32 GNU/Linux systems, where gcc and pgcc are the compilers of choice.

The following options are a good starting point for further exploration. They apply to compiling Fortran as well as C code.

-march=arch

This option instructs gcc to generate code specifically for architecture arch. Among other things, it implies -mcpu=arch. Furthermore, it sets -malign-loops, -malign-jumps, -malign-functions, and -mpreferred-stack-boundary to their optimum values for the selected architecture without breaking the ABI. It can therefore be considered an optimization switch. The price is portability: the resulting binary may use instructions that older processors lack, and then it will not run on them.

-malign-double

For systems with an original Intel®[2] Pentium® or later processor, this option is an absolute must. It forces the alignment of 64-bit floating-point numbers (double, i.e. IEEE 754 double precision) to a 64-bit boundary. Although it breaks the ABI, the speed gained by avoiding the misalignment penalty on each memory access is tremendous, even on PentiumPro® and later systems with all write-back caches enabled.

Warning

-malign-double breaks the ABI!

Code that uses double and is compiled with [p]g++-2.95 and -malign-double is known to cause segmentation faults under some circumstances.

-O2

The workhorse optimization switch, -O2, activates a large number of optimizations. See the node "Optimize Options" in gcc's info file, for example: info -f /usr/info/gcc.info.gz -n "Optimize Options".

The optimizations enabled by -O2 are well tested and do not excessively increase the size of the text (code) segment.

-funroll-all-loops

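This switch unrolls as many loops as possible, even those whose number of iterations is unknown when the loop is entered. It trades a larger text size for fewer loop-control instructions, which can speed loops up; gcc's own documentation, however, warns that it usually makes programs run more slowly. YMMV.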

-fschedule-insns2

This switch requests an additional pass of instruction scheduling after register allocation. Although the gcc info page states that it is switched on by -O2, this might not be true for every gcc version in circulation. It should be particularly helpful on machines with relatively few registers and where memory load instructions take more than one cycle.

Notes

[1]

Simply linking with an optimized BLAS library is generally not enough. Patches that fix part of this problem exist (e.g. "fast-blas" and "big patch"); see Hammersmith Consulting's Scilab patches page.

[2]

Intel®, Pentium®, and PentiumPro® are registered trademarks of Intel Corp.