Scilab Bag Of Tricks: The Scilab-2.5.x IAQ (Infrequently Asked Questions)
Chapter 6. Performance
One relatively easy way to increase Scilab's performance is to recompile it with a good compiler and an optimized BLAS library[1].
Our experience only suffices to explain the compilation on IA32 GNU/Linux systems. Here, gcc or pgcc are the compilers of choice.
The following options are a good starting point for further exploration. They apply to compiling Fortran as well as C code.
This option (-march=arch) instructs gcc to generate code specifically for architecture arch. Among other things, it sets -mcpu=arch. Furthermore, it forces -malign-loops, -malign-jumps, -malign-functions, and -mpreferred-stack-boundary to their optimum values for the selected architecture without breaking the ABI. Therefore, it can be considered an optimization switch.
For systems with an original Intel®[2] Pentium® or above processor, this option (-malign-double) is an absolute must. It forces the alignment of 64-bit floating point numbers (also known as double, double precision, or IEEE 754 double) to a 64-bit boundary. Though it breaks the ABI, the gain in speed from avoiding the misalignment penalty on each memory access is tremendous, even on PentiumPro® and later systems with all write-back caches enabled.
Warning: -malign-double breaks the ABI! Code using double compiled with [p]g++-2.95 and -malign-double is known to cause segmentation faults under some circumstances.
The workhorse optimization switch, -O2, activates a lot of optimizations. See the node "Optimize Options" in gcc's info file, e.g. with info -f /usr/info/gcc.info.gz -n "Optimize Options".
The optimizations toggled on by -O2 are well tested and do not produce an excessively large text (code) segment.
This switch increases the text size by unrolling as many loops as possible, thereby speeding them up. YMMV.
Although the gcc info page states that this optimization is switched on by -O2, this might not be true for all versions of gcc floating around. The switch should be particularly helpful on machines with a relatively small number of registers and where memory load instructions take more than one cycle.
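As a rough sketch of how the switches named explicitly in this section might be collected for a build, the following shell fragment gathers them into the CFLAGS and FFLAGS environment variables, since they apply to C and Fortran alike. The architecture value i686 is only an example. Whether Scilab's build scripts honor these variables is an assumption; the flags may instead have to be entered by hand in the generated makefiles. Keep the ABI warning about -malign-double in mind before using it.

```shell
#!/bin/sh
# Example target architecture (an assumption); substitute the value
# that matches your processor.
ARCH=i686

# Switches named in this section, collected in one place.
OPT_FLAGS="-O2 -march=${ARCH} -malign-double"

# The same options apply to C and Fortran code.
# Assumption: the build picks up CFLAGS and FFLAGS from the environment.
CFLAGS="${OPT_FLAGS}"
FFLAGS="${OPT_FLAGS}"
export CFLAGS FFLAGS

# Print the settings so they can be checked before starting the build.
echo "CFLAGS=${CFLAGS}"
echo "FFLAGS=${FFLAGS}"
```

Keeping the flags in a single variable makes it easy to add or drop one switch at a time and measure its effect.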
[1] Simply linking with an optimized BLAS library generally is not enough. Patches (e.g. "fast-blas" and "big patch") to fix part of this problem exist. Check out Hammersmith Consulting's Scilab patches page.
[2] Intel®, Pentium®, and PentiumPro® are registered trademarks of Intel Corp.