Scilab Bag Of Tricks: The Scilab-2.5.x IAQ (Infrequently Asked Questions)
Chapter 4. Unknown Spots
Integer types were introduced in Scilab-2.5 (official release). They are an important concept, but to date their support is still incomplete and partially buggy. In many situations the use of integer variables can provide dramatic storage improvements; moreover, large problems, for example those occurring in image manipulation, often fit the hardware constraints when integer storage is exploited. Thus, even though the integer types in Scilab still leave something to be desired, their use may be a matter of necessity, and despite the largely broken integer support, the existing possibilities can provide workable solutions. The following section is a guide to what is available and what is not when it comes to integer expressions.
The following spots are, in our opinion, missing parts in the current implementation of integers.
Integer constants can only be defined as results of an intN function (N = 8, 16, or 32) with a real argument. No special notation exists for integer literals, such as 123# or !123. Variables are declared as integral when they are assigned an integer value. The integer value has to be produced first, and this is only possible with a function.
This is inconvenient, and often also performance critical, for instance when defining large integer arrays. The requirement of duplicate storage for passing by value and the calling overhead can be demanding. For example,
ia = int8(modulo(1:1e6, 16))
produces the array ia that occupies 1 MB of RAM. Even so, the definition procedure requires an intermediate storage of 24 MB (IEEE 754 double-precision reals have a size of 8 bytes each): 8 MB go for defining 1:1e6 and 16 MB for passing the result by value to modulo and to some other internals of modulo. Scilab takes a detour in the construction of integral variables instead of attacking this area directly: the parser ought to recognize terminal symbols making up integral expressions, so that no double-precision intermediate result is called into play. The pitfall lies partly in the missing notation itself, and partly in the need to do the integer conversion only at the last step of the evaluation, for lack of usable integer constructs (see Section 4.5.2.1).
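The storage figures quoted above are plain arithmetic. The following sketch reproduces them in Python (used here only as a calculator, not as a substitute for Scilab), assuming 8-byte doubles, 1-byte int8 elements, and decimal megabytes; the helper name `megabytes` is made up for this illustration.

```python
# Storage budget for ia = int8(modulo(1:1e6, 16)), following the text.
# Assumptions: 8-byte IEEE 754 doubles, 1-byte int8, 1 MB = 10**6 bytes.

N_ELEMENTS = 10**6
DOUBLE_BYTES = 8
INT8_BYTES = 1


def megabytes(n_elements, bytes_per_element):
    """Return the size of an array in decimal megabytes."""
    return n_elements * bytes_per_element / 10**6


result_mb = megabytes(N_ELEMENTS, INT8_BYTES)        # the final int8 array
range_mb = megabytes(N_ELEMENTS, DOUBLE_BYTES)       # 1:1e6 stored as doubles
modulo_mb = 2 * megabytes(N_ELEMENTS, DOUBLE_BYTES)  # by-value copy plus modulo internals
intermediate_mb = range_mb + modulo_mb

print(f"result: {result_mb:.0f} MB, intermediates: {intermediate_mb:.0f} MB")
```

The ratio of 24 to 1 between temporary and final storage is what makes the missing integer literals costly for large arrays.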
Altogether, the introduction of integers has brought 6 new data types:
int8,
uint8,
int16,
uint16,
int32, and
uint32.
Scilab generally does implicit type conversions in expressions involving reals, booleans, and several other types, but not when at least one of the operands is an integer. Automatic conversions (for example, the result of an addition of an int8 and an int16 becomes an int16, an int16 plus a real makes a real, etc.) are not implemented. In some programming languages, strong typing is a design decision; here, it is probably just an omission. The only automatic conversion takes place when assigning a real value to elements of an array that has been predefined as integral. Then, the right hand side is silently cast to the left hand side's type.
-->a = int8(zeros(1, 8));
-->a(2:4) = 5.3
a =
!0 5 5 5 0 0 0 0 !
In addition to the lack of automatic type conversion, a few bugs involving mixed type expressions are exemplified below (see Section 4.5.2.3.2).
Indexing of array elements is a classical use for integers. However, Scilab solely supports double-precision, and not integer-typed indices for arrays and hyper-arrays. In many situations juggling integer indices would be more memory efficient than dealing with double precision. The double-precision indices finally have to be (internally) cast to integers to actually index into an array. Consider for example
a = rand(1, 1e6)
a = a(1e6:-1:1)
The reversal of the elements of the array requires 24 MB: 8 MB for storing a, 8 MB for storing the right hand side of the assignment, and 8 MB for storing the index expression 1e6:-1:1. If int32s were used for the index instead, 4 MB (half of the index storage) could be saved.
Indexing Done Right
If indexing were done right, it would not require any additional core. The stride of -1 would turn the index expression into an efficient call to dcopy.
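To illustrate what "no additional core" can mean, here is a hedged Python sketch of a reversal that swaps elements pairwise and never builds an index vector. It is not how Scilab implements indexing, nor is it the dcopy route mentioned above; the function name `reverse_in_place` is invented for the illustration.

```python
from array import array


def reverse_in_place(values):
    """Reverse a sequence by swapping elements from both ends inwards.

    Besides the data itself only two integer counters are needed, so no
    index array of the same length is ever allocated.
    """
    left, right = 0, len(values) - 1
    while left < right:
        values[left], values[right] = values[right], values[left]
        left += 1
        right -= 1
    return values


a = array("d", [0.1, 0.2, 0.3, 0.4, 0.5])  # 8-byte doubles, like Scilab reals
reverse_in_place(a)
print(list(a))
```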
Only a small subset of the functions which work on reals, or of the syntactical constructs which involve reals, can be applied to integers. Which functions support integers and which do not follows no rule; it just looks like unfinished work. A practical account is given in Section 4.5.2.1.
Reading and writing of integer data from and to data files is still imperfect. As for reading: in Scilab-2.5 (official release), values retrieved with mget or read from external data files are always rendered as double precision reals. Only afterwards can they be converted to integers. This once more carries the disadvantage of the real (no pun intended) detour, as discussed in the previous section. An external data file containing many short integers might not be loadable, because the data are expanded to double precision reals, filling the available memory, though, once reconverted to short integers, the data would fit. From Scilab-2.5.1 (alpha version) on, function mgeti exists, and is well suited for integers stored in binary files, but no integer equivalent of read exists yet.
As for the complementary operation, writing integers into a binary file, function mput(data, type) was already present before Scilab-2.5 (official release). There, however, mput accepted only real data, even though data could be written into the file as an integer of any type, if specified. Only from Scilab-2.5.1 (alpha version) on has it become possible to pass integers to mput. Actually, in Scilab-2.5.1 (alpha version) there were still a couple of bugs lying around: when integer data was output, extensive garbage was printed, and explicit reference to the unit number was impossible. So, in Scilab-2.5.1 (alpha version):
-->fd = mopen('my_file', 'wb');
-->mput(int8(1:1000), 's', fd);
!--error 201 : argument 3 should be a real or complex matrix
-->mclose(fd);
while
-->fd = mopen('my_file', 'wb');
-->mput(int8(1:1000), 's');
-->mclose(fd);
worked, but printed a lot of output to the console, considerably slowing down the computation. Both of these bugs are ironed out in Scilab-2.6 (official release).
In contrast, load seamlessly retrieves integer values, if the corresponding save wrote them as such.
With integers, some Scilab constructs work, some simply do not, and others apparently work, but incorrectly, and are thus best avoided. If a user is forced to use integers, she needs a road map to what is viable and what to stay away from. The following considerations can help in surviving with integer data.
Plainly, some Scilab functions work as expected with integer arguments, and some do not. In many cases this seems a matter of lazily done homework or homework not done at all. The proper overloading alternatives to the real constructs are missing! We cannot give any general rules, except for these two:
Functions that can give a real or complex result with an integer input, for example sqrt or spec, in most of the cases do not accept integer types.
It is naive to expect any function or expression which relies on indices or index counts to work with integer enumerators.
Table 4-8. Selected Functions and Operators That Work With Integers
Operator or Function | Comment |
---|---|
"+", "-", "*", "/", "^", and "'" | their dotted cousins are also working |
":" | colon operator used as implicit or explicit indexer of integer arrays, with real indices. For example, i1 = int8(1:10); i1(:) is accepted. |
min(ival), max(ival), matrix(ival), hypermatrix(zval, ival) | returning integer values ival of the same type as their arguments; zval is real or complex. |
size | real result! |
eye(ival), ones(ival), zeros(ival) | real result! |
eye(ival), cumsum(ival), sum(ival) | integer result |
disp(ival) | string result |
fft(ival) | complex result |
In this section, "not work" means that Scilab complains with an error, usually about a wrong argument type or a missing overload function.
Table 4-9. Selected Functions and Operators That Do Not Work With Integers
Operator or Function | Comment |
---|---|
":" | colon operator used as binary or ternary range generator |
length(ival), mean(ival) | |
eye(ival1, ival2), ones(ival1, ival2), zeros(ival1, ival2) | where ival1 and ival2 are integer variables |
sqrt(ival) | |
cumprod(ival), prod(ival) | |
ceil(ival), floor(ival), int(ival), modulo(ival) | These are all real-to-real functions! |
gsort(ival), lex_sort(ival), sort(ival) | |
Sparse integer matrices are not supported at all.
Unsigned integer expressions cannot overflow; in particular, no warnings are issued. The result of an expression involving unsigned integers is always computed with respect to the modulus of the type.
-->uint8(129) + uint8(129)
ans =
2
-->int16(32769)
ans =
-32767
This is not surprising, but has to be kept in mind when doing integer calculations.
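Both results follow from reducing the exact value modulo 2^N and, for signed types, reading the top bit as the sign. The Python sketch below reproduces this arithmetic; the helper names `wrap_unsigned` and `wrap_signed` are hypothetical, and the code models the observed behaviour rather than Scilab's implementation.

```python
def wrap_unsigned(value, bits):
    """Reduce an integer to the range 0 .. 2**bits - 1."""
    return value % (1 << bits)


def wrap_signed(value, bits):
    """Reduce an integer to the two's complement range of a `bits`-wide type."""
    wrapped = value % (1 << bits)
    if wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits
    return wrapped


print(wrap_unsigned(129 + 129, 8))  # models uint8(129) + uint8(129)
print(wrap_signed(32769, 16))       # models int16(32769)
```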
Background Information
On present-day hardware, integer arithmetic is almost always done modulo 2**width, where width is the number of bits (typically 32 or 64) used to represent an integer in two's complement. This behavior kind of "leaks through" from the central processing unit (CPU), where neither integer overflow nor underflow exists. The main reason for implementing modular integers is speed. Implementing integers as a range-checked type would incur a vast overhead and massively hurt performance.
However, integer divisions by integer zero are trapped, even when setting ieee(2):
-->ieee(2);
-->int8(4) / int8(0)
!--error 27 division by zero...
Incidentally, intN(%nan), intN(-%inf) and intN(%inf), where N is 8, 16, or 32 all return 0. The same holds for all uint functions.
No overflow or underflow warning is reported either if a longer integer is converted to a shorter one. In contrast, no loss of precision ever occurs when any kind of integer is cast to a real, because real mantissas (also known as significands) are represented by more bits (52 stored bits, to be precise) than any Scilab integer type.
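The claim about lossless integer-to-real casts can be checked outside Scilab. The Python sketch below assumes that Python's `float` is an IEEE 754 double, which holds on common platforms, and tests the extreme values of the 32-bit types as well as the first integer that no longer survives the round trip.

```python
# Extreme values of the widest Scilab integer types discussed in the text.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MAX = 2**32 - 1

# Cast each value to a double and back; True means nothing was lost.
round_trips = [int(float(n)) == n for n in (INT32_MIN, INT32_MAX, UINT32_MAX)]

# With 52 stored significand bits plus the implicit leading bit, every
# integer up to 2**53 is exact; 2**53 + 1 is the first one that is not.
first_inexact = 2**53 + 1
survives = int(float(first_inexact)) == first_inexact

print(round_trips, survives)
```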
Integer types are a nice idea, and were definitely missing from Scilab before Scilab-2.5 (official release), but this said, we regretfully continue with our list of bugs. Unfortunately, it is not just a matter of implemented versus unimplemented integer constructs. Even some seemingly working constructs are problematic. Short of discouraging the use of integer types altogether, we go on reporting some troublesome spots, hoping that they will be addressed in future releases. We point out alternatives where appropriate.
In Scilab-2.5 (official release), there were serious bugs, which gave rise to wrong results even in the simplest concatenations of integer arrays. For instance,
-->[uint16(1), uint16(2)]
ans =
!1 0 !
-->[ans, ans]
ans =
!1 0 2 0 !
Similar things happen with int8 and uint8, but not with int32 and uint32. These bugs appear to have been corrected in the versions following Scilab-2.5 (official release).
Here, anything can happen, depending on the context and on the Scilab version. Most of the time, overloading functions (see also Section 4.2) for operators that involve two different types are undefined. Consequently, errors result from calling them. In several cases, however, wrong results show up.
-->int16(10) .* 3.2
!--error 4 undefined variable : %i_x_s
The proper overloading function for the ".*" operator, integer-times-real %i_x_s, is missing, and this is reported as an error. The undotted "*" operator, in contrast, is accepted, but instead of the correct product, 32, the user gets
-->int16(10) * 3.2
ans =
30
in Scilab-2.5 (official release), while
-->int16(10) * 3.2
ans =
-32678
in Scilab-2.5.1 (second beta version), and
-->int16(10) * 3.2
ans =
4
in Scilab-2.6 (official release).
Among the numerical operators, "^" is a little more sophisticated. Mixed power operations are often correct. They retain the type of the integer operand for positive integer exponents, while they give a real or complex result if the exponent is negative or non-integral. Thus,
-->int16(2)^(-4)
ans =
0.0625
-->int16(4)^(1/2)   // exponent is real!
ans =
2.
-->4.0^int16(2)
ans =
16
-->typeof(int8(4)^int16(2))
ans =
int8
-->int16(-4)^0.5
ans =
1.225E-16 + 2.i
All is OK here? Well, not all doughnuts come out with a hole.
-->uint16(-4)^0.5
ans =
255.99219
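The odd-looking value is at least consistent with the wrap-around of unsigned types: if -4 is first reduced modulo 2^16 and the square root is taken afterwards, the printed number comes out. The Python sketch below checks this arithmetic only; that Scilab proceeds in exactly this order is an assumption.

```python
import math

UINT16_MODULUS = 2**16

wrapped_base = (-4) % UINT16_MODULUS  # value a uint16 would hold for -4
root = math.sqrt(wrapped_base)        # real square root of the wrapped value

print(wrapped_base, round(root, 5))
```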
What about booleans? When booleans enter the game, the standard behavior in mixed real boolean expressions is to treat %t as 1.0 and %f as 0.0 (see also Section 4.4).
-->%t + 1.0
ans =
2.
Not so with integers! Most of the time the user again runs into missing overloading functions. In Scilab-2.5 (official release), however, the door was open to further bugs and oddities, which have been addressed in the later versions. For instance, operations with int8's were accepted, but the results did depend on the order of the operands.
-->%t + int8(1)
ans =
T
-->int8(1) + %t
ans =
1
-->int8(1) + %f
!--error 4 undefined variable : %i_a_b
Other integer types triggered undefined overload functions, reporting errors. Fortunately that is what happens in any case from Scilab-2.5.1 (alpha version) on. Moreover, sneaking through the definition holes of Scilab-2.5 (official release), the game went on with even stranger results, which changed after each call. The following example was reported by Tom Bruhns <tom_bruhns@agilent.com>.
-->f1 = %t + int8(0:20)
f1 =
! T T T T T T T T T T T T T T T T T T T T T !
-->f2 = %t + int8(0:20)
f2 =
! T T T T T T T T T T T T T T T T T T T T T !
-->f1 == f2
ans =
! T T T T T T T F F F F F F F F F F T F T F !
Oh, maybe it was my imagination that f1 == f2 did not make all "T" results ...
-->f1 == f2
ans =
! T T T T T T T F F F F F F F F F F F F F F !
What, not even the same answer as one line ago? Ouch! Does this build confidence, or what?
Upshot. Avoid mixed typed expressions like the plague, at least for the moment; avoid them harder if you are still using Scilab-2.5 (official release). Use the double, intN, and uintN converters (or iconvert) as often as needed.
Up to the latest Scilab-2.6 (official release), comparisons between values of different types (doubles, integers) are allowed. However, the results are not always consistent. This is yet another example of mixed type expressions, now with relational operators. For instance, comparing a real scalar or vector with a real scalar is valid.
-->(1:2) > 1
ans =
! F T !
This is the standard behavior. Trying the same with integers, you enter dangerous ground. Comparing scalar integers of the same type is safe.
-->int16(9) > int16(8)
ans =
T
-->int16(9) < int16(8)
ans =
F
Sometimes even comparing different types yields correct results, as, for example,
-->int16(1:2) > int32(1)
ans =
! F T !
-->int16(2) > 1
ans =
T
This would suggest that some sort of type conversion takes place before the comparison; however, up to Scilab-2.5.1 (second beta version) this impression is wrong.
-->int16(1:2) > 1
ans =
! F F !
To put it another way, maybe this result is correct, as both int16(1) and int16(2) appearing on the left hand side are different from 1, which is a double precision real value! But this latter interpretation is inconsistent with the two examples above, which is disturbing. In Scilab-2.6 (official release),
-->int16(1:2) > 1
ans =
! T T !
which is different, wrong, and not even amenable to the previous interpretation.
Similarly, consider the different behavior of a (meaningless) comparison of real and complex.
-->%i > 1
!--error 4 undefined variable : %s_2_s
Fine! Now comparing an integer with a complex produces neither an error nor a correct result:
-->%i > int16(1)
ans =
F
-->%i < int16(1)
ans =
T
-->%i == int16(0)
ans =
T
-->-%i == int16(127)
ans =
T
with small differences depending on the actual Scilab version.
Here too, bugs are lurking under the surface. In principle an array can be compared to a scalar, resulting in a boolean array of the size of the former. When doing that with integers of the same type, the results can be wrong, in a way which strangely seems to be more related to indexing than to comparison.
-->ia = int16(1:20);
-->ia > int16(21)
ans =
! F F F F F F F F F F F F F F F F F F T T !
The last two entries of the boolean result are plain wrong, even though inspection proves that the corresponding elements of ia are correct. If instead, the elements of ia are explicitly referenced,
-->ia(1:20) > int16(21)
ans =
! F F F F F F F F F F F F F F F F F F F F !
the answer is correct. On the other hand, wrong results are also returned by expressions such as ia(:) > int16(21), ia(1:$) > int16(21) and int16(21) < ia(1:20).
Upshot. It seems that the only relational expressions one can really trust are either int_array relop int_scalar, with identically-typed operands and explicit reference to the array elements, or int_array relop int_array, with equally sized and identically-typed arrays.
On GNU/Linux PPC, we found that the range of type int8 is identical to that of uint8; both assume values from 0 to 255.
-->int8(-1)
ans =
255
However, Scilab regards the two as different types, and refuses to evaluate expressions involving both of them.
-->int8(1) + uint8(1)
!--error 4 undefined variable : %i_a_i
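The values -1 and 255 share the same 8-bit pattern; only the interpretation of the top bit differs. The Python sketch below shows this with the standard `struct` module. It says nothing about why the PPC build of Scilab picks the unsigned reading, which the observations above leave open.

```python
import struct

raw = struct.pack("b", -1)                # one byte holding the signed value -1
as_signed = struct.unpack("b", raw)[0]    # same byte read as a signed 8-bit integer
as_unsigned = struct.unpack("B", raw)[0]  # same byte read as an unsigned 8-bit integer

print(raw.hex(), as_signed, as_unsigned)
```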
To conclude with something functional: the operators "~", "&", and "|" can be used in integer expressions. In this case, they act on the single bits of the representation of the integer value.
-->~uint16(1)
ans =
65534
-->~int16(1)
ans =
-2
-->int16(1) | int16(4)
ans =
5
Bitwise and/or of two different integral types is not possible.
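The three results shown above can be cross-checked with ordinary fixed-width bit arithmetic. In the Python sketch below, Python's unbounded integers are masked to 16 bits by hand; the helper `to_signed` is a hypothetical name, and the code only models the expected values.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


MASK16 = 0xFFFF

not_uint16 = ~1 & MASK16         # models ~uint16(1)
not_int16 = to_signed(~1, 16)    # models ~int16(1)
or_int16 = to_signed(1 | 4, 16)  # models int16(1) | int16(4)

print(not_uint16, not_int16, or_int16)
```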
Bitwise operations are a bonus when programming hardware at the register level. This is a case often encountered in interfacing with external instruments such as data acquisition cards.
The function dec2hex prints integer values as hexadecimal strings. Oddly, though, dec2hex accepts only reals as arguments. As previously mentioned for integer constants, no special notation exists for hexadecimal values; the function dec2hex, and its dual hex2dec, are mere formatting functions.
An alternative approach to bitwise operations, which might allow greater flexibility than intN operations, is the following. Binary strings can be represented (wasting memory) by boolean arrays. For example, to fix ideas, for 8 bit strings:
b8 = [%t %f %t %t %f %f %f %f] // for 10110000
First define a suitable vector with the powers of two.
pow2 = 2^(7:-1:0)
which is used in boolean to integer conversion
s = sum(pow2(b8))
and integer to boolean conversion
d2b = zeros(1, 8);
for i = 1:8
  // subtract the bits found so far (inner product with pow2'), then test the next power of two
  d2b(i) = int((s - d2b*pow2') / pow2(i));
end
b8 = d2b==1
Logical "and" and "or" operations map onto the usual logical expressions
c8 = a8 & b8
d8 = a8 | b8 // etc.
and even bit shifts can be written clearly with vectors.
e8 = b8([2:8, 1])
f8 = b8([8, 1:7])
Such an approach has some advantages and some disadvantages. The main advantage is direct access to a single bit, while the disadvantages are the larger memory consumption, the use of an extra array dimension, and the need for time consuming boolean to integer conversion functions.
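As a cross-check of the conversions and shifts described above, the same scheme can be written down in Python with lists of booleans, most significant bit first. All function names are made up for this sketch. Note that the two index expressions b8([2:8, 1]) and b8([8, 1:7]) are circular shifts (rotations): the bit pushed out at one end re-enters at the other.

```python
def bits_to_int(bits):
    """Convert a list of booleans (most significant bit first) to an integer."""
    value = 0
    for bit in bits:
        value = 2 * value + (1 if bit else 0)
    return value


def int_to_bits(value, width=8):
    """Convert a non-negative integer to `width` booleans, most significant bit first."""
    return [bool((value >> shift) & 1) for shift in range(width - 1, -1, -1)]


def rotate_left(bits):
    """Move the leading bit to the end, like b8([2:8, 1])."""
    return bits[1:] + bits[:1]


def rotate_right(bits):
    """Move the trailing bit to the front, like b8([8, 1:7])."""
    return bits[-1:] + bits[:-1]


b8 = [True, False, True, True, False, False, False, False]  # 10110000
s = bits_to_int(b8)
print(s, int_to_bits(s) == b8)
print(bits_to_int(rotate_left(b8)), bits_to_int(rotate_right(b8)))
```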