### 19 Floats

Starting with version 4.5, GAP has built-in support for floating-point numbers in machine format, and allows packages to implement arbitrary-precision floating-point arithmetic in a uniform manner. For now, one such package, float, exists; it is based on the arbitrary-precision routines in mpfr.

A word of caution: GAP deals primarily with algebraic objects, which can be represented exactly in a computer. Numerical imprecision means that floating-point numbers do not form a ring in the strict GAP sense, because addition is in general not associative ((1.0e-100+1.0)-1.0 is not the same as 1.0e-100+(1.0-1.0), in the default precision setting).

Most algorithms in GAP which require ring elements will therefore not be applicable to floating-point elements. In some cases, such a notion would not even make any sense (what is the greatest common divisor of two floating-point numbers?)
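
The failure of associativity mentioned above can be checked directly. The following is a minimal sketch, assuming the default 53-bit machine floats; the results follow from IEEE 754 double arithmetic, in which 1.0e-100 is far below the rounding granularity of 1.0:

```gap
gap> (1.0e-100 + 1.0) - 1.0 = 0.0;   # 1.0e-100 is absorbed when added to 1.0
true
gap> 1.0e-100 + (1.0 - 1.0) = 0.0;   # 1.0 - 1.0 is exactly zero, so 1.0e-100 survives
false
```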

#### 19.1 A sample run

Floating-point numbers can be input into GAP in the standard floating-point notation:

```gap
gap> 3.14;
3.14
gap> last^2/6;
1.64327
gap> h := 6.62606896e-34;
6.62607e-34
gap> pi := 4*Atan(1.0);
3.14159
gap> hbar := h/(2*pi);
1.05457e-34
```


Floating-point numbers can also be created using Float, from strings or rational numbers, and can be converted back using String, Rat and Int.

GAP allows rational and floating-point numbers to be mixed in the elementary operations +, -, *, /. However, floating-point numbers and rational numbers may not be compared. Conversions are performed using the creator Float:

```gap
gap> Float("3.1416");
3.1416
gap> Float(355/113);
3.14159
gap> Rat(last);
355/113
gap> Rat(0.33333);
1/3
gap> Int(1.e10);
10000000000
gap> Int(1.e20);
100000000000000000000
gap> Int(1.e30);
1000000000000000019884624838656
```
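
The last three conversions show the limits of a 53-bit mantissa. As a small check, assuming the default machine floats and using only the outputs shown above: the odd part 5^20 of 10^20 fits in 53 bits, so 1.e20 is represented exactly, whereas 5^30 does not fit, so 1.e30 is stored as a nearby representable value and Int returns that value instead:

```gap
gap> Int(1.e20) = 10^20;
true
gap> Int(1.e30) = 10^30;
false
gap> Int(1.e30) - 10^30;     # size of the representation error
19884624838656
```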


#### 19.2 Methods

Floating-point numbers may be input directly, as in most mathematical software or programming languages; the only requirement is that every floating-point number contain at least one decimal digit. Therefore .1, .1e1, -.999 etc. are all valid GAP inputs.

Floating-point numbers so entered in GAP are stored as strings. They are converted to floating-point when they are first used. This means that, if the floating-point precision is increased, the constants are reevaluated to fit the new format.

Floating-point numbers may be followed by an underscore, as in 1._. This means that they are to be immediately converted to the current floating-point format. The underscore may be followed by a single letter, which specifies which format/precision to use. By default, GAP has a single floating-point handler, with fixed (53 bits) precision, and its format specifier is 'l' as in 1._l. Higher-precision floating-point computations are available via external packages, for example float.

A record, FLOAT (19.2-6), contains all relevant constants for the current floating-point format; see its documentation for details. Typical fields are FLOAT.MANT_DIG=53, the constant FLOAT.VIEW_DIG=6 specifying the number of digits to view, and FLOAT.PI for the constant π. The constants have the same name as their C counterparts, except for the missing initial DBL_ or M_.

Floating-point numbers may be created using the single function Float (19.2-7), which accepts as arguments rational, string, or floating-point numbers. Floating-point numbers may also be created, in any floating-point representation, using NewFloat (19.2-7) as in NewFloat(IsIEEE754FloatRep,355/113), by supplying the category filter of the desired new floating-point number; or using MakeFloat (19.2-7) as in MakeFloat(1.0,355/113), by supplying a sample floating-point number.

Floating-point numbers may also be converted to other GAP formats using the usual commands Int (14.2-3), Rat (17.2-6), String (27.6-6).

Exact conversion to and from floating-point format may be done using external representations. The "external representation" of a floating-point number x is a pair [m,e] of integers, such that x=m*2^(-1+e-LogInt(m,2)). Conversion to and from external representation is performed as usual using ExtRepOfObj (79.15-1) and ObjByExtRep (79.15-1):

```gap
gap> ExtRepOfObj(3.14);
[ 7070651414971679, 2 ]
gap> ObjByExtRep(IEEE754FloatsFamily,last);
3.14
```
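
As a check of the pair returned by ExtRepOfObj above, the value m*2^(-1+e-LogInt(m,2)) can be evaluated in exact rational arithmetic and compared with 3.14. This is only a sketch; the variable names m, e and x are chosen here for illustration:

```gap
gap> m := 7070651414971679;; e := 2;;
gap> LogInt(m, 2);                         # m has 53 binary digits
52
gap> x := m * 2^(-1 + e - LogInt(m, 2));   # exact value of the float, as a rational
7070651414971679/2251799813685248
gap> AbsoluteValue(x - 314/100) < 1/10^15; # agrees with 3.14 to within rounding
true
```

The rational x is not equal to 314/100; it is the nearest value with a 53-bit mantissa, which is why the comparison uses a tolerance.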


Computations with floating-point numbers never raise any error. Division by zero is allowed, and produces a signed infinity. Illegal operations, such as 0./0., produce NaN's (not-a-number); this is the only floating-point number x such that not EqFloat(x+0.0,x).

The IEEE754 standard requires NaN to be non-equal to itself. On the other hand, GAP requires every object to be equal to itself. To respect the IEEE754 standard, the function EqFloat (19.2-2) should be used instead of =.

The category a floating-point number belongs to can be checked using the filters IsFinite (30.4-2), IsPInfinity (19.2-5), IsNInfinity (19.2-5), IsXInfinity (19.2-5), IsNaN (19.2-5).
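
The special values and their filters can be exercised as follows. This sketch assumes the default machine floats and relies only on the behaviour described above: division by zero yields a signed infinity, 0./0. yields NaN, and EqFloat treats NaN as different from itself:

```gap
gap> IsPInfinity(1.0/0.0);
true
gap> IsNInfinity(-1.0/0.0);
true
gap> x := 0.0/0.0;;
gap> IsNaN(x);
true
gap> EqFloat(x, x);        # NaN is never EqFloat to itself
false
gap> EqFloat(1.0, 1.0);
true
```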

Comparisons between floating-point numbers and rationals are explicitly forbidden. The rationale is that objects belonging to different families should in general not be comparable in GAP. Floating-point numbers are also approximations of real numbers, and don't follow the same rules; consider for example, using the default GAP implementation of floating-point numbers,

```gap
gap> 1.0/3.0 = Float(1/3);
true
gap> (1.0/3.0)^5 = Float((1/3)^5);
false
```


##### 19.2-1 Mathematical operations
 ‣ Cos( x ) ( operation )
 ‣ Sin( x ) ( operation )
 ‣ SinCos( x ) ( operation )
 ‣ Tan( x ) ( operation )
 ‣ Sec( x ) ( operation )
 ‣ Csc( x ) ( operation )
 ‣ Cot( x ) ( operation )
 ‣ Asin( x ) ( operation )
 ‣ Acos( x ) ( operation )
 ‣ Atan( x ) ( operation )
 ‣ Atan2( y, x ) ( operation )
 ‣ Cosh( x ) ( operation )
 ‣ Sinh( x ) ( operation )
 ‣ Tanh( x ) ( operation )
 ‣ Sech( x ) ( operation )
 ‣ Csch( x ) ( operation )
 ‣ Coth( x ) ( operation )
 ‣ Asinh( x ) ( operation )
 ‣ Acosh( x ) ( operation )
 ‣ Atanh( x ) ( operation )
 ‣ Log( x ) ( operation )
 ‣ Log2( x ) ( operation )
 ‣ Log10( x ) ( operation )
 ‣ Log1p( x ) ( operation )
 ‣ Exp( x ) ( operation )
 ‣ Exp2( x ) ( operation )
 ‣ Exp10( x ) ( operation )
 ‣ Expm1( x ) ( operation )
 ‣ Cuberoot( x ) ( operation )
 ‣ Square( x ) ( operation )
 ‣ Hypothenuse( x, y ) ( operation )
 ‣ Ceil( x ) ( operation )
 ‣ Floor( x ) ( operation )
 ‣ Round( x ) ( operation )
 ‣ Trunc( x ) ( operation )
 ‣ Frac( x ) ( operation )
 ‣ SignFloat( x ) ( operation )
 ‣ Argument( x ) ( operation )
 ‣ Erf( x ) ( operation )
 ‣ Zeta( x ) ( operation )
 ‣ Gamma( x ) ( operation )
 ‣ ComplexI( x ) ( operation )

Usual mathematical functions.

##### 19.2-2 EqFloat
 ‣ EqFloat( x, y ) ( operation )

Returns: Whether the floating-point numbers x and y are equal

This function compares two floating-point numbers, and returns true if they are equal, and false otherwise; with the exception that NaN is always considered to be different from itself.

##### 19.2-3 PrecisionFloat
 ‣ PrecisionFloat( x ) ( operation )

Returns: The precision of x

This function returns the precision, counted in number of binary digits, of the floating-point number x.

##### 19.2-4 Interval operations
 ‣ Sup( interval ) ( operation )
 ‣ Inf( interval ) ( operation )
 ‣ Mid( interval ) ( operation )
 ‣ AbsoluteDiameter( interval ) ( operation )
 ‣ RelativeDiameter( interval ) ( operation )
 ‣ Overlaps( interval1, interval2 ) ( operation )
 ‣ IsDisjoint( interval1, interval2 ) ( operation )
 ‣ IncreaseInterval( interval, delta ) ( operation )
 ‣ BlowupInterval( interval, ratio ) ( operation )
 ‣ BisectInterval( interval ) ( operation )

Most are self-explanatory. BlowupInterval returns an interval with same midpoint but relative diameter increased by ratio; IncreaseInterval returns an interval with same midpoint but absolute diameter increased by delta; BisectInterval returns a list of two intervals whose union equals interval.

##### 19.2-5 IsPInfinity
 ‣ IsPInfinity( x ) ( property )
 ‣ IsNInfinity( x ) ( property )
 ‣ IsXInfinity( x ) ( property )
 ‣ IsFinite( x ) ( property )
 ‣ IsNaN( x ) ( property )

Returns true if the floating-point number x is respectively +∞, -∞, ±∞, finite, or 'not a number', such as the result of 0.0/0.0.

##### 19.2-6 FLOAT
 ‣ FLOAT ( global variable )

This record contains useful floating-point constants:

DECIMAL_DIG

Maximal number of useful digits;

DIG

Number of significant digits;

VIEW_DIG

Number of digits to print in short view;

EPSILON

Difference between 1 and the next larger representable number, so that 1≠1+ϵ;

MANT_DIG

Number of bits in the mantissa;

MAX

Maximal representable number;

MAX_10_EXP

Maximal decimal exponent;

MAX_EXP

Maximal binary exponent;

MIN

Minimal positive representable number;

MIN_10_EXP

Minimal decimal exponent;

MIN_EXP

Minimal binary exponent;

INFINITY

Positive infinity;

NINFINITY

Negative infinity;

NAN

Not-a-number,

as well as mathematical constants E, LOG2E, LOG10E, LN2, LN10, PI, PI_2, PI_4, 1_PI, 2_PI, 2_SQRTPI, SQRT2, SQRT1_2.
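
A few of these fields can be inspected directly. The sketch below assumes the default 53-bit format; the comparisons follow from round-to-nearest arithmetic on IEEE 754 doubles:

```gap
gap> FLOAT.MANT_DIG;
53
gap> 1.0 + FLOAT.EPSILON = 1.0;     # EPSILON is the gap between 1.0 and the next float
false
gap> 1.0 + FLOAT.EPSILON/4 = 1.0;   # a quarter of that gap is rounded away
true
gap> IsPInfinity(FLOAT.INFINITY);
true
gap> IsNaN(FLOAT.NAN);
true
```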

##### 19.2-7 Float
 ‣ Float( obj ) ( operation )
 ‣ NewFloat( filter, obj ) ( operation )
 ‣ MakeFloat( sample, obj ) ( operation )

Returns: A new floating-point number, based on obj

This function creates a new floating-point number.

If obj is a rational number, the new number is created with sufficient precision so that it can (usually) be converted back to the original number (see Rat (19.2-8) and Rat (17.2-6)). For an integer, the precision, if unspecified, is chosen sufficient so that Int(Float(obj))=obj always holds, but at least 64 bits.

obj may also be a string, which may be of the form "3.14e0" or ".314e1" or ".314@1" etc.

An option may be passed to specify, in bits, a desired precision. The format is Float("3.14":PrecisionFloat:=1000) to create a 1000-bit approximation of 3.14.

In particular, if obj is already a floating-point number, then Float(obj:PrecisionFloat:=prec) creates a copy of obj with the new precision prec.

##### 19.2-8 Rat
 ‣ Rat( f ) ( operation )

Returns: A rational approximation to f

This command constructs a rational approximation to the floating-point number f. Of course, it is not guaranteed to return the original rational number f was created from, though it returns the most 'reasonable' one given the precision of f.

If used in the form Rat(f:maxdenom:=max), the rational returned is the first one with denominator at most max.

##### 19.2-9 SetFloats
 ‣ SetFloats( rec[, bits][, install] ) ( function )

Installs a new interface to floating-point numbers in GAP, optionally with a desired precision bits in binary digits. The last optional argument install is a boolean value; if false, it only installs the eager handler and the precision for the floating-point numbers, without making them the default.

#### 19.3 High-precision-specific methods

GAP provides a mechanism for packages to implement new floating-point numerical interfaces. The following describes that mechanism; actual examples of packages are documented separately.

A package must create a record with fields (all optional)

creator

a function converting strings to floating-point;

eager

a character allowing immediate conversion to floating-point;

objbyextrep

a function creating a floating-point number out of a list [mantissa,exponent];

filter

a filter for the new floating-point objects;

constants

a record containing numerical constants, such as MANT_DIG, MAX, MIN, NAN.

The package must install methods Int, Rat, String for its objects, and creators NewFloat(filter,IsRat), NewFloat(filter,IsString).

It must then install methods for all arithmetic and numerical operations: PLUS, Exp, ...

The user chooses that implementation by calling SetFloats (19.2-9) with the record as argument, and with an optional second argument requesting a precision in binary digits.

#### 19.4 Complex arithmetic

Complex arithmetic may be implemented in packages, and is present in float. Complex numbers are treated as usual numbers; they may be input with an extra "i" as in -0.5+0.866i.

Methods should then be implemented for Norm, RealPart, ImaginaryPart, ComplexConjugate, ...

#### 19.5 Interval-specific methods

Interval arithmetic may also be implemented in packages. Intervals are in fact efficient implementations of sets of real numbers. The only non-trivial issue is how they should be compared. The standard EQ tests if the intervals are equal; however, it is usually more useful to know if intervals overlap, or are disjoint, or are contained in each other. The methods provided by the package should include Sup, Inf, Mid, DiameterOfInterval, Overlaps, IsSubset, IsDisjoint.

Note the usual convention that intervals are compared as in [a,b] ≤ [c,d] if and only if a ≤ c and b ≤ d.
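
This convention gives only a partial order: when one interval is strictly contained in another, neither is less than or equal to the other. The following sketch illustrates this with a hypothetical helper IntervalLE, which is not part of GAP or of any package, and which models an interval simply as a two-element list of rationals:

```gap
gap> IntervalLE := function(i, j) return i[1] <= j[1] and i[2] <= j[2]; end;;
gap> IntervalLE([1,2], [3/2,3]);    # both endpoints move to the right
true
gap> IntervalLE([1,4], [2,3]);      # [2,3] lies inside [1,4]
false
gap> IntervalLE([2,3], [1,4]);      # ... and the reverse comparison fails too
false
```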
