GAP (cvec) - Chapter 3: The Data Structures

This chapter describes all the data structures used in this package. We start with a section on finite fields and what is already there in the GAP kernel and library. Then we describe compressed vectors and compressed matrices.

3.1 Finite field elements

Throughout the package, elements in the field GF(p) of p elements are represented by numbers 0,1,...,p-1 and arithmetic is just the standard arithmetic in the integers modulo p.

Bigger finite fields are done by calculating in the polynomial ring GF(p)[x] in one indeterminate x modulo a certain irreducible polynomial. By convention, we use the so-called "Conway polynomials" (see http://www.math.rwth-aachen.de:8001/~Frank.Luebeck/data/ConwayPol/index.html) for this purpose, because they provide a standard way of embedding finite fields into their extension fields. Because Conway polynomials are monic, we can store coset representatives by storing polynomials of degree less than d, where d is the degree of the finite field over its prime field.

As of this writing, GAP has two implementation of finite field elements built into its kernel and library, the first of which stores finite field elements by storing the discrete logarithm with respect to a primitive root of the field. Although this is nice and efficient for small finite fields, it becomes unhandy for larger finite fields, because one needs a lookup table of length p^d for doing efficient addition. This implementation thus is limited to fields with less than or equal to 65536 elements. The other implementation using polynomial arithmetic modulo the Conway polynomial is used for fields with more than 65536 elements. For prime fields of characteristic p with more than that elements, there is an implementation using integers modulo p.

3.2 Compressed Vectors in Memory

3.2-1 Packing of prime field elements

For this section, we assume that the base field is GF(p^d), the finite field with p^d elements, where p is a prime and d is a positive integer. This is realized as a field extension of degree d over the prime field GF(p) using the Conway polynomial c ∈ GF(p)[x]. We always represent field elements of GF(p^d) by polynomials a = \sum_{{i=0}}^{{d-1}} a_i x^i where the coefficients a_i are in GF(p). Because c is monic, this gives a one-to-one correspondence between polynomials and finite field elements.

The memory layout for compressed vectors is determined by two important constants, depending only on p and the word length of the machine. The word length is 4 bytes on 32bit machines (for example on the i386 architecture) and 8 bytes on 64bit machines (for example on the AMD64 architecture). More concretely, a "Word" is an unsigned long int in C and the length of a Word is sizeof(unsigned long int).

Those constants are bitsperel (bits per prime field element) and elsperword (prime field elements per Word). bitsperel is 1 for p=2 and otherwise the smallest integer, such that 2^bitsperel > 2⋅ p-1. This means, that we can store the numbers from 0 to 2⋅ p - 1 in bitsperel bits. elsperword is 32/bitsperel rounded down to the next integer and multiplied by 2 if the length of a Word is 8 bytes. Note that we thus store as many prime field elements as possible into one Word, however, on 64bit machines we store only twice as much as would fit into 32bit, even if we could pack one more into a Word. This has technical reasons to make conversion between different architectures more efficient.

These definitions imply that we can put elsperword prime field elements into one Word. We do this by using the bitsperel least significant bits in the Word for the first prime field element, using the next bitsperel bits for the next prime field element and so on. Here is an example that shows how the 6 finite field elements 0,1,2,3,4,5 of GF(11) are stored in that order in one 32bit Word. bitsperel is here 4, because 2^4 < 2⋅ 11 - 1 = 21 < 2^5. Therefore elsperword is 5 on a 32bit machine.

Here is another example that shows how the 20 finite field elements 0,1,2,0,0,0,1,1,1,2,2,2,0,1,2,2,1,0,2,2 of GF(3) are stored in that order in one 64bit Word. bitsperel is here 3, because 2^2 < 2⋅ 3 - 1 = 5 < 2^3. Therefore elsperword is 20 on a 32bit machine. Remember, that we only put twice as many elements in one 64bit Word than we could in one 32bit Word!

Note that different architectures store their Words in a different byte order in memory ("endianess"). We do not specify how the data is distributed into bytes here! All access is via Words and their arithmetic (shifting, addition, multiplication, etc.). See Section 3.4 for a discussion of this with respect to our external representation.

3.2-2 Extension Fields

Now that we have seen how prime field elements are packed into Words, we proceed to the description how compressed vectors are stored over arbitrary extension fields:

Assume a compressed vector has length l with l ≥ 0. If d=1 (prime field), it just uses elsperword/l Words (division rounded up to the next integer), where the first Word stores the leftmost elsperword field elements in the first Word as described above and so on. This means, that the very first field element is stored in the least significant bits of the first Word.

In the extension field case GF(p^d), a vector of length l uses (elsperword/l)*d Words (division rounded up to the next integer), where the first d Words store the leftmost elsperword field elements. The very first word stores all the constant coefficients of the polynomials representing the first elsperword field elements in their order from left to right, the second Word stores the coefficients of x^1 and so on until the d'th Word, which stores the coefficients of x^{d-1}. Unused entries behind the end of the actual vector data within the last Word has to be zero!.

The following example shows, how the 9 field elements x^2+x+1, x^2+2x+2, x^2+3x+3, x^2+4x+4, 2x^2+x, 2x^2+3x+1, 2x^2+4x+2, 3x^2+1, and 4x^2+x+3 of GF(5^3) occurring in a vector of length 9 in that order are stored on a 32bit machine:

 ^^^ lower memory addresses ^^^
            ....|....|....|....|....|....|....|....  (least significant bit)
 1st Word:  0001|0010|0001|0000|0100|0011|0010|0001| (first
 2nd Word:  0000|0100|0011|0001|0100|0011|0010|0001|     8 field
 3rd Word:  0011|0010|0010|0010|0001|0001|0001|0001|        elements)
 ---------------------------------------------------
 4th Word:  0000|0000|0000|0000|0000|0000|0000|0011| (second
 5th Word:  0000|0000|0000|0000|0000|0000|0000|0001|     8 field
 6th Word:  0000|0000|0000|0000|0000|0000|0000|0100|        elements)
 VVV higher memory addresses VVV

A "cvec" (one of our compressed vectors) is a GAP "Data object" (that is with TNUM equal to T_DATOBJ). The first machine word in its bag is a pointer to its type, from the second machine word on the Words containing the above data are stored. The bag is exactly long enough to hold the type pointer and the data Words.

3.2-3 How is information about the base field stored?

But how does the system know, over which field the vector is and how long it is? The type of a GAP object consists of 3 pieces: A family, a bit list (for the filters), and one GAP object for "defining data". The additional information about the vector is stored in the third piece, the defining data, and is called a "cvecclass".

**Table:** Components of a `cvecclass`
Position	Name	Content
1	`IDX_fieldinfo`	a `fieldinfo` object, see below
2	`IDX_len`	the length of the vector as immediate GAP integer
3	`IDX_wordlen`	the number of `Word`s used as immediate GAP integer
4	`IDX_type`	a GAP type used to create new vectors
5	`IDX_GF`	a GAP object for the base field
6	`IDX_lens`	a list holding lengths of vectors in `cvecclasses` for the same field
7	`IDX_classes`	a list holding `cvecclass`es for the same field with lengths as in entry number 6

In position 5 we have just the GAP finite field object GF(p,d). The names appear as symbols in the code.

The field is described by the GAP object in position 1, a so-called fieldinfo object, which is described in the following table:

**Table:** Components of a `fieldinfo`
Position	Name	Content
1	`IDX_p`	p as an immediate GAP integer
2	`IDX_d`	d as an immediate GAP integer
3	`IDX_q`	q=p^d as a GAP integer
4	`IDX_conway`	a GAP string containing the coefficients of the Conway Polynomial as unsigned int []
5	`IDX_bitsperel`	number of bits per element of the prime field (`bitsperel`)
6	`IDX_elsperword`	prime field elements per `Word` (`elsperword`)
7	`IDX_wordinfo`	a GAP string containing C data for internal use
8	`IDX_bestgrease`	the best grease level (see Section 5.8)
9	`IDX_greasetabl`	the length of a grease table using best grease
10	`IDX_filts`	a filter list for the creation of new vectors over this field
11	`IDX_tab1`	a table for GF(q) to `[0..q-1]` conversion if q ≤ 65536
12	`IDX_tab2`	a table for `[0..q-1]` to GF(q) conversion if q ≤ 65536
13	`IDX_size`	0 for q ≤ 65536, otherwise 1 if q is a GAP immediate integer and 2 if not
14	`IDX_scafam`	the scalars family

Note that GAP has a family for all finite field elements of a given characteristic p, vectors over a finite field are then in the collections family of that family and matrices are in the collections family of the collections family of the scalars family. We use the same families in the same way for compressed vectors and matrices.

3.2-4 Limits that follow from the Data Format

The following limits come from the above mentioned data format or other internal restrictions (an "immediate integer" in GAP can take values between 2^{-28} and (2^{28})-1 inclusively on 32bit machines and between 2^{-60} and (2^{60})-1 on 64bit machines):

3.3 Compressed Matrices

The implementation of cmats (compressed matrices) is done mainly on the GAP level without using too many C parts. Only the time critical parts for some operations for matrices are done in the kernel.

**Table:** Components of a `cmat` object
Component name	Content
`len`	the number of rows, can be 0
`vecclass`	a `cvecclass` object describing the class of rows
`scaclass`	a `cscaclass` object holding a reference to `GF(p,d)`
`rows`	a list containing the rows of the matrix, which are `cvec`s
`greasehint`	the recommended greasing level

The cvecclass in the component vecclass determines the number of columns of the matrix by the length of the rows.

The length of the list in component rows is len+1, because the first position is equal to the integer 0 such that the type of the list rows can always be computed efficiently. The rows are then stored in positions 2 to len+1.

The component greasehint contains the greasing level for the next matrix multiplication, in which this matrix occurs as the factor on the right hand side (if greasing is worth the effort, see Section 5.8).

A cmat can be "pre-greased", which just means, that a certain number of linear combinations of its rows is already precomputed (see Section 5.8). In that case, the object is in the additional filter HasGreaseTab and the following components are bound additionally:

**Table:** Additional components of a `cmat` object when pre-greased
Component name	Content
`greaselev`	the grease level
`greasetab`	the grease table, a list of lists of `cvecs`
`greaseblo`	the number of greasing blocks
`spreadtab`	a lookup table for indexing the right linear combination

3.4 External Representation of Matrices on Storage

3.4-1 Byte ordering and word length

This section describes the external representation of matrices. A special data format is needed here, because of differences between computer architectures with respect to word length (32bit/64bit) and endianess. The term "endianess" refers to the fact, that different architectures store their long words in a different way in memory, namely they order the bytes that together make up a long word in different orders.

The external representation is the same as the internal format of a 32bit machine with little endianess, which means, that the lower significant bytes of a Word are stored in lower addresses. The reasons for this decision are firstly that 64bit machines can do the bit shifting to convert between internal and external representation easier using their wide registers, and secondly, that the nowadays most popular architectures i386 and AMD64 use both little endianess, such that conversion is only necessary on a minority of machines.

3.4-2 The header of a cmat file

The header of a cmat file consists of 5 words of 64bit each, that are stored in little endian byte order:

**Table:** Header of a `cmat` file
the magic value "`GAPCMat1`" as ASCII letters (8 bytes) in this order
the value of p to describe the base field
the value of d to describe the base field
the number of rows of the matrix
the number of columns of the matrix

After these 40 bytes follow the data words as described above using little endian 32bit Words as in the internal representation on 32bit machines.

Note that the decision to put not more than twice as many prime field elements into a 64bit Word than would fit into a 32bit Word makes the conversion between internal and external representation much easier to implement.

3 The Data Structures

3.1 Finite field elements

3.2 Compressed Vectors in Memory

3.2-1 Packing of prime field elements

3.2-2 Extension Fields

3.2-3 How is information about the base field stored?

3.2-4 Limits that follow from the Data Format

3.3 Compressed Matrices

3.4 External Representation of Matrices on Storage

3.4-1 Byte ordering and word length

3.4-2 The header of a `cmat` file

3 The Data Structures

3.1 Finite field elements

3.2 Compressed Vectors in Memory

3.2-1 Packing of prime field elements

3.2-2 Extension Fields

3.2-3 How is information about the base field stored?

3.2-4 Limits that follow from the Data Format

3.3 Compressed Matrices

3.4 External Representation of Matrices on Storage

3.4-1 Byte ordering and word length

3.4-2 The header of a cmat file

3.4-2 The header of a `cmat` file