GAP (ref) - Chapter 28: Dictionaries and General Hash Tables

People and computers spend a large amount of time with searching. Dictionaries are an abstract data structure which facilitates searching for objects. Depending on the kind of objects the implementation will use a variety of possible internal storage methods that will aim to provide the fastest possible access to objects. These internal methods include

28.1 Using Dictionaries

The standard way to use dictionaries is to first create a dictionary (using NewDictionary (28.2-1), and then to store objects (and associated information) in it and look them up.

For the creation of objects the user has to make a few choices: Is the dictionary only to be used to check whether objects are known already, or whether associated information is to be stored with the objects. This second case is called a lookup dictionary and is selected by the second parameter of NewDictionary (28.2-1).

The second choice is to indicate which kind of objects are to be stored. This choice will decide the internal storage used. This kind of objects is specified by the first parameter to NewDictionary (28.2-1), which is a sample object.

In some cases however such a sample object is not specific enough. For example when storing vectors over a finite field, it would not be clear whether all vectors will be over a prime field or over a field extension. Such an issue can be resolved by indicating in an (optional) third parameter to NewDictionary (28.2-1) a domain which has to be a collection that will contain all objects to be used with this dictionary. (Such a domain may also be used internally to decide that direct indexing can be used).

The reason for this choice of giving two parameters is that in some cases no suitable collection of objects has been defined in GAP - for example for permutations there is no object representing the symmetric group on infinitely many points.

Once a dictionary has been created, it is possible to use RepresentationsOfObject (13.4-3) to check which representation is used by GAP.

In the following example, we create a dictionary to store permutations with associated information.

gap> d:=NewDictionary((1,2,3),true);;
gap> AddDictionary(d,(1,2),1);
gap> AddDictionary(d,(5,6),9);
gap> AddDictionary(d,(4,7),2);
gap> LookupDictionary(d,(5,6));
9
gap> LookupDictionary(d,(5,7));
fail

A typical example of this use would be in an orbit algorithm. The dictionary would be used to store the elements known in the orbit together with their respective orbit positions.

We observe that this dictionary is stored internally by a sorted list. On the other hand, if we have an explicit, sorted element list, direct indexing is to be used.

gap> RepresentationsOfObject(d);
[ "IsComponentObjectRep", "IsNonAtomicComponentObjectRep",
  "IsDictionaryDefaultRep", "IsListDictionary",
  "IsListLookupDictionary", "IsSortDictionary",
  "IsSortLookupDictionary" ]
gap> d:=NewDictionary((1,2,3),true,Elements(SymmetricGroup(5)));;
gap> RepresentationsOfObject(d);
[ "IsComponentObjectRep", "IsNonAtomicComponentObjectRep",
  "IsDictionaryDefaultRep", "IsPositionDictionary",
  "IsPositionLookupDictionary" ]

(Just indicating SymmetricGroup(5) as a third parameter would still keep the first storage method, as indexing would be too expensive if no explicit element list is known.)

The same effect happens in the following example, in which we work with vectors: Indicating only a vector only enables sorted index, as it cannot be known whether all vectors will be defined over the prime field. On the other hand, providing the vector space (and thus limiting the domain) enables the use of hashing (which will be faster).

gap> v:=GF(2)^7;;
gap> d:=NewDictionary(Zero(v),true);;
gap> RepresentationsOfObject(d);
[ "IsComponentObjectRep", "IsNonAtomicComponentObjectRep",
  "IsDictionaryDefaultRep", "IsListDictionary",
  "IsListLookupDictionary", "IsSortDictionary",
  "IsSortLookupDictionary" ]
gap> d:=NewDictionary(Zero(v),true,v);;
gap> RepresentationsOfObject(d);
[ "IsComponentObjectRep", "IsNonAtomicComponentObjectRep",
  "IsDictionaryDefaultRep", "IsPositionDictionary",
  "IsPositionLookupDictionary" ]

28.2 Dictionaries

This section contains the formal declarations for dictionaries. For information on how to use them, please refer to the previous section 28.1. There are several ways how dictionaries are implemented: As lists, as sorted lists, as hash tables or via binary lists. A user however will just have to call NewDictionary (28.2-1) and obtain a suitable dictionary for the kind of objects she wants to create. It is possible however to create hash tables (see 28.4) and dictionaries using binary lists (see DictionaryByPosition (28.3-1)).

The use of two objects, obj and objcoll to parametrize the objects a dictionary is able to store might look confusing. However there are situations where either of them might be needed:

The first situation is that of objects, for which no formal collection object has been defined. A typical example here might be subspaces of a vector space. GAP does not formally define a Grassmannian or anything else to represent the multitude of all subspaces. So it is only possible to give the dictionary a sample object.

The other situation is that of an object which might represent quite varied domains. The permutation \((1,10^6)\) might be the nontrivial element of a cyclic group of order 2, it might be a representative of \(S_{{10^6}}\). In the first situation the best approach might be just to have two entries for the two possible objects, in the second situation a much more elaborate approach might be needed.

An algorithm that creates a dictionary will usually know a priori, from what domain all the objects will be, giving this domain permits to use a more efficient dictionary.

This is particularly true for vectors. From a single vector one cannot decide whether a calculation will take place over the smallest field containing all its entries or over a larger field.

28.2-1 NewDictionary

‣ NewDictionary( obj, look[, objcoll] ) ( function )

creates a new dictionary for objects such as obj. If objcoll is given the dictionary will be for objects only from this collection, knowing this can improve the performance. If objcoll is given, obj may be replaced by false, i.e. no sample object is needed.

The function tries to find the right kind of dictionary for the basic dictionary functions to be quick. If look is true, the dictionary will be a lookup dictionary, otherwise it is an ordinary dictionary.

28.3 Dictionaries via Binary Lists

As there are situations where the approach via binary lists is explicitly desired, such dictionaries can be created deliberately.

28.3-1 DictionaryByPosition

‣ DictionaryByPosition( list, lookup ) ( function )

creates a new (lookup) dictionary which uses PositionCanonical (21.16-3) in list for indexing. The dictionary will have an entry dict!.blist which is a bit list corresponding to list indicating the known values. If look is true, the dictionary will be a lookup dictionary, otherwise it is an ordinary dictionary.

28.3-2 IsDictionary

‣ IsDictionary( obj ) ( category )

A dictionary is a growable collection of objects that permits to add objects (with associated values) and to check whether an object is already known.

28.3-3 IsLookupDictionary

‣ IsLookupDictionary( obj ) ( category )

A lookup dictionary is a dictionary, which permits not only to check whether an object is contained, but also to retrieve associated values, using the operation LookupDictionary (28.3-6).

28.3-4 AddDictionary

‣ AddDictionary( dict, key[, val] ) ( operation )

adds key to the dictionary dict, storing the associated value val in case dict is a lookup dictionary. If key is not an object of the kind for which the dictionary was specified, or if key is known already to dict, the results are unpredictable.

28.3-5 KnowsDictionary

‣ KnowsDictionary( dict, key ) ( operation )

checks, whether key is known to the dictionary dict, and returns true or false accordingly. key must be an object of the kind for which the dictionary was specified, otherwise the results are unpredictable.

28.3-6 LookupDictionary

‣ LookupDictionary( dict, key ) ( operation )

looks up key in the lookup dictionary dict and returns the associated value. If key is not known to the dictionary, fail is returned.

28.4 General Hash Tables

These sections describe some particularities for hash tables. These are intended mainly for extending the implementation - programs requiring hash functionality ought to use the dictionary interface described above.

We hash by keys and also store a value. Keys cannot be removed from the table, but the corresponding value can be changed. Fast access to last hash index allows you to efficiently store more than one array of values –this facility should be used with care.

This code works for any kind of object, provided you have a DenseIntKey (28.5-1) method to convert the key into a positive integer. This method should ideally be implemented efficiently in the core.

Note that, for efficiency, it is currently impossible to create a hash table with non-positive integers.

28.5 Hash keys

The crucial step of hashing is to transform key objects into integers such that equal objects produce the same integer.

The actual function used will vary very much on the type of objects. However GAP provides already key functions for some commonly encountered objects.

28.5-1 DenseIntKey

‣ DenseIntKey( objcoll, obj ) ( operation )

returns a function that can be used as hash key function for objects such as obj in the collection objcoll. Typically, objcoll will be a large domain. If the domain is not available, it can be given as false in which case the hash key function will be determined only based on obj. (For a further discussion of these two arguments see NewDictionary (28.2-1)).

The function returned by DenseIntKey is guaranteed to give different values for different objects. If no suitable hash key function has been predefined, fail is returned.

28.5-2 SparseIntKey

‣ SparseIntKey( objcoll, obj ) ( operation )

returns a function that can be used as hash key function for objects such as obj in the collection objcoll. In contrast to DenseIntKey (28.5-1), the function returned may return the same key value for different objects. If no suitable hash key function has been predefined, fail is returned.

28.6 Dense hash tables

Dense hash tables are used for hashing dense sets without collisions, in particular integers. Keys are stored as an unordered list and values as an array with holes. The position of a value is given by the function returned by DenseIntKey (28.5-1), and so KeyIntDense must be one-to-one.

28.6-1 DenseHashTable

‣ DenseHashTable( ) ( function )

Construct an empty dense hash table. This is the only correct way to construct such a table.

28.7 Sparse hash tables

Sparse hash tables are used for hashing sparse sets. Stores keys as an array with fail denoting an empty position, stores values as an array with holes. Uses the result of calling SparseIntKey (28.5-2)) of the key. DefaultHashLength is the default starting hash table length; the table is doubled when it becomes half full.

In sparse hash tables, the integer obtained from the hash key is then transformed to an index position by taking it modulo the length of the hash array.

28.7-1 SparseHashTable

‣ SparseHashTable( [intkeyfun[, startlength]] ) ( function )

Construct an empty sparse hash table. This is the only correct way to construct such a table. If the argument intkeyfun is given, this function will be used to obtain numbers for the keys passed to it. If also startlength is given, the hash table will be initialized at that size.

28.7-2 DoubleHashArraySize

‣ DoubleHashArraySize( hash ) ( function )

Double the size of the hash array and rehash all the entries. This will also happen automatically when the hash array is half full.

28 Dictionaries and General Hash Tables