GAP (orb) - Chapter 4: Hashing techniques


4 Hashing techniques

4.1 The idea of hashing

If one wants to store a set of similar objects and quickly access a given one (or determine that it is not stored), the first idea would be to keep them in a list, possibly sorted for faster access. However, even a sorted list of \(n\) objects still needs about \(\log(n)\) comparisons to find a given element or to decide that it is not yet stored.

Therefore one uses a much bigger array together with an integer-valued function on the space of possible objects to decide where in the array to store a given object. If this so-called hash function distributes the objects that are actually stored well enough over the array, the access time is constant on average. Of course, a hash function will usually not be injective, so one needs a strategy for handling a so-called "collision", that is, the case where more than one object with the same hash value has to be stored. This package provides two ways to deal with collisions: one is implemented in the so-called "HashTabs" and the other in the "TreeHashTabs". The former simply uses other parts of the array to store the data involved in the collisions, while the latter uses an AVL tree (see Chapter 8) to store all data objects with the same hash value. Both are used in basically the same way but sometimes behave a bit differently.
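The first collision strategy can be illustrated with a small toy table written in plain GAP. This is only a sketch and not the orb implementation: the names len, slots, toyhash and ToyAdd are hypothetical, and linear probing (trying the next slot) is assumed purely for illustration, since the text above only says that other parts of the array are used.

```gap
# Toy open-addressing table for non-negative integers (illustration only,
# NOT the code used by orb). All names here are hypothetical.
len := 7;
slots := [];                        # unbound positions are empty slots
toyhash := x -> (x mod len) + 1;    # hash values lie in [1..len]

ToyAdd := function(x)
  local pos;
  pos := toyhash(x);
  # Assumed strategy: linear probing. On a collision, move to the next
  # slot, wrapping around at the end of the array.
  # (This toy loops forever if the table is full.)
  while IsBound(slots[pos]) and slots[pos] <> x do
    pos := (pos mod len) + 1;
  od;
  slots[pos] := x;
  return pos;
end;

ToyAdd(3);    # hash value 4, slot 4 is free
ToyAdd(10);   # hash value 4 again: collision, stored in slot 5
```

In a tree-based variant, each slot would instead hold a search tree containing all objects that share the hash value.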

The basic functions to work with hash tables are HTCreate (4.3-1), HTAdd (4.3-2), HTValue (4.3-3), HTDelete (4.3-5) and HTUpdate (4.3-4). They are described in Section 4.3.
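A minimal usage sketch of these functions follows. It assumes that the orb package is installed, that a permutation is an acceptable sample object, and that the argument orders are as shown; Section 4.3 is the authoritative reference for the exact signatures and return values.

```gap
LoadPackage("orb");

# Create a hash table from a sample object (here: a permutation).
ht := HTCreate((1,2,3));

# Store a value under a key, then look the key up again.
HTAdd(ht, (1,2,3), "three-cycle");
val := HTValue(ht, (1,2,3));

# Look up a key that was never stored.
missing := HTValue(ht, (1,3,2));

# Replace the value stored under an existing key, then remove the key.
HTUpdate(ht, (1,2,3), "rotation");
HTDelete(ht, (1,2,3));
```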

The legacy functions from older versions of this package to work with hash tables are NewHT (4.4-1), AddHT (4.4-2), and ValueHT (4.4-3). They are described in Section 4.4. In the next section, we first describe the infrastructure for hash functions.

4.2 Hash functions

In the orb package hash functions are chosen automatically by giving a sample object together with the length of the hash table. This is done with the following operation:

4.2-1 ChooseHashFunction

‣ ChooseHashFunction( ob, len ) ( operation )

Returns: a record

The first argument ob must be a sample object, that is, an object like those we want to store in the hash table later on. The argument len is an integer that gives the length of the hash table. Note that this operation might be called again automatically later on, when a hash table is increased in size. The operation returns a record with two components. The component func is a GAP function taking two arguments, see below. The component data is some GAP object. Later on, the hash function will be called with two arguments: the first is the object for which it should compute the hash value, and the second must be the data stored in the data component.

The hash function has to return values between \(1\) and the hash length len, inclusive.
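The calling convention described above can be sketched as follows. The example assumes that the orb package is installed and that a hash function is available for permutations; the variable names len, hf and h are chosen only for illustration.

```gap
LoadPackage("orb");

len := 1001;                              # intended hash table length
hf := ChooseHashFunction((1,2,3), len);   # sample object: a permutation

# The hash function receives the object and the stored data component.
h := hf.func((1,3,2), hf.data);

# The result should lie in the range from 1 to len.
h in [1..len];
```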

This setup is chosen such that the hash functions can be global objects that are not created during the execution of ChooseHashFunction but still can change their behaviour depending on the data.
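The following plain GAP sketch illustrates this design with hypothetical names (MyIntHash, MyChooseHash); it does not show the functions that orb actually installs. A single global function is reused for every table length, and only the data component differs.

```gap
# Hypothetical global hash function for non-negative integers. It is
# created once; its behaviour is steered only by the data argument.
MyIntHash := function(x, data)
  return (x mod data) + 1;    # always in [1..data]
end;

# Hypothetical chooser mimicking the record layout described above.
MyChooseHash := function(ob, len)
  return rec(func := MyIntHash, data := len);
end;

hf1 := MyChooseHash(0, 101);
hf2 := MyChooseHash(0, 1009);

hf1.func(5000, hf1.data);             # 5000 mod 101 + 1 = 52
hf2.func(5000, hf2.data);             # 5000 mod 1009 + 1 = 965
IsIdenticalObj(hf1.func, hf2.func);   # the same function object is shared
```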

In the following we only document the types of objects for which ChooseHashFunction can find a hash function.