8 AVL trees

8.2 Using AVL trees

8.2-1 AVLTree

8.2-2 AVLCmp

8.2-3 AVLAdd

8.2-4 AVLLookup

8.2-5 AVLDelete

8.2-6 AVLFindIndex

8.2-7 AVLIndex

8.2-8 AVLIndexLookup

8.2-9 AVLIndexAdd

8.2-10 AVLIndexDelete

8.2-11 AVLFind

8.2-12 AVLIndexFind

8.2-13 AVLData

8.2-14 AVLValue

8.2-15 Display

8.2-16 ELM_LIST

8.2-17 Position

8.2-18 Add

8.2-19 Remove

8.2-20 Length

8.2-1 AVLTree

8.2-2 AVLCmp

8.2-3 AVLAdd

8.2-4 AVLLookup

8.2-5 AVLDelete

8.2-6 AVLFindIndex

8.2-7 AVLIndex

8.2-8 AVLIndexLookup

8.2-9 AVLIndexAdd

8.2-10 AVLIndexDelete

8.2-11 AVLFind

8.2-12 AVLIndexFind

8.2-13 AVLData

8.2-14 AVLValue

8.2-15 Display

8.2-16 ELM_LIST

8.2-17 Position

8.2-18 Add

8.2-19 Remove

8.2-20 Length

`8.2-21 \in`

AVL trees are balanced binary trees called "AVL trees" in honour of their inventors G.M. Adelson-Velskii and E.M. Landis (see [AVM62]). A description in English can be found in [Knu97] in Section 6.2.3 about balanced trees.

The general idea is to store data in a binary tree such that all entries in the left subtree of a node are smaller than the entry at the node and all entries in the right subtree are bigger. The tree is kept "balanced" which means that for each node the depth of the left and right subtrees differs by at most 1. In this way, finding something in the tree, adding a new entry, deleting an entry all have complexity \(log(n)\) where \(n\) is the number of entries in the tree. If one additionally stores the number of entries in the left subtree of each node, then finding the \(k\)-th entry, removing the \(k\)-th entry and inserting an entry in position \(k\) also have complexity \(log(n)\). The **orb** contains an implementation of such tree objects providing all these operations.

"Entries" in AVL tree objects are key-value pairs and the sorting is done by the key. If all values as `true`

then no memory is needed to store the values (see the corresponding behaviour for hash tables). The only requirement on the type of the keys is that two arbitrary keys must be comparable in the sense that one can decide which of them is smaller. If **GAP**s standard comparison operations \(<\) and \(=\) work for your keys, no further action is required, if not, then you must provide your own three-way comparison function (see below).

Note that the AVL trees implemented here can be used in basically two different ways, which can sometimes be mixed: The usual way is by accessing entries by their key, the tree is then automatically kept sorted. The alternative way is by accessing entries by their index in the tree! Since the nodes of the trees remember how many elements are stored in their left subtree, it is in fact possible to access the \(k\)-th entry in the tree or delete it. It is even possible to insert something in position \(k\). However, note that if you do this latter operation, you are yourself responsible to keep the entries in the tree sorted. You can ignore this responsibility, but then you can no longer access the entries in the tree by their key and the corresponding functions might fail or even run into errors.

This usage can be useful, since in this way AVL trees provide an implementation of a list data structure where the operation list access (by index), adding an element (in an arbitrary position) and deleting an element (by its index) all have complexity \(log(n)\) where \(n\) is the number of entries in the list.

An AVL tree is created using the following function:

`‣ AVLTree` ( [opt] ) | ( function ) |

Returns: A new AVL tree object

This function creates a new AVL tree object. The optional argument `opt` is an options record, in which you can bind the following components:

`cmpfunc`

is a three-way comparison function taking two arguments `a` and `b` and returning \(-1\) if \(\textit{a} < \textit{b}\), \(+1\) if \(\textit{a} > \textit{b}\) and \(0\) if \(\textit{a} = \textit{b}\). If no function is given then the generic function `AVLCmp`

(8.2-2) is taken. This three-way comparison function is stored with the tree and is used for all comparisons in tree operations. `allocsize`

is the number of nodes which are allocated for the tree initially. It can be useful to specify this if you know that your tree will eventually contain a lot of entries, since then the tree object does not have to grow that many times.

For every AVL tree a three-way comparison function is needed, usually you can get away with using the following default one:

`‣ AVLCmp` ( a, b ) | ( function ) |

Returns: -1, 0 or 1

This function calls the `<`

operation and the `=`

operation to provide a generic three-way comparison function to be used in AVL tree operations. See `AVLTree`

(8.2-1) for a description of the return value. This function is implemented in the kernel and should be particularly fast.

The following functions are used to access entries by key:

`‣ AVLAdd` ( t, key, val ) | ( function ) |

Returns: `true`

or `fail`

The first argument `t` must be an AVL tree. This function stores the key `key` with value `value` in the tree assuming that the keys in it are sorted according to the three-way comparison function stored with the tree. If `value` is `true`

then no additional memory is needed. It is an error if there is already a key equal to `key` in the tree, in this case the function returns `fail`

. Otherwise it returns `true`

.

`‣ AVLLookup` ( t, key ) | ( function ) |

Returns: an value or `fail`

The first argument `t` must be an AVL tree. This function looks up the key `key` in the tree and returns the value which is associated to it. If the key is not in the tree, the value `fail`

is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLDelete` ( t, key ) | ( function ) |

Returns: an value or `fail`

The first argument `t` must be an AVL tree. This function looks up the key `key` in the tree, deletes it and returns the value which was associated with it. If `key` is not contained in the tree then `fail`

is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLFindIndex` ( t, key ) | ( function ) |

Returns: an integer or `fail`

The first argument `t` must be an AVL tree. This function looks up the key `key` in the tree and returns the index, under which it is stored in the tree. This index is one-based, that is, it takes values from 1 to the number of entries in the tree. If `key` is not contained in the tree then `fail`

is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

The following functions are used to access entries in trees by their index:

`‣ AVLIndex` ( t, index ) | ( function ) |

Returns: a key or `fail`

The first argument `t` must be an AVL tree. This function returns the key at index `index` in the tree, so `index` must be an integer in the range 1 to the number of elements in the tree. If the value is out of these bounds, `fail`

is returned. Note that to use this function it is not necessary that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLIndexLookup` ( t, index ) | ( function ) |

Returns: a value or `fail`

The first argument `t` must be an AVL tree. This function returns the value associated to the key at index `index` in the tree, so `index` must be an integer in the range 1 to the number of elements in the tree. If the value is out of these bounds, `fail`

is returned. Note that to use this function it is not necessary that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLIndexAdd` ( t, key, value, index ) | ( function ) |

Returns: a key or `fail`

The first argument `t` must be an AVL tree. This function inserts the key `key` at index `index` in the tree and associates the value `value` with it. If `value` is `true`

then no additional memory is needed to store the value. The index `index` must be an integer in the range 1 to \(n+1\) where \(n\) is the number of entries in the tree. The new key is inserted before the key which currently is stored at index `index`, so calling with `index` equal to \(n+1\) puts the new key at the end. If `index` is not in the corrent range, this function returns `fail`

and the tree remains unchanged.

**Caution:** With this function it is possible to put a key into the tree at a position such that the keys in the tree are no longer sorted according to the three-way comparison function stored with the tree! If you do this, the functions `AVLAdd`

(8.2-3), `AVLLookup`

(8.2-4), `AVLDelete`

(8.2-5) and `AVLFindIndex`

(8.2-6) will no longer work since they assume that the keys are sorted!

`‣ AVLIndexDelete` ( t, index ) | ( function ) |

Returns: a key or `fail`

The first argument `t` must be an AVL tree. This function deletes the key at index `index` in the tree and returns the value which was associated with it.

The following functions allow low level access to the AVL tree object:

`‣ AVLFind` ( t, key ) | ( function ) |

Returns: an integer or `fail`

The first argument `t` must be an AVL tree. This function locates the key `key` in the tree and returns the position in the positional object, at which the node which contains the key is stored. This position will always be divisible by 4. Use the functions `AVLData`

(8.2-13) and `AVLValue`

(8.2-14) to access the key and value of the node respectively. The function returns `fail`

if the key is not found in the tree. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLIndexFind` ( t, index ) | ( function ) |

Returns: an integer or `fail`

The first argument `t` must be an AVL tree. This function locates the index `index` in the tree and returns the position in the positional object, at which the node which hash this index is stored. This position will always be divisible by 4. Use the functions `AVLData`

(8.2-13) and `AVLValue`

(8.2-14) to access the key and value of the node respectively. The function returns `fail`

if the key is not found in the tree. This function does not assume that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

`‣ AVLData` ( t, pos ) | ( function ) |

Returns: an key

The first argument `t` must be an AVL tree and the second a position in the positional object corresponding to a node as returned by `AVLFind`

(8.2-11). The function returns the key associated with this node.

`‣ AVLValue` ( t, pos ) | ( function ) |

Returns: a value

The first argument `t` must be an AVL tree and the second a position in the positional object corresponding to a node as returned by `AVLFind`

(8.2-11). The function returns the value associated with this node.

The following convenience methods for standard list methods are implemented for AVL tree objects:

`‣ Display` ( t ) | ( method ) |

Returns: nothing

This function displays the tree in a user-friendly way. Do not try this with trees containing many nodes!

`‣ ELM_LIST` ( t, index ) | ( method ) |

Returns: A key or `fail`

This method allows for easy access to the key at index `index` in the tree using the square bracket notation

. It does exactly the same as `t`[`index`]`AVLIndex`

(8.2-7). This is to make AVL trees behave more like lists.

`‣ Position` ( t, key ) | ( method ) |

Returns: an integer or `fail`

This method allows to use the `Position`

operation to locate the index at which the key `key` is stored in the tree. It does exactly the same as `AVLFindIndex`

(8.2-6). This is to make AVL trees behave more like lists.

`‣ Add` ( t, key[, index] ) | ( method ) |

Returns: nothing

This method allows to use the `Add`

operation to add a key (with associated value `true`

) to the tree at index `index`. It does exactly the same as `AVLIndexAdd`

(8.2-9), so the same warning about sortedness as there applies! If `index` is omitted, the key is added at the end. This is to make AVL trees behave more like lists.

`‣ Remove` ( t, index ) | ( method ) |

Returns: a key

This method allows to use the `Remove`

operation to remove a key from the tree at index `index`. If `index` is omitted, the last key in the tree is remove. This method returns the deleted key or `fail`

if the tree was empty. This is to make AVL trees behave more like lists.

`‣ Length` ( t ) | ( method ) |

Returns: a key

This method returns the number of entries stored in the tree `t`. This is to make AVL trees behave more like lists.

`8.2-21 \in`

`‣ \in` ( key, t ) | ( method ) |

Returns: `true`

or `false`

This method tests whether or not the key `key` is stored in the AVL tree `t`. This is to make AVL trees behave more like lists.

An AVL tree is a positional object in which the first 7 positions are used for administrative data (see table below) and then from position 8 on follow the nodes of the tree. Each node uses 4 positions such that all nodes begin at positions divisible by 4. The system allocates the positional object larger than actually needed such that not every new node leads to the object being copied. Nodes which become free are collected in a free list. The following table contains the information what is stored in each of the first 7 entries:

1 | last actually used position, is always congruent 3 mod 4 |

2 | position of first node in free list |

3 | number of currently used nodes in the tree |

4 | position of largest allocated position is always congruent 3 mod 4 |

5 | three-way comparison function |

6 | position of the top node |

7 | a plain list holding the values stored under the keys |

The four positions used for a node contain the following information, recall that each node starts at a position divisible by 4:

0 mod 4 | reference to the key |

1 mod 4 | position of left node or 0 if empty, balance factor (see below) |

2 mod 4 | position of right node or 0 if empty |

3 mod 4 | index: number of nodes in left subtree plus one |

Since all positions of nodes are divisible by 4, we can use the least significant two bits of the left node reference for the so called balance factor. Balance factor 0 (both bits 0) indicates that the depth of the left subtree is equal to the depth of the right subtree. Balance factor 1 (bits 01) indicates that the depth of the right subtree is one greater than the depth of the left subtree. Balance factor 2 (or -1 in [Knu97], here bits 10) indicates that the depth of the left subtree is one greater than the depth of the right subtree.

For freed nodes the position of the next free node in the free list is held in the 0 mod 4 position and 0 means the end of the free list.

Position 7 in the positional object can contain the value `fail`

, in this case all stored values are `true`

. This is a measure to limit the memory usage in the case that the only relevant information in the tree is the key and no values are stored there. This is in particular interesting if the tree structure is just used as a list implementation.

Note that all functions dealing with AVL trees are both implemented on the **GAP** level and on the kernel level. Both implementations do exactly the same thing, the kernel version is only much faster and tuned for efficiency whereas the **GAP** version documents the functionality better and is used as a fallback if the C-part of the **orb** is not compiled.

generated by GAPDoc2HTML