GAP Forum: RE: AddSet etc.

> < ^ Date: Fri, 07 Jan 1994 20:44:00 +1000
> < ^ From: Dmitrii Pasechnik <d.pasechnik@twi.tudelft.nl >
< ^ Subject: RE: AddSet etc.

Dear Frank Cellar,

Some time ago we noticed that the function Orbit(G,S,OnSets) can
be hopelessly slow when |S| and in particular the length of the
resulting orbit are big enough.
yes, that is true, however
we found that the largest amount of CPU time is consumed by
AddSet. One may see looking at the appropriate C code that the
time taken by AddSet to add a new element to the set of length N
is proportional to N.
if you will take a closer look (actually you will find a comment
"perform the binary search"), you will see that 'AddSet' uses a binary
search, *but* it has to ensure that the list is indeed a set. To the

Sorry, the whole point is about the *addition* of elements rather than
*search*. Indeed AddSet is pretty fast if it does not need to add the
particular element. Again, the trouble in this case is with the handling of
the set as a whole rather than with the handling of its elements.
Here follows an extract from the file set.c, function
AddSet. (My own comments are marked by ***).

/* perform the binary search to find the position                      */
len   = LEN_PLIST( hdSet );
                            *** it is the length of the set.

i = 0;  k = len+1;
while ( i+1 < k ) {                 /* set[i] < elm && elm <= set[k]   */
    j = (i + k) / 2;                /* i < j < k                       */
    if ( LT( ELM_PLIST(hdSet,j), hdObj ) == HdTrue )  i = j;
    else                                              k = j;
}

/* add the element to the set if it is not already there               */
if ( len < k || EQ( ELM_PLIST(hdSet,k), hdObj ) != HdTrue ) {
    if ( SIZE(hdSet) < SIZE_PLEN_PLIST( len+1 ) )
        Resize( hdSet, SIZE_PLEN_PLIST( len + len/8 + 4 ) );
    SET_LEN_PLIST( hdSet, len+1 );
    for ( i = len+1; k < i; i-- ) *** the troube is here
                            *** this loop can be executed O(len) times

SET_ELM_PLIST( hdSet, i, ELM_PLIST(hdSet,i-1) ); *** moving len-th elt to the position len+1, *** (len-1)th elt to the position len, etc *** to make room for the new elt.

    SET_ELM_PLIST( hdSet, k, hdObj ); *** add new elt.
}

> inexperienced user a simple solution to the problem is
>
> It is not quite clear why faster ways to handle sets, such that the
> addition of a new element takes only O(log(N)) operations, are not used
> in the GAP implementation.
>
> but this naive approach has a drawback, namely that sets in this
> proposal could only contain unchangeable data structures like integers
> or permutations but not changeable like lists or records without
> introducing some amount of overhead for *all* list/record assignments.
> To illustrate the point, consider the following examples:
[...]

The example is rather strange, since it's more or less clear that any set
can be treated as a list with an ordering attached. The example in question
seems to violate this, i.e. make some elements of the set incomparable.
I doubt that in any serious computation one would make things like that:

a:=5;
l:=Set([a,2,3]);
a:=(1,2);

The assignment to <a> has changed the set-feature of <l>! There are
several solutions to the problem, each has its advantages *and*
disadvantages:
(a) Create unchangeable counterparts to lists and records, once created
they cannot be changed anymore or
(b) Create sets which only allow to test membership.
(a) and (b) both have the drawback (which showed heavily in GAP 2.4),
that you have to convert between different representations, creating
an immense amount of garbage.
(c) While making an assignment check if this particular list is an element
of a set. This would introduce overhead for *all* list operations.

But the example I deleted shows that this this the option already
chosen in GAP, isn't it?

(d) the Magma solution, which seems to be a mix of (a) and (b).
(e) Use knowledge about the algorithm *inside* your algorithm, not inside
the kernel of GAP.

It's not quite clear why the following option is not OK. Certainly (c) is
bad. But what about making the user responsible for the upgrading of sets,
if necessary. Say, after
a:=1;
l:=Set([a,2,3]);
a:=10;

call a function (say) Upgrade(l). One could suggest even better ways, by
taking some extra care *before* changing a.

In my opinion one should use (e). For instance, an orbit algorithm
can (hopefully) *guarantee* that it will not destroy the property that
the list of points computed so far is already sorted. So it should
look like:
od;
od;
    # return the orbit <orb>
    return orb;
end;
where 'AddSetNC' would be a function, that does no checks at all and
assumes that the list is already sorted. One could easily write such
a function and careful analysis will show that there is still room for

To be efficient, this function must involve the machinery for efficient
addition of elements in the set, e.g. AVL-trees etc.
It's not so *easy*.

Best regards,
Dima Pasechnik,
Dept. of Mathematics,
Univ. of Western Australia, Nedlands WA 6009, Australia
e-mail:dima@maths.uwa.edu.au

> < [top]