GAP Forum: RE: AddSet etc.

> < ^ Date: Fri, 07 Jan 1994 09:29:00 +0100
> < ^ From: Frank Celler <frank.celler@math.rwth-aachen.de >
> < ^ Subject: RE: AddSet etc.

Dear Dima Pasechnik,

you wrote in your letter

Some time ago we noticed that the function Orbit(G,S,OnSets) can
be hopelessly slow when |S| and in particular the length of the
resulting orbit are big enough.

yes, that is true, however

we found that the largest amount of CPU time is consumed by
AddSet. One may see looking at the appropriate C code that the
time taken by AddSet to add a new element to the set of length N
is proportional to N.

if you will take a closer look (actually you will find a comment
"perform the binary search"), you will see that 'AddSet' uses a binary
search, *but* it has to ensure that the list is indeed a set. To the
inexperienced user a simple solution to the problem is

It is not quite clear why faster ways to handle sets, such that the
addition of a new element takes only O(log(N)) operations, are not used
in the GAP implementation.

but this naive approach has a drawback, namely that sets in this
proposal could only contain unchangeable data structures like integers
or permutations but not changeable like lists or records without
introducing some amount of overhead for *all* list/record assignments.
To illustrate the point, consider the following examples:

gap> l := [ 1, 2, 5, 6 ];;
gap> TYPE(l);
"list"
gap> AddSet( l, 3 );
gap> l;
[ 1, 2, 3, 5, 6 ]
gap> TYPE(l);
"set"
gap> AddSet( l, 0 );
gap> l;
[ 0, 1, 2, 3, 5, 6 ]

the first call to 'AddSet' checks that <l> is indeed a set and then
uses a binary search to add the '0'. The function 'IsSet' (which is
used in 'AddSet') attaches a "flag" to <l>, which shows that <l> is
indeed a set. The second call to 'AddSet' will not check again. So,
when dealing with lists containing integers, permutations, etc
'AddSet' will always (with the exception of the first call) use a
binary search. So what *is* the problem:

gap> l := [ [1,2], [2,3] ];
[ [ 1, 2 ], [ 2, 3 ] ]
gap> TYPE(l);
"list"
gap> a := [1,3];
gap> AddSet( l, a );
gap> l;
[ [ 1, 2 ], [ 1, 3 ], [ 2, 3 ] ]
gap> TYPE(l);
"list"
gap> IsSet(l);
true

Why hasn't GAP attached is magic flag to <l>? Well, if GAP would have
marked the list <l> as set, then each assignment into a list must
check if this particular list is *contained* in a set, in our example
<l>:

gap> a[1] := 2;
gap> IsSet(l);
false

The assignment to <a> has changed the set-feature of <l>! There are
several solutions to the problem, each has its advantages *and*
disadvantages:

(a) Create unchangeable counterparts to lists and records, once created
they cannot be changed anymore or

(b) Create sets which only allow to test membership.

(a) and (b) both have the drawback (which showed heavily in GAP 2.4),
that you have to convert between different representations, creating
an immense amount of garbage.

(c) While making an assignment check if this particular list is an element
of a set. This would introduce overhead for *all* list operations.

(d) the Magma solution, which seems to be a mix of (a) and (b).

(e) Use knowledge about the algorithm *inside* your algorithm, not inside
the kernel of GAP.

In my opinion one should use (e). For instance, an orbit algorithm
can (hopefully) *guarantee* that it will not destroy the property that
the list of points computed so far is already sorted. So it should
look like:

OrbitTest := function ( G, d, opr )
    local   orb,        # orbit, result
            set,        # orbit <orb> as set for faster membership test
            gen,        # generator of the group <G>
            pnt,        # point in the orbit <orb>
            img;        # image of the point <pnt> under the generator <gen>

orb := [ d ]; set := [ d ]; for pnt in orb do for gen in G.generators do img := opr( pnt, gen ); if not img in set then Add( orb, img ); AddSetNC( set, img ); # <-- instead of 'AddSet' fi; od; od;

    # return the orbit <orb>
    return orb;
end;

where 'AddSetNC' would be a function, that does no checks at all and
assumes that the list is already sorted. One could easily write such
a function and careful analysis will show that there is still room for
improvments, one should use 'PositionSorted' in order to get ride of
the 'in' avoiding doing a binary (or worse a linear) search twice
(once in 'in' and once in 'AddSetNC'), but our experiments never ran
into problems with the current orbit algorithm and because of the lack
of time and manpower we didn't check what the fastest possible
algorithm would be for all possible problems/experiments. You are
more than welcome to send a list of functions in which you think such
improvements are worthwhile.

best wishes
Frank

> < [top]