> < ^ From:

> < ^ Subject:

Dear Dima Pasechnik,

you wrote in your letter

Some time ago we noticed that the function Orbit(G,S,OnSets) can

be hopelessly slow when |S| and in particular the length of the

resulting orbit are big enough.

yes, that is true, however

we found that the largest amount of CPU time is consumed by

AddSet. One may see looking at the appropriate C code that the

time taken by AddSet to add a new element to the set of length N

is proportional to N.

if you will take a closer look (actually you will find a comment

"perform the binary search"), you will see that 'AddSet' uses a binary

search, *but* it has to ensure that the list is indeed a set. To the

inexperienced user a simple solution to the problem is

It is not quite clear why faster ways to handle sets, such that the

addition of a new element takes only O(log(N)) operations, are not used

in the GAP implementation.

but this naive approach has a drawback, namely that sets in this

proposal could only contain unchangeable data structures like integers

or permutations but not changeable like lists or records without

introducing some amount of overhead for *all* list/record assignments.

To illustrate the point, consider the following examples:

gap> l := [ 1, 2, 5, 6 ];; gap> TYPE(l); "list" gap> AddSet( l, 3 ); gap> l; [ 1, 2, 3, 5, 6 ] gap> TYPE(l); "set" gap> AddSet( l, 0 ); gap> l; [ 0, 1, 2, 3, 5, 6 ]

the first call to 'AddSet' checks that <l> is indeed a set and then

uses a binary search to add the '0'. The function 'IsSet' (which is

used in 'AddSet') attaches a "flag" to <l>, which shows that <l> is

indeed a set. The second call to 'AddSet' will not check again. So,

when dealing with lists containing integers, permutations, etc

'AddSet' will always (with the exception of the first call) use a

binary search. So what *is* the problem:

gap> l := [ [1,2], [2,3] ]; [ [ 1, 2 ], [ 2, 3 ] ] gap> TYPE(l); "list" gap> a := [1,3]; gap> AddSet( l, a ); gap> l; [ [ 1, 2 ], [ 1, 3 ], [ 2, 3 ] ] gap> TYPE(l); "list" gap> IsSet(l); true

Why hasn't GAP attached is magic flag to <l>? Well, if GAP would have

marked the list <l> as set, then each assignment into a list must

check if this particular list is *contained* in a set, in our example

<l>:

gap> a[1] := 2; gap> IsSet(l); false

The assignment to <a> has changed the set-feature of <l>! There are

several solutions to the problem, each has its advantages *and*

disadvantages:

(a) Create unchangeable counterparts to lists and records, once created

they cannot be changed anymore or

(b) Create sets which only allow to test membership.

(a) and (b) both have the drawback (which showed heavily in GAP 2.4),

that you have to convert between different representations, creating

an immense amount of garbage.

(c) While making an assignment check if this particular list is an element

of a set. This would introduce overhead for *all* list operations.

(d) the Magma solution, which seems to be a mix of (a) and (b).

(e) Use knowledge about the algorithm *inside* your algorithm, not inside

the kernel of GAP.

In my opinion one should use (e). For instance, an orbit algorithm

can (hopefully) *guarantee* that it will not destroy the property that

the list of points computed so far is already sorted. So it should

look like:

OrbitTest := function ( G, d, opr ) local orb, # orbit, result set, # orbit <orb> as set for faster membership test gen, # generator of the group <G> pnt, # point in the orbit <orb> img; # image of the point <pnt> under the generator <gen>orb := [ d ]; set := [ d ]; for pnt in orb do for gen in G.generators do img := opr( pnt, gen ); if not img in set then Add( orb, img ); AddSetNC( set, img ); # <-- instead of 'AddSet' fi; od; od;# return the orbit <orb> return orb; end;

where 'AddSetNC' would be a function, that does no checks at all and

assumes that the list is already sorted. One could easily write such

a function and careful analysis will show that there is still room for

improvments, one should use 'PositionSorted' in order to get ride of

the 'in' avoiding doing a binary (or worse a linear) search twice

(once in 'in' and once in 'AddSetNC'), but our experiments never ran

into problems with the current orbit algorithm and because of the lack

of time and manpower we didn't check what the fastest possible

algorithm would be for all possible problems/experiments. You are

more than welcome to send a list of functions in which you think such

improvements are worthwhile.

best wishes

Frank

> < [top]