> < ^ From:

< ^ Subject:

Dear Frank Cellar,

Some time ago we noticed that the function Orbit(G,S,OnSets) can

be hopelessly slow when |S| and in particular the length of the

resulting orbit are big enough.yes, that is true, however

we found that the largest amount of CPU time is consumed by

AddSet. One may see looking at the appropriate C code that the

time taken by AddSet to add a new element to the set of length N

is proportional to N.if you will take a closer look (actually you will find a comment

"perform the binary search"), you will see that 'AddSet' uses a binary

search, *but* it has to ensure that the list is indeed a set. To the

Sorry, the whole point is about the *addition* of elements rather than

*search*. Indeed AddSet is pretty fast if it does not need to add the

particular element. Again, the trouble in this case is with the handling of

the set as a whole rather than with the handling of its elements.

Here follows an extract from the file set.c, function

AddSet. (My own comments are marked by ***).

/* perform the binary search to find the position */ len = LEN_PLIST( hdSet ); *** it is the length of the set. i = 0; k = len+1; while ( i+1 < k ) { /* set[i] < elm && elm <= set[k] */ j = (i + k) / 2; /* i < j < k */ if ( LT( ELM_PLIST(hdSet,j), hdObj ) == HdTrue ) i = j; else k = j; } /* add the element to the set if it is not already there */ if ( len < k || EQ( ELM_PLIST(hdSet,k), hdObj ) != HdTrue ) { if ( SIZE(hdSet) < SIZE_PLEN_PLIST( len+1 ) ) Resize( hdSet, SIZE_PLEN_PLIST( len + len/8 + 4 ) ); SET_LEN_PLIST( hdSet, len+1 ); for ( i = len+1; k < i; i-- ) *** the troube is here *** this loop can be executed O(len) timesSET_ELM_PLIST( hdSet, i, ELM_PLIST(hdSet,i-1) ); *** moving len-th elt to the position len+1, *** (len-1)th elt to the position len, etc *** to make room for the new elt.SET_ELM_PLIST( hdSet, k, hdObj ); *** add new elt. }

> inexperienced user a simple solution to the problem is

>

> It is not quite clear why faster ways to handle sets, such that the

> addition of a new element takes only O(log(N)) operations, are not used

> in the GAP implementation.

>

> but this naive approach has a drawback, namely that sets in this

> proposal could only contain unchangeable data structures like integers

> or permutations but not changeable like lists or records without

> introducing some amount of overhead for *all* list/record assignments.

> To illustrate the point, consider the following examples:

[...]

The example is rather strange, since it's more or less clear that any set

can be treated as a list with an ordering attached. The example in question

seems to violate this, i.e. make some elements of the set incomparable.

I doubt that in any serious computation one would make things like that:

a:=5; l:=Set([a,2,3]); a:=(1,2);

The assignment to <a> has changed the set-feature of <l>! There are

several solutions to the problem, each has its advantages *and*

disadvantages:(a) Create unchangeable counterparts to lists and records, once created

they cannot be changed anymore or(b) Create sets which only allow to test membership.

(a) and (b) both have the drawback (which showed heavily in GAP 2.4),

that you have to convert between different representations, creating

an immense amount of garbage.(c) While making an assignment check if this particular list is an element

of a set. This would introduce overhead for *all* list operations.

But the example I deleted shows that this this the option already

chosen in GAP, isn't it?

(d) the Magma solution, which seems to be a mix of (a) and (b).

(e) Use knowledge about the algorithm *inside* your algorithm, not inside

the kernel of GAP.

It's not quite clear why the following option is not OK. Certainly (c) is

bad. But what about making the user responsible for the upgrading of sets,

if necessary. Say, after

a:=1;

l:=Set([a,2,3]);

a:=10;

call a function (say) Upgrade(l). One could suggest even better ways, by

taking some extra care *before* changing a.

In my opinion one should use (e). For instance, an orbit algorithm

can (hopefully) *guarantee* that it will not destroy the property that

the list of points computed so far is already sorted. So it should

look like:

od;

od;# return the orbit <orb> return orb; end;where 'AddSetNC' would be a function, that does no checks at all and

assumes that the list is already sorted. One could easily write such

a function and careful analysis will show that there is still room for

To be efficient, this function must involve the machinery for efficient

addition of elements in the set, e.g. AVL-trees etc.

It's not so *easy*.

Best regards,

Dima Pasechnik,

Dept. of Mathematics,

Univ. of Western Australia, Nedlands WA 6009, Australia

e-mail:dima@maths.uwa.edu.au

> < [top]