[GAP Forum] Lookup tables in gap

Fri Jun 27 23:47:41 BST 2008

I ran into a performance wall in GAP while handling lookup tables. By
telling the story I hope to either get advice on how to handle the
problem (although I found a way out and explain it later), or point
the maintainers to a possible improvement in GAP's guts.

The story begins with an intricate structure, whose nodes are records
uniquely labelled by strings, with several string pointing to other
records.  Some of these pointers are labels of nonexistent nodes.
These nodes are on a list N,  ordered by their labels as strings.
Sizes: just under 1,000,000 nodes, and about 2,500,000 labels. 
(BTW: the nodes are elements of a group I am investigating, so it is
quite natural to work on it in GAP)

I had to traverse the structure, following these pointers.  So I
needed a lookup table to convert labels into records.  

First, obvious solution: produced an ordered list L of all labels
(that was fast!).  Although L is longer than N, it is true that the
label of N[i] is L[i], so, given a label s, the corresponding record
is N[Position(L,s)].  Also, since L is ordered, lookup should be fast.

That seems fine, so I tried a traversal, which I know takes linear
time on the number of links.  It took forever.  I had the routine
print a progress report, and it was clearly crawling.

Then I tried an alternative: keep L, and replace in N each label by
its corresponding index in L.  Tried it in place and tried it
generating a new list.  In both cases it crawled.

At this point, it is worth mentioning that the hardware I use is up to
the task: fast 64bit processors, enough memory (I allowed GAP 18GB,
but it never reached 8).

Last chance: use a record as an associative array.  That is, create a
record R such that, for every label s, R.(s) = Position(L,s). Looking
at the GAP source was encouraging, as records are implemented using
hashing.  Filling up R is easy: for every index i on L, R.(L[i]):=i.
Fortunately, I had a progress meter.  It started very fast, up to
300,000.  Then it petered out to more that a second per index.

So, that is the story.  Could I have done any better or could the
record implementation get some revamping?

I solved my problem by going out of GAP.  First I printed the
structure to a file (2.5GB), and then processed it through a small
perl program to do the transformation I tried before.  The logic is
the same as I tried with R, usig a perl hash.  In less than 60 seconds
it produced a GAP-readable file containing L and the transformed N.

That really solved my problem.  After adapting the traversal routines,
now GAP processes N any which way I need very quickly.

Still, I spent quite sometime trying a pure GAP solution; maybe GAP
could borrow the hash implementation from perl.

Cheers,

Arnaldo Mandel                        
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il	  
am at ime.usp.br
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.