> < ^ Date: Tue, 02 Feb 1993 17:22:36 +0100
> < ^ From: Martin Schoenert <martin.schoenert@math.rwth-aachen.de >
> < ^ Subject: Re: Re: GAPstones

In his e-mail message of 1993/02/01 Ralf Dentzer writes:
And as I am just at it, do the authors of "combinat.tst" know
what is mainly measured by this benchmark? From a gprof it seems
that most of the time is spent in the GAP Interpreter (>50 %)
and only very little for doing arithmetic and memory management (<5 % each).
Does this reflect time usage of typical GAP routines? What about
permutations, AG groups, finite fields, cyclotomic fields?

It is quite difficult to say what a typical use of GAP is. For example,
if one works with permutation groups of degree over 100, one sees an
entire different picture. On the other hand, computations with character
tables that involve only very few irrationalities might actually show a
similar distribution (I never tried this though).

And the answer is, yes, we are aware of the specific distribution of
"combinat.tst". But we have found that the performance reported by
"combinat.tst" is reasonably close to our ``feeling'' of how fast the
machines we have tried it on are.

Actually one can even be a bit more specific. Here is the output of
'pixie' (the most accurate profiling tool, because it actually counts
cycles, unfortunately only available on machines with the MIPS processor)
for GAP 3.1 running "combinat.tst". The first column contains the number
of cycles, the second the percentage of the total number of cycles, the
third the number of calls, and the fourth the name of the function. I
have grouped the functions according to the package they belong to, and
have ommitted functions whose percentage was below 0.1%.

Interpreter (eval 22.08%, function 19.83%, statemen 14.59%)
33595913    7.65         2584301 EvVar (eval.c)
 9776625    2.23          196075 Eq (eval.c)
 9435970    2.15          190798 Diff (eval.c)
 9324288    2.12          181912 Sum (eval.c)
 8133406    1.85           71330 FunShallowCopy (eval.c)
 6900221    1.57          188588 EvVarAss (eval.c)
 6238983    1.42          143309 EvAnd (eval.c)
 5070936    1.15          101630 EvOr (eval.c)
 2867693    0.65           39239 Prod (eval.c)
 2094182    0.48           38040 Ne (eval.c)
 1875496    0.43           33166 Le (eval.c)
  435759    0.10            6307 Mod (eval.c)

55270007   12.59          250223 EvFunccall (function.c)
28437062    6.48          183380 ChangeEnv (function.c)
 3205685    0.73           91591 EvReturn (function.c)

38293896    8.72          266957 EvIf (statemen.c)
17140123    3.90           53408 EvFor (statemen.c)
 8393290    1.91           89811 EvStatseq (statemen.c)

Memory Management (gasman 21.51%)

33964602    7.74          308242 NewBag (gasman.c)
32295304    7.36              10 CollectGarb (gasman.c)
11116554    2.53          362625 ExitKernel (gasman.c)
 8523620    1.94           57558 Resize (gasman.c)
 4714125    1.07          362625 EnterKernel (gasman.c)
 2938674    0.67          186597 NrHandles (gasman.c)
  873862    0.20               1 InitGasman (gasman.c)
Lists (list 16.99%, vector 0.02%)
31478498    7.17          379801 EvListElm (list.c)
19866656    4.52          162053 EvListAss (list.c)
13102065    2.98           60932 FunAppend (list.c)
 5375002    1.22           68101 MakeList (list.c)
 1429617    0.33           68077 EvMakeList (list.c)
 1249837    0.28           12062 FunAdd (list.c)
  946479    0.22            4636 PosList (list.c)
Arithmetic (integer 1.21%, rational 0.17%)
1469833    0.33           24474 QuoInt (integer.c)
1396376    0.32           27243 ProdInt (integer.c)
1194533    0.27           16211 GcdInt (integer.c)
 524400    0.12            4930 SumInt (integer.c)
423835    0.10            4678 QuoRat (rational.c)
Reading (scanner 1.38%, read 0.54%, indents 0.2%, system 0.54%, libc 0.87%)
2086621    0.48           19636 GetSymbol (scanner.c)
1383089    0.31            7972 GetIdent (scanner.c)
 924253    0.21           24385 PutChr (scanner.c)
 745285    0.17            5104 Pr (scanner.c)
 881700    0.20            5460 FindIdent (idents.c)
1133070    0.26          226614 SyIsIntr (system.c)
 855227    0.19            4319 SyFgets (system.c)
2982948    0.68            4317 fgets (../fgets.c)

A total of 56.5% of the running time are spent in the interpreter.

There most of the time is spent evaluating function calls ('EvFunccall'
and 'ChangeEnv'). This is because most of the functions in the
combinatorics package work in a highly recursive manner, and thus the
number of function calls is unusually high.

(Looking at the number of calls to 'ChangeEnv' (which is called twice by
'EvFunccall' for every non-internal GAP function), one sees that 91690
calls are calls to GAP functions, and the remaining 158533 are calls to
internal functions: 71330 'ShallowCopy', 60932 'Append', 12062 'Add',
5439 'Length', 4004 'Position', 2071 'IsList', 1976 'IsFunc', and 719
others. Comparing the numbers for 'ChangeEnv' and 'EvReturn' we see that
there were 99 calls to functions which did not return anything, most
likely they are all calls to 'Sort'. Finally note that both 'EvIf' and
'EvFor' have 'EvStatseq' inlined, i.e., they do not call 'EvStatseq' to
evaluate their bodies. So the number of calls of 'EvStatseq' reflects
the number of calls to GAP functions that have more than a single
statement in their body. So comparing the numbers for 'ChangeEnv' and
'EvStatseq' we see that about 1880 calls where to functions with only a
single statement in their body, e.g., functions of the form '<var> ->
<expr>'.)

'EvVar' contributes a large percentage of the total running time, simply
because it is call so often. Improving the speed of 'EvVar' would
therefor speed up GAP quit a bit. Unfortunately 'EvVar' is already as
minimal as it can be.

All in all the amount of time spent in the interpreter is much higher
than one would expect from other computations. Especially the large
number of function calls is not typical.

The memory management uses 21.5% of the total running time. 288655 of
the 308242 created bags ('NewBag') are easily accounted for: 91690 from
'EvFunccall', 71330 from 'ShallowCopy', 68077 from 'MakeList', and 57558
from 'Resize' calls. Note that the number of calls to 'Resize' is again
unusually high.

(We are thinking about a new memory manager (Ralf has actually provided a
prototype for such a memory manager). This new memory manager will make
'NewBag' a lot faster (probably a factor of 5 or so). 'NewBag' will also
become a macro, so that it will be more difficult to see its influence in
future profiles. 'CollectGarb' will become faster (because it will use
generations), 'ExitKernel' and 'EnterKernel' will no longer be
neccessary, and the number of calls to 'NewBag' will also be reduced
(because it is called most of the time from 'CollectGarb'). So I think
that the cost for memory management for this test could be reduced to
well below 10% with this new memory manager.)

Other computations will usually spent less time in the memory management,
unless most of the objects allocated are rather small, such as in this
computation. And of course, if the workspace is too small, so that too
many garbage collections are neccessary, the percentage of time spent in
'CollectGarb' can become arbitrarily high (I remember a Schreier Sims
computation which I once did on my Atari, where a garbage collection was
neccessary each time after allocating 20 permutations).

Now "combinat.tst" mainly works with lists (of integers), thus it is not
surprising that 17% of the total time is spent in the list package.
Especially the number of assignments to list elements is way above the
norm.

Again all in all one would expect other computations to spent less time
in the list package. But since lists are the main data structure in GAP
almost all computations spent quite a bit of their time in this package.

The amount of time spent in the arithmetic is small. This is because
arithmetic of immediate integers (i.e., -2^28 <= n < 2^28) is done in
'Eq', 'Sum', 'Diff', etc., without calling other functions. What you see
in 'SumInt', etc. are only the few computations that involve large
integers (e.g., Binomial( 400, 50 )).

Other computations that involve a lot of large integers, rationals,
cyclotomics, permutations, words, etc., will probably spent most of their
time in the respective arithmetic package.

And of course the time spent reading is very small. If you only do a
small computation with a group (where the whole library must first be
read in), this would contribute a much higher amount.

Martin.

-- .- .-. - .. -.  .-.. --- ...- . ...  .- -. -. .. -.- .-
Martin Sch"onert,   Martin.Schoenert@Math.RWTH-Aachen.DE,  +49 241 804551
Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, D 51 Aachen, Germany

> < [top]