Section Trivial Parallelism covers trivial parallelism (the simplest and most common form of parallel application). This is followed in Section Using ParGAP interactively by a description of how to use ParGAP interactively, and Section Streaming illustrates these principles with a short implementation of parallel streaming. Streaming refers to simultaneously running different algorithms for the same problem in different ``streams'' or processes, and accepting the first answer that returns. The remaining processes for the other algorithms are then terminated. ParGAP allows those processes to reside on a single CPU or on different CPUs. Hence, in the case of streaming, ParGAP is potentially useful even if one has only a single computer available, since multiple streams can still reside locally in separate processes.
For more sophisticated (non-trivial) parallel applications, the TOP-C model of parallelism is recommended, and it is described in Section TOP-C model for non-trivial parallelism.
In parallel computation, perhaps 80% of the applications fall under the heading of ``trivial parallelism''. These are situations in which one must compute many unrelated cases. For example, perhaps 200 different cases must be computed and results returned for each case and each case has no interaction with any other one. In the absence of shared data, it would be common to start 20 GAP jobs on 20 distinct workstations in a student computer laboratory. If there are 200 cases, then one writes 20 different ``batch jobs'', each batch job handling 10 distinct cases.
ParGAP provides strong support for this common situation.
Conceptually, one is always talking to ParGAP on a master process (the
local workstation), and the master process talks to each of various slave
processes. In its simplest form, one merely generates a list of inputs
for the cases, and writes a suitable function to provide the case
information. Effectively, trivial parallelism can be expressed by a
parallel version of GAP's
(see List (for a list (and a function)) in the Reference Manual),
which is called
ParList() (see ParList) in
The following is an example of a ParGAP session that uses the
ParList() function. Observe the use of
(see BroadcastMsg) to ensure that the function
AnalyzePrimitivePermGroupsOfOrder() is known to the slaves.
gap> AnalyzePrimitivePermGroupsOfOrder := function(ord) > #returns a list of data records of all primitive > #permutation groups of order ord that are represented > #as permutation groups in GAP's primitive group database > > local PrimitiveGroupsOfOrder, AnalyzePrimitivePermGroup; > > PrimitiveGroupsOfOrder := > ord -> List( [1..NrPrimitiveGroups(ord)], > i -> PrimitiveGroup(ord,i) ); > > AnalyzePrimitivePermGroup := > grp -> rec( order := Size(grp), > degree := NrMovedPoints(grp), > baseSize := Size(BaseOfGroup(grp)) ); > > return List( Filtered( PrimitiveGroupsOfOrder(ord), > IsPermGroup ), > AnalyzePrimitivePermGroup ); > end;; gap> #Define AnalyzePrimitivePermGroupsOfOrder on slaves gap> BroadcastMsg( > PrintToString("AnalyzePrimitivePermGroupsOfOrder := ", > AnalyzePrimitivePermGroupsOfOrder ) ); gap> ParList( [2..256], AnalyzePrimitivePermGroupsOfOrder ); [ [ rec( order := 2, degree := 2, baseSize := 1 ) ], [ rec( order := 3, degree := 3, baseSize := 1 ), <...many lines of output omitted...>
As one expects, the answer is computed in parallel in a time that decreases linearly with the number of processors. Hence a Beowulf cluster of 16 processors (1 ParGAP master and 15 ParGAP slaves) will return this information approximately 15 times as fast as a single processor, greatly facilitating interactive exploration of the properties of various groups.
If each single case can be executed quickly, then the execution time for
ParList() can be dominated by the network overhead imposed by message
passing between the master and slaves. In this case, an optional third
ParList() specifies the number of cases to be submitted to
a slave in a single message (modulo how many cases are left, of course).
As an example, suppose we are interested in finding those of the first
100,000 integers that have many distinct prime factors, but to reduce the
network overhead we wish to batch 1000 cases at a time to the slaves.
Then one is able to execute the following.
gap> ParList([1..100000], > i -> Length(PrimePowersInt(i))/2, 1000);
Having seen an example where ParGAP can be useful, one next needs to know what is involved in setting up and using ParGAP. In the following, recall that ParGAP employs a master-slave architecture, with the local process being the ``master'' process, and all others being ``slaves''.
ParGAP is installed in the same way as other GAP packages. After
extracting the ParGAP files, one invokes
Section Installing ParGAP for the details.) Instead of invoking
one then invokes the supplied shell
pargap.sh, (possibly modified and/or renamed; again see
Section Running ParGAP for the details). Observe that unlike most other
GAP packages, one should not call
LoadPackage() function from
within a GAP session to start ParGAP. As usual, master and slave
processes read a
.gaprc file in the home directory, if present.
The method by which slave processes are started depends on the MPI implementation used (see Section Running ParGAP). When using a system MPI library, the system MPI launcher should be used to start ParGAP, and this also starts up the slaves. One commonly-supported MPI launcher syntax is
mpiexec -n num
is the total number of copies of
pargap.sh that you wish to
run. If, instead, ParGAP is built to use MPINU, the subset MPI implementation
included with ParGAP, then you should invoke
and ParGAP will start up the slave processes itself. For details of these,
it looks for a file named
procgroup file in the current
directory, but can be directed to use a different file through the
-p4pg switch. Hence, if one always uses ParGAP
with the slave processes specified in
myprocgroup.big, one may wish to
set up an alias or else a symbolic link to a script containing the
pargap.sh -p4pg PATH
Observe also that heterogeneous computing is supported, in that each slave machine may run a different operating system on different hardware.
Within ParGAP, the standard message-passing commands (send, receive,
probe, etc.) are available. The slaves are numbered consecutively
starting at 1. If no slave parameter is supplied for
SendRecvMsg(), then the default slave is 1. Most other commands take a
default slave parameter,
MPI_ANY_SOURCE (meaning ``
PingSlave() sends a ``null message'', and waits for an acknowledgement
that the slave is alive.
); sends the (string) message string
to slave i. Slave i then evaluates string as a GAP command, and
returns an answer. If a command has no answer, (such as
" is returned. If multiple commands are
sent, such as
"x:=3; y:=4; x+y;", then only the last return value is
returned. Note that the final semicolon, normally required by GAP, is
optional in a command string for a message.
Most of the other commands are clear from context. Every
SendMsg() to a
slave must be balanced by a
RecvMsg(). The exception to this rule
is that, when using MPINU, the commands
are available and may be used to flush any pending messages
from a slave. A
) command is equivalent to the
BroadcastMsg() sends its string argument to each slave, and
there is no reply message.
ParEval() is like
BroadcastMsg(), but it
also evaluates on the master process. If one is unsure whether there is a
pending message, one can invoke
if no slave parameter is provided, the default is
ParRead() is a parallel version of GAP's
it is useful for ensuring that the slaves and master share a similar
environment. Other examples of commands that are analogous to GAP's
built-in commands include:
Section Slave Listener Commands for more complete descriptions of the
An illustration of the usage of the commands mentioned above with a
sample ParGAP session, for which a
procgroup file has defined two
slaves is the first example given in Section Extended Example. It's
recommended that you review that example at this point.
Next, we demonstrate a simple implementation of ``streaming'' as an example of how easy it is to use the ParGAP tools to provide new functionality. To the best of this author's knowledge, the term ``streaming'' originated with Charles Leedham-Green at the conference on which this proceedings is based, where he made a general request for streaming functionality to be implemented in some software system.
The idea of streaming is that one may have two algorithms for solving a problem. One would like to to solve the problem simultaneously on two processors, each using a distinct algorithm. Whichever processor finds the solution first should report back to the master process. The master process should then interrupt the other slave, so as to make it available for further computation. (NB Resetting of slaves is only possible with the MPINU library.) This is useful when one has a heuristic available that may finish early with the correct answer, or may continue for a very long time. The heuristic can then be ``streamed'' alongside the standard algorithm, in full confidence that the standard algorithm will provide a reasonable upper bound on the time to compute an answer.
One way to provide ``streaming'' functionality is by the implementation
of a function we describe below that is inspired by GAP's
function (see First in the GAP Reference Manual). The function
First() takes a list and boolean function as arguments, and returns the
first element of the list for which the boolean function returns
In contrast, we define a corresponding ParGAP function,
ParFirstResult(), which takes a list and a (general) function as
arguments. Messages are sent to the slaves causing the ith slave to
evaluate the function on the ith list element. The value returned is
the value obtained from whichever slave finishes first. Note that in
consistency with the goal of streaming, the function signals an error if
asked to evaluate a list with more entries than the number of slaves.
gap> ParFirstResult := function( list, fnc ) > local i, result; > if Length(list) > TOPCnumSlaves then > Error("too few slaves"); > fi; > for i in [1..Length(list)] do > SendMsg( PrintToString( > "fnc :=", fnc, "; fnc(", list[i], ");"), > i ); > od; > result := RecvMsg(); # default is MPI_ANY_SOURCE > ParReset(); # Interrupt all other slaves - only if using MPINU > return result; > end;;
We now demonstrate the use of
ParFirstResult() in ``streaming''. For
our example we need the FactInt package by Stefan Kohl. To get
this package if you don't have it, visit
or the equivalent at one of GAP's mirror sites, and follow the easy installation instructions.
If one hasn't already included a
.gaprc file then it is necessary to do:
gap> ParEval("LoadPackage(\"factint\")"); Loading FactInt 1.5.2 (Routines for Integer Factorization ) by Stefan Kohl, email@example.com true
so that each slave (not just the master) is aware of the FactInt functions.
Above we defined
ParFirstResult() on the master process. We will assume
that we have two slaves.
gap> StreamingFactInt := function(i, x) > local alg; > alg := > [ x -> [ "MPQS ALGORITHM", FactorsMPQS(Factorial(x)+1) ], > x -> [ "CFRAC ALGORITHM", FactorsCFRAC(Factorial(x)+1) ] > ]; > return alg[i](x); > end;;
StreamingFactInt(1, x); and
StreamingFactInt(2, x); factor an
integer, but one uses the multiple polynomial quadratic sieve algorithm
(MPQS), and the other uses the continued fraction algorithm (CFRAC); the
FactorsCFRAC that perform these algorithms
are defined by the FactInt package. We demonstrate the
``streaming'' of these algorithms in determining the factorization of
gap> # Now, define StreamingFactInt() on each of the slaves. gap> BroadcastMsg( PrintToString( "StreamingFactInt :=", > StreamingFactInt ) ); gap> ParFirstResult([1,2], i->StreamingFactInt(i, 35) ); ... resetting ... [ "MPQS ALGORITHM", [ 137, 379, 17839, 340825649, 32731815563800396289317 ] ]
ParReset() function to reset the other slaves is only available if
ParGAP is built to use MPINU. If you are using a system MPI library then
the other slaves will continue with their algorithm, and you will need to
wait for them to complete and return a result before reusing them.
There are many examples where trivial parallelism does not suffice. Typically, this happens either when there are global variables that must be ``read'' by each slave process, and possibly updated, or else when the input for the next slave process depends on the result that arrived from the last slave process. Here we provide only the basics of the parallel model. The parallel model was described in more detail in Coo97, and a still more detailed description is contained in Chapter Basic Concepts for the TOP-C model (MasterSlave).
For non-trivial parallelism, ParGAP uses the TOP-C model Coo96. ParGAP typically invokes the TOP-C model via a command such as
The four arguments, GenerateTaskInput, DoTask, CheckTaskResult, and
UpdateSharedData are ``callback'' functions written by the application
programmer, and the names of those callback functions are arbitrary. (The
manual for ParGAP/MPI, the earlier incarnation of ParGAP
used slightly different names for its examples, but the purpose of the
MasterSlave() has not changed -- only the naming
The only ParGAP command above is
MasterSlave(). A task is an
arbitrary function (here called
()) executed on a slave
process, that takes input from the master process and returns its output
to the master process. The diagram in Section Other TOP-C Commands
gives some idea of the flow of control as a task is processed. The
diagram there is meant to represent the three main abstractions of the
TOP-C model: (1) the task, (2) the shared data, and (3) the action. The
shared data consists of any globally shared data (which should be
readable by all processes). The task is a procedure executed on a slave
process that takes a task input, and the shared data, and produces a
task output, which depends only on the task input and shared data.
Finally, the task input and task output are sent to the master process,
which must then decide upon an action.
Typical actions are
NO_ACTION (and the master process might save the
task output in a private list of results),
UPDATE_SHARED_DATA (send a
message to all slaves to update the local copy of the globally shared
REDO (re-do the computation in case the shared data was
changed by another slave process). (Earlier incarnations of ParGAP
UPDATE. Now the alternative
UPDATE_SHARED_DATA is offered and we
standardize on this here.)
An example invocation of
MasterSlave() is shown below, where we pass
the four application functions as direct arguments of
The routine below implements a simplified version of the
function described in Section Trivial Parallelism.
ParInstallTOPCGlobalFunction( "MyParList", function( list, fnc ) local result, iter; result := ; # Each invocation of GAP's iterator # returns next element of list. iter := Iterator(list); MasterSlave( # GenerateTaskInput(): function() if IsDoneIterator(iter) then return NOTASK; else return NextIterator(iter); fi; end, # DoTask(): fnc, # CheckTaskResult(): function(input,output) result[input] := output; return NO_ACTION; end, # UpdateSharedData(): Error # We never see action: UPDATE_SHARED_DATA ); return result; end );
MyParList on all
ParGAP processes. It also defines the version of
MyParList() on the
master differently from on a slave, so that a call,
x->x^2);, on the master automatically causes
x->x^2); to be invoked on the slave with the same arguments. This is
required for the TOP-C model. Note that the distinct function,
ParInstallGlobalFunction(), exists for the equivalent of
"InstallGlobalFunction( MyParList, function ... end)");. Both functions
should be called only from the master (possibly from inside a
The shared data consists only of the variable
fnc, which is read by all
processes, but in this case is never ``updated''. Note that one need not
explicitly declare variables that are in the shared data. A TOP-C shared
data variable is defined as a variable whose value is read by more than
one process, and which is modified only through a call to the application
(). If we would like to see the messages, as
they are passed back and forth between master and slave, we can
optionally set the variable
ParTrace (see ParTrace). We are then
ready to execute our new parallel function.
gap> ParTrace := true;; gap> MyParList([2..256], AnalyzePrimitivePermGroupsOfOrder);
The example above in fact requires only trivial parallelism. Hence the
() parameter to
MasterSlave() is especially simple.
Since the action is never
REDO, we achieve the maximum concurrency
among slave processes, resulting in a speedup that is linear in the
number of slaves.
In general, the parallel efficiency of an application, with its
concommitant decisions about concurrency of slave tasks are typically
determined by the actions chosen by
(). The sample
below is a standard ParGAP ``idiom'' that allows one to easily set up
reasonable concurrency in a parallel application.
CheckTaskResult := function( taskInput, taskOutput ) if taskOutput = fail then return NO_ACTION; elif not IsUpToDate() then return REDO_ACTION; else return UPDATE_ACTION; fi; end;
IsUpToDate() is a boolean function provided by ParGAP
true if there has been no
since the time that the ``corresponding'' task input was generated by a
call to the user-provided function
false. The ``corresponding'' task input is the
task input associated with that task output which was most recently seen
on the master process. In the context of the user-provided function
(), it is the first argument to that function.
[Up] [Previous] [Next] [Index]