[Up] [Previous] [Next] [Index]

4 Tutorial

Sections

  1. Trivial Parallelism
  2. Using ParGAP interactively
  3. Streaming
  4. TOP-C model for non-trivial parallelism

Section Trivial Parallelism covers trivial parallelism (the simplest and most common form of parallel application). This is followed in Section Using ParGAP interactively by a description of how to use ParGAP interactively, and Section Streaming illustrates these principles with a short implementation of parallel streaming. Streaming refers to simultaneously running different algorithms for the same problem in different ``streams'' or processes, and accepting the first answer that returns. The remaining processes for the other algorithms are then terminated. ParGAP allows those processes to reside on a single CPU or on different CPUs. Hence, in the case of streaming, ParGAP is potentially useful even if one has only a single computer available, since multiple streams can still reside locally in separate processes.

For more sophisticated (non-trivial) parallel applications, the TOP-C model of parallelism is recommended, and it is described in Section TOP-C model for non-trivial parallelism.

4.1 Trivial Parallelism

In parallel computation, perhaps 80% of the applications fall under the heading of ``trivial parallelism''. These are situations in which one must compute many unrelated cases. For example, perhaps 200 different cases must be computed and results returned for each case and each case has no interaction with any other one. In the absence of shared data, it would be common to start 20 GAP jobs on 20 distinct workstations in a student computer laboratory. If there are 200 cases, then one writes 20 different ``batch jobs'', each batch job handling 10 distinct cases.

ParGAP provides strong support for this common situation. Conceptually, one is always talking to ParGAP on a master process (the local workstation), and the master process talks to each of various slave processes. In its simplest form, one merely generates a list of inputs for the cases, and writes a suitable function to provide the case information. Effectively, trivial parallelism can be expressed by a parallel version of GAP's List() function (see List (for a list (and a function)) in the Reference Manual), which is called ParList() (see ParList) in ParGAP.

The following is an example of a ParGAP session that uses the ParList() function. Observe the use of BroadcastMsg() (see BroadcastMsg) to ensure that the function AnalyzePrimitivePermGroupsOfOrder() is known to the slaves.

gap> AnalyzePrimitivePermGroupsOfOrder := function(ord)
>      #returns a list of data records of all primitive 
>      #permutation groups of order ord that are represented 
>      #as permutation groups in GAP's primitive group database
>      
>      local PrimitiveGroupsOfOrder, AnalyzePrimitivePermGroup;
>    
>      PrimitiveGroupsOfOrder :=
>        ord -> List( [1..NrPrimitiveGroups(ord)],
>                     i -> PrimitiveGroup(ord,i) );
>    
>      AnalyzePrimitivePermGroup :=
>        grp -> rec( order := Size(grp),
>                    degree := NrMovedPoints(grp),
>                    baseSize := Size(BaseOfGroup(grp)) );
>    
>      return List( Filtered( PrimitiveGroupsOfOrder(ord),
>                             IsPermGroup ),
>                   AnalyzePrimitivePermGroup );
>    end;;
gap> #Define AnalyzePrimitivePermGroupsOfOrder on slaves
gap> BroadcastMsg( 
>      PrintToString("AnalyzePrimitivePermGroupsOfOrder := ",
>                    AnalyzePrimitivePermGroupsOfOrder ) );
gap> ParList( [2..256], AnalyzePrimitivePermGroupsOfOrder );
[ [ rec( order := 2, degree := 2, baseSize := 1 ) ], 
  [ rec( order := 3, degree := 3, baseSize := 1 ), 
  <...many lines of output omitted...>

As one expects, the answer is computed in parallel in a time that decreases linearly with the number of processors. Hence a Beowulf cluster of 16 processors (1 ParGAP master and 15 ParGAP slaves) will return this information approximately 15 times as fast as a single processor, greatly facilitating interactive exploration of the properties of various groups.

If each single case can be executed quickly, then the execution time for ParList() can be dominated by the network overhead imposed by message passing between the master and slaves. In this case, an optional third parameter to ParList() specifies the number of cases to be submitted to a slave in a single message (modulo how many cases are left, of course). As an example, suppose we are interested in finding those of the first 100,000 integers that have many distinct prime factors, but to reduce the network overhead we wish to batch 1000 cases at a time to the slaves. Then one is able to execute the following.

gap> ParList([1..100000],
>            i -> Length(PrimePowersInt(i))/2, 1000);

4.2 Using ParGAP interactively

Having seen an example where ParGAP can be useful, one next needs to know what is involved in setting up and using ParGAP. In the following, recall that ParGAP employs a master-slave architecture, with the local process being the ``master'' process, and all others being ``slaves''.

ParGAP is installed in the same way as other GAP packages. After extracting the ParGAP files, one invokes configure and make. (See Section Installing ParGAP for the details.) Instead of invoking gap, one then invokes the supplied shell script, pargap.sh, (possibly modified and/or renamed; again see Section Running ParGAP for the details). Observe that unlike most other GAP packages, one should not call LoadPackage() function from within a GAP session to start ParGAP. As usual, master and slave processes read a .gaprc file in the home directory, if present.

The method by which slave processes are started depends on the MPI implementation used (see Section Running ParGAP). When using a system MPI library, the system MPI launcher should be used to start ParGAP, and this also starts up the slaves. One commonly-supported MPI launcher syntax is

mpiexec -n num pargap.sh

where num is the total number of copies of pargap.sh that you wish to run. If, instead, ParGAP is built to use MPINU, the subset MPI implementation included with ParGAP, then you should invoke pargap.sh directly, and ParGAP will start up the slave processes itself. For details of these, it looks for a file named procgroup file in the current directory, but can be directed to use a different file through the -p4pg switch. Hence, if one always uses ParGAP with the slave processes specified in myprocgroup.big, one may wish to set up an alias or else a symbolic link to a script containing the equivalent of:

pargap.sh -p4pg PATH/myprocgroup.big

Observe also that heterogeneous computing is supported, in that each slave machine may run a different operating system on different hardware.

Within ParGAP, the standard message-passing commands (send, receive, probe, etc.) are available. The slaves are numbered consecutively starting at 1. If no slave parameter is supplied for SendMsg() or SendRecvMsg(), then the default slave is 1. Most other commands take a default slave parameter, MPI_ANY_SOURCE (meaning ``ANY_SLAVE''). PingSlave() sends a ``null message'', and waits for an acknowledgement that the slave is alive.

The command SendMsg(string, i); sends the (string) message string to slave i. Slave i then evaluates string as a GAP command, and returns an answer. If a command has no answer, (such as "x:=3;"), then the string "no_return_val" is returned. If multiple commands are sent, such as "x:=3; y:=4; x+y;", then only the last return value is returned. Note that the final semicolon, normally required by GAP, is optional in a command string for a message.

Most of the other commands are clear from context. Every SendMsg() to a slave must be balanced by a RecvMsg(). The exception to this rule is that, when using MPINU, the commands FlushAllMsgs() and ParReset() are available and may be used to flush any pending messages from a slave. A SendRecvMsg(string, i) command is equivalent to the consecutive commands SendMsg(string, i) and RecvMsg(i). The command BroadcastMsg() sends its string argument to each slave, and there is no reply message. ParEval() is like BroadcastMsg(), but it also evaluates on the master process. If one is unsure whether there is a pending message, one can invoke ProbeMsgNonBlocking() or ProbeMsg(); if no slave parameter is provided, the default is MPI_ANY_SOURCE. The command ParRead() is a parallel version of GAP's Read() command; it is useful for ensuring that the slaves and master share a similar environment. Other examples of commands that are analogous to GAP's built-in commands include: ParReread, ParEval, ParInstallGlobalFunction and ParInstallGlobalFunction. See Section Slave Listener Commands for more complete descriptions of the above commands.

An illustration of the usage of the commands mentioned above with a sample ParGAP session, for which a procgroup file has defined two slaves is the first example given in Section Extended Example. It's recommended that you review that example at this point.

4.3 Streaming

Next, we demonstrate a simple implementation of ``streaming'' as an example of how easy it is to use the ParGAP tools to provide new functionality. To the best of this author's knowledge, the term ``streaming'' originated with Charles Leedham-Green at the conference on which this proceedings is based, where he made a general request for streaming functionality to be implemented in some software system.

The idea of streaming is that one may have two algorithms for solving a problem. One would like to to solve the problem simultaneously on two processors, each using a distinct algorithm. Whichever processor finds the solution first should report back to the master process. The master process should then interrupt the other slave, so as to make it available for further computation. (NB Resetting of slaves is only possible with the MPINU library.) This is useful when one has a heuristic available that may finish early with the correct answer, or may continue for a very long time. The heuristic can then be ``streamed'' alongside the standard algorithm, in full confidence that the standard algorithm will provide a reasonable upper bound on the time to compute an answer.

One way to provide ``streaming'' functionality is by the implementation of a function we describe below that is inspired by GAP's First() function (see First in the GAP Reference Manual). The function First() takes a list and boolean function as arguments, and returns the first element of the list for which the boolean function returns true. In contrast, we define a corresponding ParGAP function, ParFirstResult(), which takes a list and a (general) function as arguments. Messages are sent to the slaves causing the ith slave to evaluate the function on the ith list element. The value returned is the value obtained from whichever slave finishes first. Note that in consistency with the goal of streaming, the function signals an error if asked to evaluate a list with more entries than the number of slaves.

gap> ParFirstResult := function( list, fnc )
>   local i, result;
>   if Length(list) > TOPCnumSlaves then
>     Error("too few slaves");
>   fi;
>   for i in [1..Length(list)] do
>     SendMsg( PrintToString(
>                   "fnc :=", fnc, "; fnc(", list[i], ");"),
>              i );
>   od;
>   result := RecvMsg(); # default is MPI_ANY_SOURCE
>   ParReset(); # Interrupt all other slaves - only if using MPINU
>   return result;
> end;;

We now demonstrate the use of ParFirstResult() in ``streaming''. For our example we need the FactInt package by Stefan Kohl. To get this package if you don't have it, visit http://www.cip.mathematik.uni-stuttgart.de/~kohlsn/factint.html

or the equivalent at one of GAP's mirror sites, and follow the easy installation instructions.

If one hasn't already included a LoadPackage("factint"); statement in one's .gaprc file then it is necessary to do:

gap> ParEval("LoadPackage(\"factint\")");

Loading FactInt 1.5.2 (Routines for Integer Factorization )
by Stefan Kohl, kohl@mathematik.uni-stuttgart.de

true

so that each slave (not just the master) is aware of the FactInt functions.

Above we defined ParFirstResult() on the master process. We will assume that we have two slaves.

gap> StreamingFactInt := function(i, x)
>   local alg;
>   alg :=
>    [ x -> [ "MPQS ALGORITHM",  FactorsMPQS(Factorial(x)+1) ],
>      x -> [ "CFRAC ALGORITHM", FactorsCFRAC(Factorial(x)+1) ]
>    ];
>   return alg[i](x);
> end;;

Both StreamingFactInt(1, x); and StreamingFactInt(2, x); factor an integer, but one uses the multiple polynomial quadratic sieve algorithm (MPQS), and the other uses the continued fraction algorithm (CFRAC); the functions FactorsMPQS and FactorsCFRAC that perform these algorithms are defined by the FactInt package. We demonstrate the ``streaming'' of these algorithms in determining the factorization of 35!+1.

gap> # Now, define StreamingFactInt() on each of the slaves.
gap> BroadcastMsg( PrintToString( "StreamingFactInt :=",
>                                 StreamingFactInt ) );
gap> ParFirstResult([1,2], i->StreamingFactInt(i, 35) );
... resetting ...
[ "MPQS ALGORITHM", [ 137, 379, 17839, 340825649, 32731815563800396289317 ] ]

The ParReset() function to reset the other slaves is only available if ParGAP is built to use MPINU. If you are using a system MPI library then the other slaves will continue with their algorithm, and you will need to wait for them to complete and return a result before reusing them.

4.4 TOP-C model for non-trivial parallelism

There are many examples where trivial parallelism does not suffice. Typically, this happens either when there are global variables that must be ``read'' by each slave process, and possibly updated, or else when the input for the next slave process depends on the result that arrived from the last slave process. Here we provide only the basics of the parallel model. The parallel model was described in more detail in Coo97, and a still more detailed description is contained in Chapter Basic Concepts for the TOP-C model (MasterSlave).

For non-trivial parallelism, ParGAP uses the TOP-C model Coo96. ParGAP typically invokes the TOP-C model via a command such as

MasterSlave( GenerateTaskInput, DoTask, CheckTaskResult,
UpdateSharedData );

The four arguments, GenerateTaskInput, DoTask, CheckTaskResult, and UpdateSharedData are ``callback'' functions written by the application programmer, and the names of those callback functions are arbitrary. (The manual for ParGAP/MPI, the earlier incarnation of ParGAP used slightly different names for its examples, but the purpose of the arguments of MasterSlave() has not changed -- only the naming convention has.)

The only ParGAP command above is MasterSlave(). A task is an arbitrary function (here called DoTask()) executed on a slave process, that takes input from the master process and returns its output to the master process. The diagram in Section Other TOP-C Commands gives some idea of the flow of control as a task is processed. The diagram there is meant to represent the three main abstractions of the TOP-C model: (1) the task, (2) the shared data, and (3) the action. The shared data consists of any globally shared data (which should be readable by all processes). The task is a procedure executed on a slave process that takes a task input, and the shared data, and produces a task output, which depends only on the task input and shared data. Finally, the task input and task output are sent to the master process, which must then decide upon an action.

Typical actions are NO_ACTION (and the master process might save the task output in a private list of results), UPDATE_SHARED_DATA (send a message to all slaves to update the local copy of the globally shared data), and REDO (re-do the computation in case the shared data was changed by another slave process). (Earlier incarnations of ParGAP used UPDATE. Now the alternative UPDATE_SHARED_DATA is offered and we standardize on this here.)

An example invocation of MasterSlave() is shown below, where we pass the four application functions as direct arguments of MasterSlave(). The routine below implements a simplified version of the ParList() function described in Section Trivial Parallelism.

ParInstallTOPCGlobalFunction( "MyParList",
function( list, fnc )
  local result, iter;
  result := [];
  # Each invocation of GAP's iterator
  # returns next element of list.
  iter := Iterator(list);
  MasterSlave(
      # GenerateTaskInput():
      function() if IsDoneIterator(iter) then return NOTASK;
                 else return NextIterator(iter); fi; end,
      # DoTask():
      fnc,
      # CheckTaskResult():
      function(input,output) result[input] := output; 
                             return NO_ACTION; end,
      # UpdateSharedData():
      Error # We never see action:  UPDATE_SHARED_DATA
      );
  return result;
end );

The function ParInstallTOPCGlobalFunction() installs MyParList on all ParGAP processes. It also defines the version of MyParList() on the master differently from on a slave, so that a call, MyParList([1..100], x->x^2);, on the master automatically causes MyParList([1..100], x->x^2); to be invoked on the slave with the same arguments. This is required for the TOP-C model. Note that the distinct function, ParInstallGlobalFunction(), exists for the equivalent of BroadcastMsg( "InstallGlobalFunction( MyParList, function ... end)");. Both functions should be called only from the master (possibly from inside a Read() command).

The shared data consists only of the variable fnc, which is read by all processes, but in this case is never ``updated''. Note that one need not explicitly declare variables that are in the shared data. A TOP-C shared data variable is defined as a variable whose value is read by more than one process, and which is modified only through a call to the application routine UpdateSharedData(). If we would like to see the messages, as they are passed back and forth between master and slave, we can optionally set the variable ParTrace (see ParTrace). We are then ready to execute our new parallel function.

gap>  ParTrace := true;;
gap>  MyParList([2..256], AnalyzePrimitivePermGroupsOfOrder);

The example above in fact requires only trivial parallelism. Hence the CheckTaskResult() parameter to MasterSlave() is especially simple. Since the action is never REDO, we achieve the maximum concurrency among slave processes, resulting in a speedup that is linear in the number of slaves.

In general, the parallel efficiency of an application, with its concommitant decisions about concurrency of slave tasks are typically determined by the actions chosen by CheckTaskResult(). The sample below is a standard ParGAP ``idiom'' that allows one to easily set up reasonable concurrency in a parallel application.

CheckTaskResult := function( taskInput, taskOutput )
  if taskOutput = fail then return NO_ACTION;
  elif not IsUpToDate() then return REDO_ACTION;
  else return UPDATE_ACTION;
  fi;
end;

The function IsUpToDate() is a boolean function provided by ParGAP that returns true if there has been no UPDATE_SHARED_DATA action since the time that the ``corresponding'' task input was generated by a call to the user-provided function GenerateTaskInput(). Otherwise, IsUpToDate() returns false. The ``corresponding'' task input is the task input associated with that task output which was most recently seen on the master process. In the context of the user-provided function CheckTaskResult(), it is the first argument to that function.

[Up] [Previous] [Next] [Index]

ParGAP manual
November 2013