
1 Writing Parallel Programs in GAP Easily

Sections

  1. Overview of ParGAP
  2. Choosing an MPI Library
  3. Installing ParGAP
  4. Running ParGAP
  5. Extended Example
  6. Author
  7. Invoking ParGAP with Remote Slaves (when using a system MPI library)
  8. Invoking ParGAP with Remote Slaves (when using MPINU)
  9. Problems Installing or Invoking ParGAP
  10. Problems Running ParGAP with MPINU
  11. Problems Running ParGAP with a System MPI Implementation
  12. Problems with Passwords (Getting Around Security)
  13. Modifying the GAP kernel

The ParGAP (Parallel GAP) package provides a way of writing parallel programs using the GAP language. Former names of the package were ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface, a well-known standard for parallelism. ParGAP is based on the MPI standard, and this distribution includes a subset implementation of MPI, to provide a portable layer with a high level interface to BSD sockets. Since knowledge of MPI is not required for use of this software, we now refer to the package as simply ParGAP. For more information visit the author's ParGAP home page at: http://www.ccs.neu.edu/home/gene/pargap.html

For some background reading, see Coo95 and Coo97.

This first chapter is intended to help a new user set up ParGAP and run through some quick examples.

The later chapters present detailed explanations of the facilities of ParGAP. Because parallel programming is sufficiently different from sequential programming, this author recommends printing out at least Chapters 1 through MasterSlave Tutorial, and skimming through those chapters for areas of interest, before returning to the terminal to try out some of the ideas. This document can be found in .../pkg/pargap/doc/manual.dvi of the software distribution. You may also want to print the index at the end of manual.dvi. In particular, the heading example in the index, or ??example from within GAP, should be useful. If you prefer postscript, the UNIX command dvips will convert that file to postscript form.

The development of ParGAP was partially supported by National Science Foundation grants CCR-9509783 and CCR-9732330.

1.1 Overview of ParGAP

ParGAP is installed on top of an existing GAP installation. It comes with its own subset MPI implementation (currently functional only on UNIX installations), or it can use your system MPI libraries, if present. See Section Installing ParGAP for instructions on installation of ParGAP. At the time that ParGAP is invoked, a special file or command line parameter must be used to tell ParGAP how many local processes or which remote machines to use for slave processors. See section Running ParGAP for instructions on invoking ParGAP. If there are questions or bugs concerning ParGAP, please write to: gene@ccs.neu.edu

If one wishes only to try out the parallel features, the first five pages of this manual (through the section on the slave listener) will suffice for installing and using ParGAP. For the more advanced user who wishes to design new parallel algorithms or port old sequential code to a parallel environment, it is strongly recommended to also read the sections following on from Section Basic Concepts for the TOP-C model (MasterSlave).

ParGAP should be invoked via the script bin/pargap.sh, which is created by the installation process and which invokes GAP_ROOT_DIR/bin/ARCH/pargapmpi, where ARCH depends on your system but is the same directory in which the gap binary is found. MPI and the higher layers will not be available if the binary is invoked in the standard way as gap. This is a feature, since a single binary and source distribution serves both for the standard GAP and for ParGAP.

ParGAP is implemented in three layers: 1) MPI, 2) Slave Listener, and 3) Master Slave (TOP-C abstraction). Most users will find that the two highest layers (Slave Listener and Master Slave) meet all their needs.

1) MPI:
The lowest layer is MPI. Most users can ignore this layer. MPI is a standard for message-based parallel computation. A subset of the original MPI commands is exposed at the GAP level. The syntax is modified from the original C binding to make a GAP binding in an interpreted environment more convenient. This includes default arguments, useful return values, and Error break in the presence of errors. MPI_Init() (see MPI_Init) and MPI_Finalize() (see MPI_Finalize) are invoked automatically by ParGAP.

The MPI layer is not documented, since most users will not be using it. From GAP level, you can type: MPI_tabtab to see all implemented MPI functions and variables. Typing a symbol name alone (e.g.: MPI_Send; ) will cause it to display the calling syntax. The same information is displayed after an incorrect call. The return value is typically obvious. MPI is implemented in src/pargap.c. ParGAP will use a system MPI implementation if one is present. The distribution also includes two versions of a simple, subset implementation of MPI in pkg/gapmpi/mpinu/ and pkg/gapmpi/mpinu2/, implemented on top of a standard sockets interface, which can be used instead.

For those who wish to directly use the MPI interface, the meanings of the MPI calls are best found from the standard MPI documentation:

MPI Forum: http://www.mpi-forum.org/

MPI Standard (version 1.1): http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html

UNIX style man pages: http://www.mcs.anl.gov/research/projects/mpi/www/

2) Slave Listener:
This layer provides basic message passing facilities for communication among multiple ParGAP processes in a form that is more convenient for programming than the lower MPI layer. This will be the most useful entry point to ParGAP for most users. This is the default mode for ParGAP. Each remote (slave) process is in a receive-eval-send loop, in which the slave receives a GAP command from the local (master) process, the slave evaluates the GAP command, and the slave then sends the result back to the master as a GAP object.

Almost all commands in the slave listener are of the form *Msg* e.g. SendMsg() (see SendMsg), RecvMsg() (see RecvMsg), ProbeMsg() (see ProbeMsg). Since the slave is in a receive-eval-send loop, every SendMsg(cmd) on the master must be balanced by a later RecvMsg(). SendRecvMsg() (see SendRecvMsg) is provided to combine these steps. A few parallel utilities are also included, such as ParRead() (ParRead), ParList() (ParList), ParEval() (ParEval), etc.

Messages are arbitrary GAP objects. Note that arguments to any GAP function are evaluated before being passed to the function. Hence, any argument to SendMsg() or ParEval() would be evaluated locally before being sent across the network. For this reason, arguments can also be given as strings, to delay evaluation until reaching the destination process. Hence, the quotation marks of real strings inside such a command string must be escaped: ParEval("x:=\"abc\";"); Additionally, multiple commands are valid, and the final ``;'' of the string is optional. So, one can write:

BroadcastMsg("x:=\"abc\"; Print(Length(x), \"\\n\")");;

A full description is contained in Chapter Slave Listener.

3) Master Slave:
The Master Slave facility is provided both for writing complex parallel software, and as an easier way to parallelize previous or ``legacy'' sequential code. While the Slave Listener may be sufficient for simple parallel requirements, more complex software requires a higher level abstraction. The fundamental abstractions of the master slave layer are the task and the shared data.

1)
The task typically corresponds to the procedure or inner body of a loop in a sequential program. This is the part that must be repetitively computed in parallel.

2)
The shared data typically corresponds to the data of a sequential program that is not within the local scope of the task. Often this is a global data structure. In the case that the task is the inner body of a loop, the shared data may be a local data structure that is outside the local scope of the loop.

It is usually quite easy to identify the task and the shared data of a sequential program or algorithm, which is the first step in parallelizing an algorithm.

The Master Slave parallel model described here has also been successfully used in C and in LISP. It has been used both in distributed memory and shared memory environments, although this version in GAP currently works only in a distributed environment. In the C language, this parallel model is known as TOP-C (Task Oriented Parallel C). For examples of the use of the TOP-C model see Coo98, CCHW02, CFTY94, CG02, CH97, CHLM97, CLMW96, and CT96.

While no parallel software can eliminate the problem of designing an algorithm that is efficient in a parallel environment, the TOP-C abstraction eases the job by eliminating programmer concerns about lower level details, such as message passing, migration and replication of data, load balancing, etc. This leaves the programmer to concentrate on the primary goal: maximizing the concurrency or parallelism.

1.2 Choosing an MPI Library

If you are using Linux and wish to try out ParGAP quickly, you can skip this section and let the ParGAP build process choose an MPI library for you. If you have a little more time, or are running on a different system, please read on.

ParGAP uses MPI, a standard Message Passing Interface for communicating between processes. Since the details of inter-process communication are system-specific, ParGAP relies on an external library to provide its MPI functions. An implementation of a sufficient subset of MPI, which runs on Linux and OS X, is included with ParGAP. Alternatively, an MPI library can be installed on your system before building ParGAP. Two popular MPI implementations are:

MPICH2 http://www.mcs.anl.gov/research/projects/mpich2/
Open MPI http://www.open-mpi.org/

Both of these are compatible with Linux, Macs and Windows. Installation packages can be downloaded from their websites, or may be available through your system's standard package management mechanism.

The MPINU library included with ParGAP provides the MPI functionality that ParGAP needs by using Unix sockets. This implementation is sufficient for basic ParGAP usage, but does not scale to larger systems as well as the alternative system libraries. It is better suited to interactive ParGAP sessions, since system MPI implementations can result in problems with line editing in ParGAP. When built with MPINU, ParGAP also enables two commands, ParReset() and FlushAllMsgs(), which can be useful when developing parallel programs. See Section Problems Running ParGAP with a System MPI Implementation for details of these known issues with system MPI implementations. Two versions of MPINU are included with ParGAP: the original MPINU and a newer version, called MPINU2.

On Linux machines, we recommend that you use ParGAP with a system MPI implementation instead of MPINU, if possible. These implementations provide better performance and fault tolerance, and are compatible with a wider range of operating systems and hardware, including high speed networks and proprietary high-end computing systems.

On Macs, we recommend using the original MPINU, since there are currently some problems running ParGAP with either a system MPI implementation or MPINU2. Both these issues will hopefully be resolved in a future release.

By default, the ParGAP build process (see Section Installing ParGAP) tries to use a system MPI implementation if it can find one. If not, it will use MPINU. Two versions of MPINU are included with this release of ParGAP. The recommended choice is MPINU2, but the original MPINU is included as a backup in case there are problems building or running MPINU2.

1.3 Installing ParGAP

Installing ParGAP should be relatively simple. However, since there are many interactions both with the GAP kernel and with the UNIX operating system, in a minority of cases, manual intervention will be necessary. If you are part of this minority, please see the section Problems Installing or Invoking ParGAP. The most common problem is the local security policy; ParGAP is more pleasant to use when you don't have to manually provide the password for each slave. See section Problems with Passwords (Getting Around Security) for suggestions in this respect.

To install the ParGAP package, move the file pargap-XXX.zoo or pargap-XXX.tar.gz (for some version number XXX of ParGAP) into the pkg directory in which you plan to install ParGAP. Usually, this will be the directory pkg in the hierarchy of your version of GAP. (In fact, it is currently not possible to install ParGAP in a pkg directory separate from GAP's pkg directory. We hope to remedy this in future versions of ParGAP, so that it will also be possible to keep an additional pkg directory in your private directories; section Installing a GAP Package of the GAP reference manual gives details on how to do this, when it's possible.)

Now change into the pkg directory in which you plan to install ParGAP. If you got a .zoo file, unpack it with:

unzoo -x pargap-XXX

If you got a .tar.gz file and your tar command supports the z option, unpack it with:

tar zxf pargap-XXX.tar.gz

or otherwise unpack in two steps with:

gunzip pargap-XXX.tar.gz
tar xvf pargap-XXX.tar

Whether you got the .zoo or .tar.gz archive you should now have a new directory pargap. As for a generic GAP package, do:

cd pargap
./configure
make

This builds the ParGAP files. ParGAP also needs to rebuild parts of GAP to enable the MPI hooks. It may also need to re-run the GAP configure if you have a dedicated MPI compiler. By default, the ParGAP configure will prompt you to do this by hand if necessary, and then to restart the ParGAP build. If you are happy for the ParGAP build process to run the GAP configure for you if needed, with no arguments, then run ParGAP's configure with

./configure --with-basic-gap-configure

The configure script will attempt to find a system MPI implementation that it can use. If it cannot find one, it will use MPINU2, the more recent of the two MPINU subset implementations included with the ParGAP package. You can use the --with-mpi= configure option to specify a different behaviour, and you can also set your own MPI compiler and options if you wish. See the help text provided by ./configure -h for full details.

After doing the configure and make steps of ParGAP's installation process (see Section Installing ParGAP), you should find in ParGAP's bin subdirectory a script

pargap.sh

which you should use to start ParGAP. (ParGAP can not be started by starting GAP 4 in the usual way, and using LoadPackage; doing so will result in Info-ed advice to read this section.) Edit the pargap.sh script if necessary, copy it to a standard path and rename it according to how you intend to call ParGAP (e.g. rename it: pargap).

Note: The script pargap.sh defines the program that runs ParGAP as pargapmpi. In fact, after installation pargapmpi is a symbolic link to the GAP binary named gap. The same binary runs both GAP and ParGAP; when the binary is invoked as gap GAP runs in the usual way without any parallel features; only when the binary is invoked as pargapmpi are the parallel features incorporated. See Section Modifying the GAP kernel for more details.
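The invoked-name mechanism described in this note can be illustrated with a small shell sketch. It is only an analogy: the stand-in script named gap, the temporary directory and the printed messages are all invented for the illustration, and nothing here reflects how the GAP kernel itself is implemented.

```shell
# Illustration only: a stand-in "gap" program that checks the name it was
# invoked under, in the way the note describes for the real binary.
demo_dir=$(mktemp -d)

cat > "$demo_dir/gap" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  pargapmpi) echo "parallel features enabled" ;;
  *)         echo "standard GAP" ;;
esac
EOF
chmod +x "$demo_dir/gap"

# As after installation: pargapmpi is a symbolic link to gap.
ln -s gap "$demo_dir/pargapmpi"

"$demo_dir/gap"        # invoked as gap
"$demo_dir/pargapmpi"  # same file, invoked as pargapmpi
```

The two invocations run the same file but take different branches, which is why starting the real binary as gap gives no parallel features.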

Your ParGAP should now be ready to use. Now read the next section, which describes how to run ParGAP (if you are reading this from GAP's on-line help, type: ?>).

1.4 Running ParGAP

After a successful build, you will see a message saying that ParGAP is ready to use, and confirmation of whether a system MPI library or MPINU will be used. The method of running ParGAP depends on this MPI choice, and the MPI library is auto-detected, or can be specified, in configure, as described in Section Installing ParGAP. The pros and cons of the two different library variants are discussed in Section Choosing an MPI Library.

We will assume that you have copied the pargap.sh script to a location on your search path and renamed it as pargap, as suggested in Section Installing ParGAP.

If you are using a system MPI library: ParGAP should be started using an MPI launcher script. The name and syntax of the command to start MPI processes can vary, and you should check your system MPI documentation for details. However, one common launcher is mpiexec, and the following command should work with both Open MPI and MPICH, and most other MPI-2 implementations:

mpiexec -n 3 pargap

This will start three copies of ParGAP: one master and two slaves. These processes will all run on your local machine. See Section Invoking ParGAP with Remote Slaves (when using a system MPI library) for how to configure and run processes on remote slaves.

If you are using MPINU: In ParGAP's bin subdirectory you should find a procgroup file which defines the master and slave processes that will be used by ParGAP. When ParGAP is started, the MPINU library looks for a file called procgroup in the current directory, unless the -p4pg option is used. Thus if you renamed your shell script pargap, the following are valid ways of starting ParGAP:

pargap

(if current directory contains the file: procgroup), or

pargap -p4pg myprocgroupfile

(where myprocgroupfile is the complete path of your procgroup file -- there is no restriction on how you name it). The default procgroup file defines one master and two slaves on the local machine. For instructions on how to run remote slaves, see Section Invoking ParGAP with Remote Slaves (when using MPINU).
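Before starting ParGAP with MPINU it can help to confirm which procgroup file will apply. The following is a minimal sketch of the lookup rule stated above (the -p4pg argument if given, otherwise procgroup in the current directory). The function name resolve_procgroup is hypothetical and not part of ParGAP, and the sketch only mimics the rule; it does not query MPINU.

```shell
# resolve_procgroup [pargap arguments...]
# Hypothetical helper: prints the procgroup file that the stated lookup rule
# selects, or fails if that file is not readable.
resolve_procgroup() {
  pg=procgroup                        # default: file in the current directory
  while [ $# -gt 0 ]; do
    if [ "$1" = "-p4pg" ] && [ $# -ge 2 ]; then
      pg=$2                           # explicit file named after -p4pg
      break
    fi
    shift
  done
  if [ -r "$pg" ]; then
    echo "$pg"
  else
    echo "no readable procgroup file at: $pg" >&2
    return 1
  fi
}

# Example use (commented out so that nothing is started here):
#   resolve_procgroup -p4pg myprocgroupfile && pargap -p4pg myprocgroupfile
```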

If you had trouble installing or starting ParGAP, see the section Problems Installing or Invoking ParGAP. Otherwise you are ready to test your installation. Try the example in the following section (if you are reading this from GAP's on-line help, type: ?>).

1.5 Extended Example

After installation, try it out. Invoke ParGAP as described in Section Running ParGAP and try the example below (but substitute your own program where you see "/home/gene/myprogram.g"). The commands in this first example are also found in the README file. So, you may wish to copy text from the README file and paste it into a ParGAP session. If you have not specified any additional machines to the MPI launcher, or you are using the unmodified procgroup file, then your remote slaves will be other processes on your local machine. It is a good idea to run only on your local machine for your first experiments and while you are debugging parallel programs. When you wish to experiment with using remote machines, you can then proceed to section Invoking ParGAP with Remote Slaves (when using a system MPI library) or section Invoking ParGAP with Remote Slaves (when using MPINU) depending on which MPI library ParGAP has been built to use.

gap> # This assumes your procgroup file includes two slave processes.
gap> PingSlave(1); #a `true' response indicates Slave 1 is alive
true
gap> # Print() on slave appears on standard output 
gap> # i.e. after the master's prompt.
gap> SendMsg( "Print(3+4)" );
gap> 7
gap> # A <return> was input above to get a fresh prompt.
gap> #
gap> # To get special characters (including newline: `\n')
gap> # into a string, escape them with a `\'.
gap> SendMsg( "Print(3+4,\"\\n\")" );
gap> 7

gap> # Again, a <return> was input above after the 7 and new-line
gap> # were printed to get a fresh prompt.
gap> #
gap> # Each SendMsg() is normally balanced by a RecvMsg().
gap> SendMsg( "3+4", 2);
gap> RecvMsg( 2 );
7
gap> # The following is equivalent to the two previous commands.
gap> SendRecvMsg( "3+4", 2);
7
gap> # The two SendMsg() commands that were sent to Slave 1 earlier have
gap> # responses that are waiting in the message queue from that slave.
gap> # Check that there is a message waiting. With some MPI implementations
gap> # the message is not immediately available, but when ProbeMsgNonBlocking()
gap> # does return true then RecvMsg() is guaranteed to succeed.
gap> ProbeMsgNonBlocking( 1 );
false
gap> ProbeMsgNonBlocking( 1 );
true
gap> # Print() is a `no-value' function, and so the result of a RecvMsg()
gap> # in both these cases is "<no_return_val>".
gap> RecvMsg( 1 );
"<no_return_val>"
gap> RecvMsg( 1 );
"<no_return_val>"
gap> # As with Print() the result of Exec() appears on standard
gap> # output, and the result is "<no_return_val>".
gap> SendRecvMsg( "Exec(\"pwd\")" ); # Your pwd will differ :-)
/home/gene
"<no_return_val>"
gap> # Define a variable on a slave
gap> SendRecvMsg( "a:=45; 3+4", 1 );
7
gap> # Note "a" is defined on slave 1, not slave 2.
gap> SendMsg( "a", 2 ); # Slave prints error, output on master
gap>  Variable: 'a' must have a value
gap> # <return> entered to get fresh prompt.
gap> RecvMsg( 2 ); # No value for last SendMsg() command
"<no_return_val>"
gap> SendMsg( "a", 1 ); # Slave 1 does have a value for "a"
gap> RecvMsg( 1 );
45
gap> # Execute analogue of GAP's List() in parallel on slaves.
gap> squares := ParList( [1..100], x->x^2 );
[ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 
  289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 
  900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 
  1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 
  2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 
  3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 
  5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 
  7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 
  9216, 9409, 9604, 9801, 10000 ]
gap> # Send a large, local (non-remote) data structure to a slave
gap> Concatenation("x := ", PrintToString([1..10]*2));
"x := [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]\n\000"
gap> SendMsg( Concatenation("x := ", PrintToString([1..10]*2)) ); 
gap> RecvMsg();
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]
gap> # Send a local (non-remote) function to a slave
gap> myfnc := function() return 42; end;;
gap> # Use PrintToString() to define myfnc on all slave processes
gap> BroadcastMsg( PrintToString( "myfnc := ", myfnc ) );
gap> SendRecvMsg( "myfnc()", 1 );
42
gap> # Ensure problem shared data is read into master and slaves.
gap> # Try one of your GAP program files instead.
gap> ParRead( "/home/gene/myprogram.g");

Now that you have done a fairly rudimentary test of ParGAP you should be ready to do something a little bit more interesting:

gap> ParInstallTOPCGlobalFunction( "MyParList",
> function( list, fnc )
>   local result, iter;
>   result := [];
>   iter := Iterator(list);
>   MasterSlave( function() if IsDoneIterator(iter) then return NOTASK;
>                           else return NextIterator(iter); fi; end,
>                fnc,
>                function(input,output) result[input] := output;
>                                       return NO_ACTION; end,
>                Error
>              );
>   return result;
> end );
gap> MyParList( [1..25], x->x^3 );
master -> 1:  1
master -> 2:  2
2 -> master: 8
1 -> master: 1
master -> 1:  3
master -> 2:  4
2 -> master: 64
1 -> master: 27
master -> 1:  5
master -> 2:  6
2 -> master: 216
1 -> master: 125
master -> 1:  7
master -> 2:  8
2 -> master: 512
1 -> master: 343
master -> 1:  9
master -> 2:  10
2 -> master: 1000
1 -> master: 729
master -> 1:  11
master -> 2:  12
2 -> master: 1728
1 -> master: 1331
master -> 1:  13
master -> 2:  14
2 -> master: 2744
1 -> master: 2197
master -> 1:  15
master -> 2:  16
2 -> master: 4096
1 -> master: 3375
master -> 1:  17
master -> 2:  18
2 -> master: 5832
1 -> master: 4913
master -> 1:  19
master -> 2:  20
2 -> master: 8000
1 -> master: 6859
master -> 1:  21
master -> 2:  22
2 -> master: 10648
1 -> master: 9261
master -> 1:  23
master -> 2:  24
2 -> master: 13824
1 -> master: 12167
master -> 1:  25
1 -> master: 15625
[ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 3375, 
  4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ]
gap> ParInstallTOPCGlobalFunction( "MyParListWithAglom",
> function( list, fnc, aglomCount )
>   local result, iter;
>   result := [];
>   iter := Iterator(list);
>   MasterSlave( function() if IsDoneIterator(iter) then return NOTASK;
>                           else return NextIterator(iter); fi; end,
>                fnc,
>                function(input,output)
>                  local i;
>                  for i in [1..Length(input)] do
>                    result[input[i]] := output[i];
>                  od;
>                  return NO_ACTION;
>                end,
>                Error,  # Never called, can specify anything
>                aglomCount
>              );
>   return result;
> end );
gap> MyParListWithAglom( [1..25], x->x^3, 4 );
master -> 1: (AGGLOM_TASK): [ 1, 2, 3, 4 ]
master -> 2: (AGGLOM_TASK): [ 5, 6, 7, 8 ]
1 -> master: [ 1, 8, 27, 64 ]
2 -> master: [ 125, 216, 343, 512 ]
master -> 1: (AGGLOM_TASK): [ 9, 10, 11, 12 ]
master -> 2: (AGGLOM_TASK): [ 13, 14, 15, 16 ]
1 -> master: [ 729, 1000, 1331, 1728 ]
2 -> master: [ 2197, 2744, 3375, 4096 ]
master -> 1: (AGGLOM_TASK): [ 17, 18, 19, 20 ]
master -> 2: (AGGLOM_TASK): [ 21, 22, 23, 24 ]
1 -> master: [ 4913, 5832, 6859, 8000 ]
2 -> master: [ 9261, 10648, 12167, 13824 ]
master -> 1: (AGGLOM_TASK): [ 25 ]
1 -> master: [ 15625 ]
[ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 3375, 
  4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ]

If you wish an accelerated introduction to the models of parallel programming provided here, you might wish to read the beginning of Chapter Slave Listener through section Slave Listener Commands, and then proceed immediately to Chapter Basic Concepts for the TOP-C model (MasterSlave).

1.6 Author

The ParGAP package was designed and written by Gene Cooperman, College of Computer Science, Northeastern University, Boston, MA, U.S.A.

If you use ParGAP to solve a problem then please send a short email to gene@ccs.neu.edu about it, and cite the ParGAP package as follows:

\bibitem[Coo99]{Coo99}
      Cooperman, Gene,
      {\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1,
      College of Computer Science, Northeastern University, 1999,
      \verb+http://www.ccs.neu.edu/home/gene/pargap.html+.

1.7 Invoking ParGAP with Remote Slaves (when using a system MPI library)

ParGAP can be built to use either a system MPI library, or the included MPINU library. The command to run ParGAP is different in the two cases. If ParGAP has been built using MPINU then you should skip this section and proceed to section Invoking ParGAP with Remote Slaves (when using MPINU). Otherwise, please read on.

After ParGAP has been installed, a script bin/pargap.sh will have been created which (after any changes you needed to make; see Section Installing ParGAP) you should use to invoke ParGAP. Installers are encouraged to treat pargap.sh in analogy to gap.sh. For example, if your site has copied gap.sh to /usr/local/bin/gap, then you should also look for the pargap.sh script as /usr/local/bin/pargap. It simplifies the remote slave configuration if ParGAP can be found on the standard path on each machine, and we'll assume in this section that ParGAP can be invoked simply as pargap.

When built with a system MPI installation, ParGAP must be invoked using the system's MPI launcher. This may go under several names, but the command name mpiexec is suggested in the MPI-2 specification, and is supported by both Open MPI and MPICH, two common implementations of that specification.

The basic usage is

mpiexec -n num pargap

to launch num copies of ParGAP (i.e. one master and (num−1) slaves). With no other parameters, these will all be launched on the host machine.

A configuration file can be used to specify hosts for remote slaves. The syntax of this file differs between Open MPI and MPICH, but in both cases the configuration file is a text file listing the host names and the number of processes to run on each host, one host per line. The default number of processes per node is one.

When using Open MPI, an example hostfile is

# Example Open MPI hostfile.  Comments begin with #
#
# The following node is a single processor machine:
foo.example.com
# The following two nodes are dual-processor machines:
bar.example.com slots=2
yow.example.com slots=2

This hostfile is passed to mpiexec using

mpiexec -n num -hostfile hostfile pargap

Processes are allocated round-robin style. For example, if we choose num to be seven then the first process (the master) will run on foo. The slaves will run two on bar, two on yow and a further one each on foo and bar.
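The allocation in this example can be reproduced with a short simulation. The sketch below assumes the policy the example implies: fill each host's slots in file order, then wrap around to the first host. The function name allocate is hypothetical, and a real launcher may place processes differently depending on its version and mapping options, so please treat the output as an illustration of the example rather than a prediction.

```shell
# allocate NUM HOSTFILE
# Hypothetical helper: prints one host name per process, in launch order,
# filling each host's slots in turn and wrapping around when all are used.
allocate() {
  awk -v num="$1" '
    /^[ \t]*#/ || NF == 0 { next }            # skip comments and blank lines
    {
      slots = 1                               # default: one process per host
      for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) slots = substr($i, 7) + 0
      if (slots < 1) slots = 1
      host[++n] = $1; cap[n] = slots
    }
    END {
      if (n == 0) exit 1                      # no hosts listed
      placed = 0
      while (placed < num)
        for (h = 1; h <= n && placed < num; h++)
          for (s = 1; s <= cap[h] && placed < num; s++) {
            print host[h]; placed++
          }
    }' "$2"
}

# The example hostfile from above, written to a temporary file.
example_hostfile=$(mktemp)
cat > "$example_hostfile" <<'EOF'
foo.example.com
bar.example.com slots=2
yow.example.com slots=2
EOF

allocate 7 "$example_hostfile"   # first line printed is the master's host
```

For seven processes the simulation lists foo once, bar twice and yow twice, and then foo and bar once more each, in agreement with the description above.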

When using MPICH, the equivalent machinefile is

# Example MPICH machinefile.  Comments begin with #
#
# The following node is a single processor machine:
foo.example.com
# The following two nodes are dual-processor machines:
bar.example.com:2
yow.example.com:2

and the command to start ParGAP using these hosts will be

mpiexec -n num -machinefile machinefile pargap
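Since the two example files differ only in how the process count is written (slots=2 against :2), one can be derived from the other. The following is a minimal sketch under that assumption. The function name hostfile_to_machinefile is hypothetical, and it handles only the two line forms shown in the examples above, not every option that either launcher accepts.

```shell
# hostfile_to_machinefile HOSTFILE
# Hypothetical helper: rewrites "host slots=N" as "host:N" on every
# non-comment line and prints the result; other lines pass through unchanged.
hostfile_to_machinefile() {
  sed -E '/^#/!s/[[:space:]]+slots=([0-9]+)[[:space:]]*$/:\1/' "$1"
}

# Example use (file names are placeholders):
#   hostfile_to_machinefile hostfile > machinefile
```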

For further information, such as how to specify hosts on the command line or how to control more finely the way processes are distributed between hosts, or if you have a different MPI implementation, please see your MPI documentation.

Unless you have any problems with the installation or running ParGAP, you can skip the rest of this chapter and move on to Chapter Slave Listener.

1.8 Invoking ParGAP with Remote Slaves (when using MPINU)

If ParGAP has been built to use the supplied MPINU library then ParGAP includes the facility (on Linux) to start up and manage remote slaves without needing an external MPI launcher. If ParGAP is built using a system MPI library then please read section Invoking ParGAP with Remote Slaves (when using a system MPI library) instead.

We'll assume that when ParGAP was built the script bin/pargap.sh was copied to /usr/local/bin/pargap (see Section Installing ParGAP). ParGAP can then be run by calling pargap. In addition, there must be a file, procgroup, in the current directory. Alternatively, if you wish to use a single procgroup file for all jobs, and that procgroup file is in /home/joe, then you can alias pargap to pargap -p4pg /home/joe/procgroup.

The procgroup file has a simple syntax, taken from the MPICH (not MPICH2) implementation of MPI. A # in column 1 introduces a comment line. The first non-comment line should be local 0, verbatim. This line declares the master process as the local process. Other lines are of the form:

host-machine 1 pargap-script

e.g.

regulus.ccs.neu.edu 1 /usr/local/bin/pargap

The first field is the hostname for a remote process. The second field specifies one thread per process. (ParGAP recognizes only the value 1 for the second field.) The third field is an absolute pathname for ParGAP, as it would be called on the remote process. Note that you can repeat the same line twice if you want two remote ParGAP processes on the same processor. The default procgroup provided in the distribution will have lines of form:

localhost 1 path-of-provided-pargap.sh

If you change path-of-provided-pargap.sh to just, say, pargap, this will work only if pargap is in your path on the remote machine shell (localhost in this case), using your default shell. On most machines, localhost is an alias for the local processor. This is a good default for debugging, so that you don't disturb users on other machines.
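The procgroup syntax described above is regular enough to generate. The sketch below writes a file in that format: a comment line, the verbatim local 0 line, and one line per slave with the fixed value 1 in the second field. The function name write_procgroup is hypothetical and not part of ParGAP.

```shell
# write_procgroup PARGAP_SCRIPT HOST [HOST...]
# Hypothetical helper: prints a procgroup file with one slave per HOST.
# Repeat a host name to request several slaves on the same machine.
write_procgroup() {
  script=$1
  shift
  echo "# procgroup generated by write_procgroup (hypothetical helper)"
  echo "local 0"
  for host in "$@"; do
    echo "$host 1 $script"
  done
}

# Two slaves on the local machine, the safe default for debugging:
write_procgroup /usr/local/bin/pargap localhost localhost
```

Redirecting the output to a file named procgroup in the current directory makes it the file that ParGAP picks up by default.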

MPI will use a line

host-machine 1 pargap-script

to create a UNIX subprocess executing:

ssh host-machine pargap-script

Suppose host-machine is regulus.ccs.neu.edu and pargap-script is /usr/local/bin/pargap as in the above example, and we were to have trouble invoking ParGAP, then it would be a good idea to try invoking ssh regulus.ccs.neu.edu from a UNIX prompt and if that succeeds, to then try executing the full ssh command.
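The first of these manual checks can be scripted. The sketch below is one possible approach and uses only standard OpenSSH options: BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout bounds the wait. The function name check_slave_host is hypothetical. In place of the full ssh command it only tests that the script exists and is executable on the remote machine, so that no slave is actually started.

```shell
# check_slave_host HOST PARGAP_SCRIPT
# Hypothetical helper: verifies a non-interactive ssh login to HOST and then
# that PARGAP_SCRIPT is executable there. Returns non-zero on failure.
check_slave_host() {
  host=$1
  script=$2
  if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
    echo "$host: ssh login failed or needs a password"
    return 1
  fi
  if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" test -x "$script"; then
    echo "$host: $script is missing or not executable"
    return 2
  fi
  echo "$host: ok"
}

# Example use (commented out, since it contacts a remote machine):
#   check_slave_host regulus.ccs.neu.edu /usr/local/bin/pargap
```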

A typical problem is that the remote processor requires a password to login. MPI requires a login without passwords. This can be set up for ssh. See man ssh. Sometimes, PAM is also used for user authentication (see /etc/pam.conf). Consult your system staff for further analysis. If your site uses an alternative to ssh, there is a solution here: add the lines

#############################################################################
##
##  SSH . . . . . . . . . . . . . . . . . . . .  remote shell used by ParGAP
##
##
SSH=myssh
export SSH

before the GAP block with the exec line. (Of course, the # lines are not needed; they are comments.)

Note that the remote ParGAP process will not read from standard input, although signals such as SIGINT (^C) may be received by the remote process. However, the remote ParGAP process will write to standard output, which is relayed to the local process. So,

gap> SendMsg("Exec(\"hostname\")", 2);

will execute and print from the remote process.

1.9 Problems Installing or Invoking ParGAP

If you still have problems, here is a list of things to check. This section considers general problems when installing or running ParGAP. The two sections after this one consider problems specific to using MPINU or a system MPI library respectively.

0.
If you are using ParGAP on a Mac with MPINU2 or a system MPI implementation then ParGAP may consistently crash on startup. If this is the case then try using MPINU instead by reconfiguring ParGAP with

./configure --with-mpi=MPINU

This is a known issue which will be fixed in a forthcoming version.

1.
Do you have enough swap space to support multiple GAP processes? A simple way to check this is with the UNIX command, top. The Linux version of top sorts by memory usage if you type M.

2.
make tries to automatically create:

pkg/pargap/bin/pargap.sh

and copy the parameters from GAP_ROOT/bin/gap.sh. GAP_ROOT was specified when you executed ./configure GAP_ROOT to install ParGAP. This can be error-prone if your site has an unusual setup. If you execute GAP_ROOT/bin/gap.sh, does gap come up? If so, compare gap.sh with .../pkg/pargap/bin/pargap.sh and check that the settings in pargap.sh are correct.

3.
Were the remote slave processes able to start up? If so, could they connect back to the master? To test for connectivity problems, try manually starting a remote slave by executing a line in the script. Try a simple ssh remote-hostname to see if the issue is with security. If ssh asks for a password, or if your site uses a different remote shell in place of ssh, then there is a security issue. Read Section Problems with Passwords (Getting Around Security), and possibly man sshd.

4.
If the previous step failed due to security issues, such as requesting a password, you have several options. man ssh tells you the security model at your site. Then read Section Problems with Passwords (Getting Around Security).

5.
Is pargap listed in .../pkg/ALLPKG? [It's needed to autostart slaves.]

6.
Inside ParGAP, has MPI been successfully initialized? Try:

gap> MPI_Initialized();

7.
A remote (slave) ParGAP process starts in your home directory and tries to cd to a directory of the same name as your local directory. Check your assumptions about the remote machine. Try:

gap> SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
gap> SendRecvMsg("UNIX_Getpid()");

8.
Every ParGAP slave process displays its GAP banner and startup messages on the terminal of the master process. If you have many slaves and do not wish to see these messages, then pass the -b and/or -q switches to ParGAP when it starts, to disable the banner or all messages respectively. See Section Ref:Command Line Options of the GAP Reference Manual for further details.

9.
Read the documentation for further possible problems.

1.10 Problems Running ParGAP with MPINU

If you have problems running ParGAP, and ParGAP is built to use the supplied MPINU library, then this section lists some things to check, in addition to the general issues listed in the previous section. If you are using a system MPI implementation instead of MPINU, this section can be ignored, but you should read the next section instead.

1.
Did ParGAP find your procgroup file? [It looks in the current directory for procgroup, or for:

... -p4pg PATH/procgroup

on the command line.]

2.
If you are using MPINU, is the procgroup file in your current directory set correctly? Test it. If you are calling it on a remote host, manually type:

ssh HOSTNAME ParGAP

where HOSTNAME and ParGAP appear exactly as in procgroup, e.g.

ssh denali.ccs.neu.edu /usr/local/gap4r3/bin/pargap.sh

In some cases, exec is used to save process overhead. Also try:

ssh HOSTNAME exec ParGAP

If you plan to call it on localhost, try just: ParGAP

Note that if not all the slave processes succeed in connecting to the master, then ParGAP writes out a file:

/tmp/pargapmpi-ssh.xx

where xx is replaced by the process id of the ParGAP process.

3.
If the connection dies at random after some period of time, you can experiment with SO_KEEPALIVE and variants. (See man setsockopt.) This periodically sends null messages so the remote machine does not think that the originating machine is dead. However, if the remote machine fails to reply, the local process sends a SIGPIPE signal to notify current processes of a broken socket, even though there might have been only a temporary lapse in connectivity. ssh specifies KeepAlive yes by default, but setting KeepAlive no might get you through some transient lapses in connectivity due to high congestion. You may also want to experiment with: setenv SSH "ssh -n"

4.
If a host is on multiple networks, it will have multiple IP addresses and usually multiple hostnames. In this case, the master process cannot always guess correctly which IP address (which internet address) should be passed to the slave process, so that the slave process can call back to the master. In such cases, you may need to tell ParGAP which hostname or IP address to use for the callback. This is done by setting the UNIX environment variable, CALLBACK_HOST, as in the example below.

# [ in sh/bash/... ]
CALLBACK_HOST=denali.ccs.neu.edu; export CALLBACK_HOST
# [ in csh/tcsh/... ]
setenv CALLBACK_HOST denali.ccs.neu.edu

The appropriate line for your shell can be placed in your shell initialization file. Alternatively, you can set this up for all users by placing the Bourne shell version (for sh) somewhere between the first and last line of .../pkg/pargap/bin/pargap.sh.

5.
ParGAP is supplied with two different versions of MPINU: the original MPINU and a later version, MPINU2. It will also work with other MPI libraries if they are present on your system. By default, if you do not have a system MPI implementation, then MPINU2 is used. If you have problems which appear to be MPI-related, try rebuilding ParGAP with a different MPI library. For example, to use MPINU instead of MPINU2, run configure using:

./configure --with-mpi=MPINU
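
For the CALLBACK_HOST setting in item 4 above, a site-wide default placed in pargap.sh would normally overwrite whatever an individual user has set. The following Bourne shell sketch shows one way to avoid that. The helper name set_callback_host is hypothetical and not part of ParGAP.

```shell
# set_callback_host DEFAULT
# Hypothetical helper (not part of ParGAP): export CALLBACK_HOST, keeping any
# value the user has already set and falling back to DEFAULT otherwise.
set_callback_host() {
    CALLBACK_HOST="${CALLBACK_HOST:-$1}"
    export CALLBACK_HOST
}
```

For example, set_callback_host denali.ccs.neu.edu gives a default that users can still override from their own shell initialization file.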

1.11 Problems Running ParGAP with a System MPI Implementation

Here is a list of known issues when using a system MPI library with ParGAP, and some solutions or workarounds. Not all of these issues will manifest themselves on all architectures and all MPI implementations. If you are having problems building or running ParGAP, you should check this section as well as Section Problems Installing or Invoking ParGAP.

1.
Line editing at the GAP command prompt is unlikely to work when ParGAP is invoked with an MPI launcher, since such launchers tend to do their own processing of the terminal I/O (stdin/stdout/stderr), which does not work well with either the readline library used in newer versions of GAP or the in-built terminal editing in earlier versions of GAP. It may be useful to run ParGAP through the rlwrap utility, if available. For example, if ParGAP is run using mpiexec, then try

rlwrap mpiexec -n 3 pargap

This should restore some of the line editing, although tab completion is limited to commands that rlwrap has already seen you use. For more information, try man rlwrap.

2.
The command FlushAllMsgs() (see FlushAllMsgs) is not available when using a system MPI implementation, since tests show that ProbeMsgNonBlocking(), which it uses (see ProbeMsgNonBlocking), cannot be relied upon to always return true the first time that it is called after a message has been sent. If your system MPI implementation does exhibit this desired behaviour for ProbeMsgNonBlocking(), then you can install your own local copy of FlushAllMsgs() by copying the code for this function from lib/slavelist.g, removing the if statement and renaming the function.

3.
The command ParReset() (see ParReset) is not available when using a system MPI implementation. When using an MPINU library, the slaves are launched by ParGAP itself and so can be contacted and restarted, but with a system MPI library the slaves are launched by mpiexec (or whichever MPI launcher you use) and so cannot be reset from within ParGAP. There is no known workaround for this.

4.
GAP and, in particular, the IO Package install handlers for the SIGCHLD signal. Many implementations of MPI also install their own SIGCHLD handler, which may then conflict with ParGAP. Testing has revealed no issues, but we cannot guarantee that there will be no interaction between the two. In particular, this may result in temporary files not being cleaned up properly.

5.
The GAP memory manager, GASMAN, can run into problems extending the GAP workspace if external libraries use malloc to allocate their own memory. MPINU avoids the use of malloc as much as possible, but system MPI implementations may not be as careful. This can be resolved by starting ParGAP with the -s command-line switch, which asks ParGAP to pre-allocate memory before it starts. You can safely pre-allocate more memory than you will actually need, since physical memory will only be mapped when it is actually used. For example, you could allocate 3 GB:

mpiexec -n 3 pargap -s 3g

The -a and -m switches can also be used to control memory usage. See Section Ref:Command Line Options of the GAP Reference Manual for further information.
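
The rlwrap suggestion in item 1 and the -s suggestion in item 5 can be combined in one launch command. The sketch below defines a hypothetical helper, pargap_mpi_cmd, that prints such a command without running it. It assumes that mpiexec is your MPI launcher and that pargap is on your path; adjust both for your site.

```shell
# pargap_mpi_cmd NPROCS WORKSPACE
# Hypothetical helper (not part of ParGAP): print (do not run) a launch
# command that combines the rlwrap and -s suggestions above.
# rlwrap is added only if it is installed.
pargap_mpi_cmd() {
    cmd="mpiexec -n $1 pargap -s $2"
    if command -v rlwrap >/dev/null 2>&1; then
        cmd="rlwrap $cmd"
    fi
    printf '%s\n' "$cmd"
}
```

Calling pargap_mpi_cmd 3 3g prints the command for three processes with a 3g pre-allocated workspace, with or without the rlwrap prefix depending on what is installed.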

News of any other issues or solutions would be gratefully accepted.

1.12 Problems with Passwords (Getting Around Security)

There is a simple test to see if you need to read this section. Pick a remote machine, HOSTNAME, that you wish to execute on, and type: ssh HOSTNAME. If you were asked for your password, then you and your system administrator may need to talk about security policy. If ssh did not work at all but your site provides an alternative remote shell that does, then set the environment variable, SSH, to the alternative, as described in item (2) below.
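
The same test can be made non-interactive, which is convenient when checking many hosts. The sketch below assumes OpenSSH, whose BatchMode and ConnectTimeout options are used here; other remote shells may not accept them. The helper name can_login_without_password is hypothetical.

```shell
# can_login_without_password HOSTNAME
# Hypothetical helper, assuming OpenSSH: succeed only if ssh can run a
# trivial command on HOSTNAME without prompting for a password.
can_login_without_password() {
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout bounds the wait for an unreachable host.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

A non-zero exit status means that either the host was unreachable or a password would have been required, so this section applies.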

(1)
Add a .shosts file to your home directory (for ssh).

(2)
Hack around the problem: By default, the startup script uses ssh to start remote processes. However, if the environment variable SSH is set, the script uses the value of the environment variable instead of ssh. This may be useful if you have your own script, myssh, that automatically gets around the security issues. Then just type:

SSH=myssh; export SSH  # [ in sh/bash/... ]
setenv SSH myssh       # [ in csh/tcsh/... ]

The appropriate line for your shell can be placed in your shell initialization file. Alternatively, you can set this up for all users by placing the Bourne shell version (for sh) somewhere between the first and last line of .../pkg/pargap/bin/pargap.sh. (The example for ssh was given earlier.)

(3)
ssh: man ssh mentions some possibilities for giving the password the first time, and then having ssh remember that future logins to that machine are authorized for the duration of the session. Don't overlook the use of $HOME/.ssh/config to set special parameters, such as specifying a different login name on the remote machine. Some parameters of interest might be KeepAlive, RSAAuthentication, UseRsh. You may also find useful information in man sshd.

(4)
After starting ParGAP, manually call

/tmp/pargapmpi-ssh.$$

and repeatedly type in the password for each slave process. If you find yourself doing this, you may want to talk with your system administrator, since it actually hurts system security to have you repeatedly typing passwords, with a concomitant risk that someone else will find out your password.
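
For option (4), the $$ in the file name stands for the process id of the ParGAP process, not of the shell you are typing in, so the file first has to be located. The sketch below defines a hypothetical helper, list_pargap_ssh_scripts, that lists candidate files. It assumes the files are in /tmp unless another directory is given.

```shell
# list_pargap_ssh_scripts [DIR]
# Hypothetical helper (not part of ParGAP): list the pargapmpi-ssh.* files
# in DIR (default /tmp), one per line. Prints nothing if there are none.
list_pargap_ssh_scripts() {
    dir="${1:-/tmp}"
    for f in "$dir"/pargapmpi-ssh.*; do
        # the pattern stays unexpanded when nothing matches, so test each name
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```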

1.13 Modifying the GAP kernel

Note that this package modifies the GAP src and bin files, and creates a new GAP kernel. This new GAP kernel can be shared by traditional users of the old, sequential GAP kernel, and by those doing parallel processing.

The GAP kernel will have identical behavior to the old GAP kernel when invoked through the gap.sh script or the bin/@GAParch@/gap binary. The new ParGAP variables will appear to the end user ONLY if the GAP binary was invoked as pargapmpi: a symbolic link to the actual GAP binary. The script, pargap.sh, does this.

So, in a multi-user environment, traditional users can continue to use gap.sh without noticing any difference. Only an invocation of pargap.sh will add the new features.

In a future version of GAP, it is hoped that the GAP kernel will have enough "hooks", so that no modification of the GAP kernel is required. At that time, it will also be possible to speed up the startup time for ParGAP. Much of the startup time is caused by waiting for GAP to read its library files. It will be possible to use the GAP function SaveWorkspace() to save a version with the GAP library pre-loaded. That saved version can then be used to start up ParGAP. This is not currently possible, because ParGAP needs to get at the command line of GAP before the GAP kernel sees it.

Comments and contributions to a ParGAP user library, or any other type of assistance, are gratefully accepted.

Gene Cooperman gene@ccs.neu.edu


ParGAP manual
November 2013