20.8.15

GNU Parallel and Rocks clusters I. Distributing BASH scripts across nodes

1. Install GNU Parallel as user:
(wget -O - pi.dk/3 || curl pi.dk/3/) | bash

The executable should now be at ~/bin/parallel.
Modify PATH if necessary so that the GNU Parallel you just installed is used in preference to any other GNU Parallel on the system. I did this by putting my ~/bin directory ahead of the other entries in PATH in my ~/.bash_profile file, like so:

# User specific environment and startup programs
PATH=$HOME/bin:$PATH
export PATH
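A quick way to confirm that ~/bin now leads PATH (a minimal sketch; the parameter expansion just grabs everything before the first colon):

```shell
# Prepend ~/bin and check that it is now the first PATH entry
PATH="$HOME/bin:$PATH"
first_entry=${PATH%%:*}   # everything before the first ":"
echo "$first_entry"       # should print your ~/bin directory
```

After this, "command -v parallel" should resolve to ~/bin/parallel rather than any system copy.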


2. Test:
rocks run host command="~/bin/parallel ::: hostname"

The output should look something like:
compute-0-0: down
compute-0-1.local
compute-0-2.local
compute-0-5.local
compute-0-4.local

Here each compute node receives a request (via "rocks run host") to execute "hostname" through GNU Parallel and print its name to standard output. My cluster happens to be missing compute-0-3 and has a permanently dead node called compute-0-0, which explains the oddities in the output above.


3. A more complex test.  Here we want to parallelize a simple subroutine across all nodes in the cluster.  We need a file naming all the nodes we want to send jobs to.  Let's call it ~/machines. It should look something like:

cat ~/machines
compute-0-1
compute-0-2
compute-0-4
compute-0-5
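Stripping the ".local" suffix from the live-node names is one way to build this file (a hypothetical sketch; here the names are typed in rather than pulled from the cluster, and the file is written to the current directory rather than ~/machines):

```shell
# Write short node names, one per line, to ./machines
printf '%s\n' compute-0-1.local compute-0-2.local compute-0-4.local compute-0-5.local \
  | sed 's/\.local$//' > machines
cat machines
```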

In the BASH script below, the simple subroutine ("subr()") creates a couple of variables ("a" and "b") from the arguments GNU Parallel sends to it.  It then echoes those variables together with the hostname of the node running it.

subr() needs to be exported to the shell ("export -f subr") so that GNU Parallel can access it from the various independent shells it creates.

The "parallel" command is assembled as follows:
parallel #call to the executable, you may need to use the full path ~/bin/parallel
--env subr #pass the exported subroutine to the new shell
--sshloginfile ~/machines #path to the machines file with the list of nodes; note this assumes passwordless ssh, the norm on Rocks clusters.
--jobs 24 #the number of cores available on each compute node
subr #a call to the subroutine
::: $a ::: $b #GNU Parallel syntax to manage the variable lists
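GNU Parallel also lets the sshlogin file itself carry the job-slot count, written as "ncpu/hostname" on each line, which can stand in for a global --jobs value when nodes differ in core count. A sketch of such a file (the 24-core counts are illustrative):

```shell
# Each line: <jobslots>/<hostname>; GNU Parallel reads this via --sshloginfile
cat > machines.slots <<'EOF'
24/compute-0-1
24/compute-0-2
24/compute-0-4
24/compute-0-5
EOF
cat machines.slots
```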


4. Paste the following into the terminal:

subr() {
  a=$1
  b=$2
  echo -n "$a $b "
  hostname
}
export -f subr  # necessary for GNU Parallel to see the function in remote shells

a="1 2 3";
b="x y z";
parallel --env subr --sshloginfile ~/machines --jobs 24 subr ::: $a ::: $b;

The output should look something like:
1 x compute-0-1.local
1 y compute-0-5.local
1 z compute-0-4.local
2 y compute-0-1.local
2 x compute-0-2.local
2 z compute-0-5.local
3 z compute-0-1.local
3 x compute-0-4.local
3 y compute-0-2.local

So each pairwise combination of the variables has been echoed exactly once, on the nodes specified in the "machines" file, as scheduled by GNU Parallel. Go Ole!
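Without a cluster at hand, the same pairwise expansion can be previewed with a plain BASH nested loop (a local sketch that does not use GNU Parallel, so everything runs on one host and in a fixed order):

```shell
subr() {
  echo -n "$1 $2 "
  hostname
}

a="1 2 3"
b="x y z"
out=$(
  for i in $a; do          # word splitting expands "1 2 3" into three items
    for j in $b; do
      subr "$i" "$j"       # nine calls, one per pairwise combination
    done
  done
)
printf '%s\n' "$out"
```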