24.5.17

Enable user ssh to compute nodes to use GNU Parallel on a Rocks cluster running Slurm

Problem: When Slurm is the scheduler on a Rocks cluster, users cannot ssh to a compute node unless they have a job running there. This means you cannot use GNU Parallel's --sshloginfile option to parallelize across the cluster. There are some crazy workarounds out there for using GNU Parallel under Slurm, like sending each Parallel command to a separate srun command, but that is a hassle when the functionality is already built into Parallel.
Solution: Slurm uses a PAM (Pluggable Authentication Module) to restrict ssh logins on compute nodes to users who have a job running there. To let all users ssh freely to any compute node, turn off the Slurm PAM:
1. Check the current value of the Rocks attribute 'slurm_pam_enable':
rocks list attr | grep slurm_pam_enable
It should be true.
2. Set it to false and verify the change:
rocks set attr slurm_pam_enable false
rocks list attr | grep slurm_pam_enable
3. Send the setting to all the compute nodes:
rocks sync slurm
4. Verify that every compute node picked up the new value:
rocks list host attr | grep slurm
You should see something like:

compute-0-0:     slurm_pam_enable        false    G
compute-0-0:     slurm_pam_enable_old    true     G
compute-0-1:     slurm_pam_enable        false    G
compute-0-1:     slurm_pam_enable_old    true     G

Rocks keeps the previous value in a companion attribute, slurm_pam_enable_old, which still reads true.
5. If rocks sync slurm did not take effect, reinstall the compute nodes with a kickstart and verify the change as above:
rocks run host '/boot/kickstart/cluster-kickstart'
6. Log out, log back in as a regular user (not root), and test ssh, e.g.:
ssh compute-0-0
You should get a shell prompt on the node instead of being refused.
7. Try out GNU Parallel with --sshloginfile now.  Go Ole!
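A minimal sketch of step 7, assuming the two compute nodes from the sample output above (compute-0-0 and compute-0-1) and a hypothetical login file named nodes.txt. GNU Parallel reads one sshlogin per line from the file given to --sshloginfile; an optional "N/" prefix caps the number of simultaneous jobs on that host. The limit of 4 jobs per node is an assumption; set it to your nodes' core counts. run_on_cluster is a hypothetical helper name, not part of GNU Parallel.

```shell
#!/bin/sh
# Hypothetical login file: one sshlogin per line.
# "4/" tells GNU Parallel to run at most 4 jobs at once on that node (assumed value).
LOGINFILE=nodes.txt
cat > "$LOGINFILE" <<'EOF'
4/compute-0-0
4/compute-0-1
EOF

# Hypothetical helper: run one command once on every node in $LOGINFILE.
# --nonall runs the command on each sshlogin without reading any arguments.
# Defined but not called here, so this script makes no ssh connections itself.
run_on_cluster() {
    parallel --sshloginfile "$LOGINFILE" --nonall "$@"
}
```

For example, run_on_cluster hostname should print each node's host name once. To spread many jobs across the nodes instead, pipe arguments in as usual, e.g. seq 100 | parallel --sshloginfile nodes.txt echo {}.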

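(You probably shouldn't do this on a cluster with many users: with the PAM check off, any user can log in to any compute node and run processes there outside Slurm's control.)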
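On a cluster with many compute nodes, eyeballing the listing from step 4 gets tedious. A minimal sketch of an automated check, assuming rocks list host attr prints whitespace-separated columns in the order shown in the step 4 sample (host name with a trailing colon, attribute, value, source). check_pam_disabled is a hypothetical helper, not a Rocks command. It prints every host whose slurm_pam_enable value is not false and exits non-zero if it finds any; nodes that lack the attribute entirely are not reported.

```shell
#!/bin/sh
# Hypothetical helper: read `rocks list host attr` output on stdin and
# report hosts whose slurm_pam_enable is not "false".
# Assumed column layout: "HOST: ATTR VALUE SOURCE" (as in the step 4 sample).
check_pam_disabled() {
    awk '
        $2 == "slurm_pam_enable" {       # exact match, so slurm_pam_enable_old is skipped
            host = $1
            sub(/:$/, "", host)          # drop the trailing colon from the host name
            if ($3 != "false") { print host; bad = 1 }
        }
        END { exit bad }                 # non-zero exit if any host still has the PAM on
    '
}
```

Usage: rocks list host attr | check_pam_disabled && echo "all nodes report false". Any host it prints is a candidate for the kickstart in step 5.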