Solution: Slurm uses a PAM (Pluggable Authentication Module) on the compute nodes to restrict ssh access, normally allowing in only users who have a job running on that node. To permit all users to freely ssh to any compute node, turn off the Slurm PAM:
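On a compute node the restriction usually comes from a pam_slurm line in the sshd PAM stack. The file path and options shown here are illustrative and may differ between Rocks slurm roll versions:

# /etc/pam.d/sshd on a compute node (illustrative)
account    required     pam_slurm.so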
1. Verify the setting for the Rocks attribute 'slurm_pam_enable':
rocks list attr
It should be true.
2. Make it false, and verify:
rocks set attr slurm_pam_enable false
rocks list attr
3. Send the setting to all the compute nodes:
rocks sync slurm
4. Verify:
rocks list host attr | grep slurm
You should see something like:
compute-0-0: slurm_pam_enable false G
compute-0-0: slurm_pam_enable_old true G
compute-0-1: slurm_pam_enable false G
compute-0-1: slurm_pam_enable_old true G
The old attribute has been renamed slurm_pam_enable_old and retains the old setting, true.
5. If rocks sync slurm didn't take, kickstart the nodes (this reinstalls them) and verify the change as above:
rocks run host '/boot/kickstart/cluster-kickstart'
6. Log out, log back in under your own user name (not root), and test ssh, e.g.:
ssh compute-0-0
You should get a shell prompt on the node instead of being refused access.
7. Try out GNU Parallel with --sshloginfile now; a short sketch follows below. Go Ole!
(You probably shouldn't do this on a cluster with a bunch of users, since anyone can then log into the compute nodes and run work behind Slurm's back.)
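A minimal sketch of the GNU Parallel step, assuming the head node can reach the compute nodes by the names shown; the file name nodes.txt and the node names are examples only:

# List the compute nodes, one per line (names are examples):
cat > nodes.txt <<'EOF'
compute-0-0
compute-0-1
EOF

# Run hostname once on every node in the file
# (--nonall runs the command once per ssh login, with no arguments):
parallel --sshloginfile nodes.txt --nonall hostname

# Distribute independent jobs across the nodes, one argument per job:
parallel --sshloginfile nodes.txt echo hello from job ::: 1 2 3 4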