22.11.16

Add shared SSD drive to Rocks cluster

Problem: You might want to add an SSD drive to speed up procedures that have a lot of disk I/O.  What follows is a description of just one way to do this.  If you are clever, you can probably manage without any restarts.  I am not so clever.  The description assumes that the cluster is named "MyCluster" and the SSD drive is named "SSDscratch". You will need to be root.
Solution:
1. Plug the drive into the head node. Reboot.
2. List disks and their partitions:
/sbin/fdisk -l
3. Find your new SSD drive in the output. In my case it was called "/dev/sdb". It probably needs to be partitioned and formatted so fdisk may say something like "Disk /dev/sdb doesn't contain a valid partition table."
4. Partition the disk:
/sbin/fdisk /dev/sdb
     Follow menu items:
     >n (create a new partition)
        >p (primary partition)
           >1 (partition number)
              >accept defaults to make the entire disk a single partition
     >w (write the new partition)
5. Verify that the partition table for the SSD is how you want it:
/sbin/fdisk -l
6. Find out what filesystem format the other volumes on the head node are using (ext3, ext4, etc.):
df -T
7. Format the SSD similarly.  I will use ext3.
/sbin/mkfs.ext3 /dev/sdb1
8. Create a mount point, make it writable by everyone, mount the disk, verify it mounted. It's a good idea to keep it in the /export/home directory since that is a place that Rocks likes to share.
mkdir /export/home/SSDscratch
chmod a+w /export/home/SSDscratch
mount /dev/sdb1 /export/home/SSDscratch
df -T
9. Modify /etc/exports so NFS shares the new mount:
cp /etc/exports /etc/exportsORIG #back up the original file
vi /etc/exports
     Using vi (or whatever you like), add a line like: "/export/home/SSDscratch 10.1.1.1(rw,async,no_root_squash) 10.1.0.0/255.255.0.0(rw,async)"
10. Modify /etc/auto.share:
cp /etc/auto.share /etc/auto.shareORIG
vi /etc/auto.share
     Add line like:"SSDscratch MyCluster.local:/export/home/SSDscratch"
11. Modify /etc/auto.home:
cp /etc/auto.home /etc/auto.homeORIG
vi /etc/auto.home
     Add line like:"SSDscratch    -nfsvers=3      MyCluster.local:/export/home/SSDscratch"
12. Modify /etc/fstab so it automatically mounts:
cp /etc/fstab /etc/fstabORIG
vi /etc/fstab
     Add line like:"/dev/sdb1     /export/home/SSDscratch     ext3     defaults     0 0"
13. Restart NFS and sync the cluster. This sends the modified files to all nodes:
/sbin/service nfs restart
rocks sync users
14. To load the new settings on the nodes, I had to reboot them:
rocks run host 'reboot'
15. To verify that the SSD is mounted automatically, reboot the head node. (If it won't boot, the problem is likely in /etc/fstab. Boot from a live disk, revert /etc/fstab to the /etc/fstabORIG backup, and try again.):
reboot
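     A minimal recovery sketch from the live environment, assuming the head node's root filesystem is on /dev/sda1 (adjust for your layout) and using /mnt/headroot as a scratch mount point:
mkdir /mnt/headroot
mount /dev/sda1 /mnt/headroot #mount the head node's root filesystem
cp /mnt/headroot/etc/fstabORIG /mnt/headroot/etc/fstab #restore the backup
umount /mnt/headroot
reboot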
16. Verify that the SSD directory is mounted on all nodes. It should mount at /home/SSDscratch:
rocks run host compute 'hostname; ls -l /home/SSDscratch'
     If it is not, run rocks sync users again and reboot. You may have to reboot twice, for reasons unknown.
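     That is, the same commands as in steps 13 and 14:
rocks sync users
rocks run host 'reboot'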

Congratulate yourself. That was a lot of work.

12.8.16

MPI_Bcast large dynamic char array

SysAdmins hate it when all your MPI procs access their precious disks.  The preferred procedure is to read from the disk once, using a single proc, then pass the data to all of the other procs.  In principle MPI_Bcast makes this easy.  In practice...you decide.

I needed to do this for arbitrarily large files (a gigabyte or more) containing string data.  The process is conceptually simple:

1. Use the root process (proc 0) to get the file size.
2. MPI_Bcast this file size to all procs.
3. Initialize and size a char array to contain the file data on all procs.
4. Use proc 0 to read the data file.
5. MPI_Bcast the file data to all procs.

Here is a prototype.
Save:  testmpi.cpp
Compile:  mpic++ -o t testmpi.cpp
Run:  mpirun -np 8 t yourbigfile.txt
-------
#include <fstream>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//determine the size of a file
std::ifstream::pos_type filesize(const char* filename)
{
std::ifstream in(filename, std::ifstream::ate | std::ifstream::binary);
return in.tellg();
}

//quickly reads a large .dat file into a null-terminated memory buffer
char * MyBigRead(char* DatFilePath)
{
FILE * pFile;
unsigned long long lSize;
char * buffer;
size_t result;

pFile = fopen ( DatFilePath , "rb" );
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file, plus the \0 terminator:
buffer = (char*) malloc (lSize + 1);
if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

// copy the file into the buffer and terminate the string:
result = fread (buffer,1,lSize,pFile);
if (result != lSize) {fputs ("Reading error",stderr); exit (3);}
buffer[lSize] = '\0';

fclose (pFile);
return buffer;
}

int main(int argc, char* argv[]) {

MPI_Init (&argc, &argv);
int procid; //the individual process ID
int nprocs; //the number of processes
MPI_Comm_rank (MPI_COMM_WORLD, &procid);
MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

unsigned long long f = 0; //dat file size

//get the size of the dat file using proc 0
if (procid == 0)
{
f = (unsigned long long)filesize(argv[1]);
f = f + 1; //increase by 1 to accommodate the \0 string terminator
}

//broadcast the file size to all procs
MPI_Bcast(&f, 1, MPI_UNSIGNED_LONG_LONG, 0, MPI_COMM_WORLD);

//initialize and size the char array to hold the contents of the dat file
//(calloc zeroes the buffer, so the "Before" snippet below is well-defined)
char * DatFileBuffer = (char*)calloc(f, 1);

//report before MPI_Bcast'ing the data
printf("[proc%d]Before: snippet of DatFileBuffer:>%.5s<, size:%llu\n", procid, DatFileBuffer, f);

//read the dat file from disk using proc 0
if (procid == 0)
{
char * d = MyBigRead(argv[1]);
strcpy(DatFileBuffer, d); //copy the null-terminated data into the pre-sized DatFileBuffer
free(d);
}

//broadcast the dat file contents to all procs
//(the count argument of MPI_Bcast is an int, so a single broadcast tops out
//near 2 GiB; larger files must be broadcast in chunks)
MPI_Bcast(DatFileBuffer, (int)f, MPI_CHAR, 0, MPI_COMM_WORLD);

//report after MPI_Bcast'ing the data
printf("[proc%d]After: snippet of DatFileBuffer:>%.10s<, size:%llu\n", procid, DatFileBuffer, f);

free(DatFileBuffer);
MPI_Finalize();
return 0;
}
-------

16.6.16

Convert PLINK to fastPHASE with BASH and GNU Parallel

Problem: The converter built into PLINK automatically converts nucleotide-encoded data to a 0/1 binary format when using --recode-fastphase to create a fastPHASE input file. This causes irretrievable loss of information.

Solution:  I can't find any way around this other than to make a new converter.
1. Using PLINK, recode .ped file such that each chromosome has its own .ped/.map set of files.  Something like:
./plink --file mydata --chr 1 --recode --out malc1;
./plink --file mydata --chr 2 --recode --out malc2;
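With more than a few chromosomes, a loop is easier; a sketch, assuming the 17 chromosomes used below:
for ((i=1;i<=17;i++));
  do ./plink --file mydata --chr $i --recode --out malc$i;
done;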

2. Use the following script to create fastPHASE input files. Enter your values for 'nchr', 'iroot', and 'oroot' after the function dbl().
dbl() {
i=$1;
f="$infile"".ped";
nrow=$(wc -l "$f" | awk '{print $1}');
echo "splitting row $i/$nrow";

# split lines containing genotypes, by allele, for each individual
#odd alleles
odd=$(sed -n $i"p" "$f" | tr ' ' '\n' | tail -n +7 | sed -n 'p;n' | tr '\n' ' ' | sed 's/ $//g');
#even alleles
even=$(sed -n $i"p" "$f" | tr ' ' '\n' | tail -n +7 | sed -n 'g;n;p' | tr '\n' ' ' | sed 's/ $//g');
echo "$odd" > "$i.tmp";
echo "$even" >> "$i.tmp";
}
export -f dbl;

nchr=17; #number of chromosomes
iroot="malc"; #root string of input files for each chromosome
oroot="chr"; #root string for output files, in fastPHASE format
for ((i=1;i<=$nchr;i++));
  do echo "chr"$i;
    export infile="$iroot""$i";
    h1=$(wc -l "$infile".ped | awk '{print $1}'); #n samples for fastPHASE header line 1
    h2a=$(head -1 "$infile".ped | awk '{print NF}');
    h2=$((($h2a-6)/2)); #n loci for fastPHASE header line 2
    h3a=$(awk '{print $4}' "$infile".map);
    h3=$(echo "P $h3a" | tr "\n" " "); #locus positions for fastPHASE header line 3
    s=$(awk '{print $2}' "$infile".ped);
    
    #split single line per sample plink format into two line per sample fastPHASE format
    seq 1 $h1 | parallel --env dbl --env infile dbl;

    #Reassemble into a single file:
    outfile="$oroot""$i".inp;
    echo "$h1" > "$outfile";
    echo "$h2" >> "$outfile";
    echo "$h3" >> "$outfile";
    for ((j=1;j<=$h1;j++));
      do sample=$(echo "$s" | sed -n $j"p")" ";
        echo "writing chr $i sample $sample $j/$h1";
        echo '>'"$sample" >> "$outfile";
        head -1 "$j.tmp" | sed 's/0/?/g' >> "$outfile";
        tail -n +2 "$j.tmp" | sed 's/0/?/g' >> "$outfile";
        rm "$j.tmp";
       done;
  done;
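
For reference, each file the script writes looks something like this (a made-up example with 2 samples and 4 loci; the positions are hypothetical, and missing alleles, coded 0 in PLINK, become ?):
2
4
P 1205 3502 4860 7219
>sample1
A C G T
A C G ?
>sample2
T C G T
A C A T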

3. Phase away.

20.5.16

Transpose large data matrix using BASH. II. GNU Parallel.

In a prior post, I presented a low-memory BASH solution for transposing large data matrices.  Here is a way to speed up that basic procedure using parallel processing on an HPC.

1. Generate a large data table for testing (~2GB, ~1E9 elements):
ncol=2472;
nrow=404627;
seq -s' ' 1 $ncol > m.txt;
foo=$(for ((i=1; i<=$ncol; i++));
do
   echo $(( 1 + RANDOM % 4 ));
done;);
foo=$(echo $foo | tr "\n" " ");
export nrow;
export foo;
perl -e 'for($i=0;$i<$ENV{nrow};$i++){print "$ENV{foo}\n"}' >> m.txt;

Notes: In the third line, a header is created such that the columns are labeled consecutively; these labels become important later.  Watch this step: some Linux versions add a linebreak after the header, others do not.  You want the linebreak.

2. Run on HPC using GNU Parallel:
InputFile="m.txt";
seq 1 $ncol | parallel --sshloginfile ~/machines --jobs 24 "cut -d' ' -f{} $InputFile | tr '\n' ' ' | sed 's/ $/\n/g' > ~/{}.txt; echo Col {};";

Notes: The method above works as follows. First, seq delivers a set of numbers (from 1 to the total number of columns in the input matrix) to GNU Parallel, which distributes the $ncol jobs among the nodes specified in the file ~/machines.  The option --jobs 24 runs up to 24 jobs at a time on each node.  Each job cuts a single column from the input file, transposes it into a row, and writes it to disk.  I had no luck with the GNU Parallel option --keep-order, which would presumably allow one to avoid this intermediate write step.
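
Note: without a cluster you can drop --sshloginfile, and GNU Parallel will run the same jobs on the local cores:
seq 1 $ncol | parallel --jobs 24 "cut -d' ' -f{} $InputFile | tr '\n' ' ' | sed 's/ $/\n/g' > ~/{}.txt; echo Col {};";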

3. Fuse the output files together:
> mrot.txt;
for ((i=1; i<=$ncol; i++));
do
   cat "$i.txt" >> mrot.txt;
   rm "$i.txt";
done;
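
A quick sanity check: the transpose of the (nrow+1)-row by ncol-column input should have ncol lines of nrow+1 fields each:
wc -l < mrot.txt; #expect $ncol
head -1 mrot.txt | awk '{print NF}'; #expect $nrow + 1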

1.3.16

Fun with triangular matrices and bash


val="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15";

j=1; k=0;
v=$(echo $val | wc -w); #number of elements in $val
r=$(printf "%.0f" $(echo "sqrt(2*$v+(1/4))-(1/2)" | bc -l)); #number of rows needed to accommodate $v elements (inverts the triangular-number formula v = r*(r+1)/2)

#Lower triangle, by row
for ((i=1;i<=$r;i++));
  do m=$(($j+$k));
    j=$(($j+$k)); k=$(($k+1));
    n=$(printf "%.0f" $(echo "($k+1)*($k/2)" | bc -l));
    echo $val | cut -d' ' -f$m-$n;
done;

1
2 3
4 5 6
7 8 9 10
11 12 13 14 15

#Upper triangle, by row, tab-delimited
nblanks=0;
nvals=$(($r - 1));
m=1;
mat=$(
for ((i=1;i<=$r+1;i++));
  do 
    #write blanks
    for ((j=1;j<=$nblanks;j++)); do echo -n $'X\t'; done;

    #write zero on diagonal
    echo -n $'0\t';
    
    #write values from input (the last row is diagonal-only, so skip the cut there)
    if (( $nvals >= 0 ));
      then n=$(($m + $nvals));
        echo -n $val | cut -d' ' -f$m-$n | tr " " "\t";
      else echo;
    fi;
    
    #update
    nblanks=$(($nblanks + 1));
    nvals=$(($nvals - 1));
    m=$(($n + 1));
  done;
);
echo "$mat";  

0     1     2     3     4     5
X     0     6     7     8     9
X     X     0     10    11    12
X     X     X     0     13    14
X     X     X     X     0     15
X     X     X     X     X     0

#Lower triangle, by column, comma-delimited
  #rotate the matrix
  p=$(echo "$mat" | head -1 | awk '{print NF}');
  for (( i=1; i<=$p; i++ ));
    do
     foo=$(echo "$mat" | cut -d$'\t' -f$i);
     echo $foo | tr " " "," | sed 's/X//g';
    done;

0,,,,,
1,0,,,,
2,6,0,,,
3,7,10,0,,
4,8,11,13,0,
5,9,12,14,15,0