22.11.16

Add shared SSD drive to Rocks cluster

Problem: You might want to add an SSD drive to speed up procedures that have a lot of disk I/O.  What follows is a description of just one way to do this.  If you are clever, you can probably manage without any restarts.  I am not so clever.  The description assumes that the cluster is named "MyCluster" and the SSD drive is named "SSDscratch". You will need to be root.
Solution:
1. Plug the drive into the head node. Reboot.
2. List disks and their partitions:
/sbin/fdisk -l
3. Find your new SSD drive in the output. In my case it was called "/dev/sdb". It probably needs to be partitioned and formatted so fdisk may say something like "Disk /dev/sdb doesn't contain a valid partition table."
4. Partition the disk:
/sbin/fdisk /dev/sdb
     Follow menu items:
     >n (create a new partition)
        >p (primary partition)
           >1 (partition number)
              >accept defaults to make the entire disk a single partition
     >w (write the new partition)
5. Verify that the partition table for the SSD is how you want it:
/sbin/fdisk -l
6. Find out what filesystem format the other volumes on the head node are using (ext3, ext4, etc.):
df -T
7. Format the SSD similarly.  I will use ext3.
/sbin/mkfs.ext3 /dev/sdb1
8. Create a mount point, make it writable by everyone, mount the disk, verify it mounted. It's a good idea to keep it in the /export/home directory since that is a place that Rocks likes to share.
mkdir /export/home/SSDscratch
chmod a+w /export/home/SSDscratch
mount /dev/sdb1 /export/home/SSDscratch
df -T
9. Modify /etc/exports so NFS shares the new mount:
cp /etc/exports /etc/exportsORIG #back up the original file
vi /etc/exports
     Using vi (or whatever you like), add a line like: "/export/home/SSDscratch 10.1.1.1(rw,async,no_root_squash) 10.1.0.0/255.255.0.0(rw,async)"
10. Modify /etc/auto.share:
cp /etc/auto.share /etc/auto.shareORIG
vi /etc/auto.share
     Add line like:"SSDscratch MyCluster.local:/export/home/SSDscratch"
11. Modify /etc/auto.home:
cp /etc/auto.home /etc/auto.homeORIG
vi /etc/auto.home
     Add line like:"SSDscratch    -nfsvers=3      MyCluster.local:/export/home/SSDscratch"
12. Modify /etc/fstab so it automatically mounts:
cp /etc/fstab /etc/fstabORIG
vi /etc/fstab
     Add line like:"/dev/sdb1     /export/home/SSDscratch     ext3     defaults     0 0"
13. Restart NFS and sync the cluster. This sends the modified files to all nodes:
/sbin/service nfs restart
rocks sync users
14. To load the new settings on the nodes, I had to reboot them:
rocks run host 'reboot'
15. To verify that the SSD is mounted automatically, reboot the head node. (If it won't boot, the problem is likely in /etc/fstab. Boot from a live disk, revert /etc/fstab to the /etc/fstabORIG backup, and try again.):
reboot
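     A minimal recovery sketch from the live environment, assuming the head node's root filesystem is on /dev/sda1 (adjust for your layout) and using /mnt/headroot as a scratch mount point:
mkdir /mnt/headroot
mount /dev/sda1 /mnt/headroot #mount the head node's root filesystem
cp /mnt/headroot/etc/fstabORIG /mnt/headroot/etc/fstab #restore the backup
umount /mnt/headroot
reboot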
16. Verify that the SSD directory is mounted on all nodes. It should mount at /home/SSDscratch:
rocks run host compute 'hostname; ls -l /home/SSDscratch'
     If it is not, run rocks sync users again and reboot. You may have to reboot twice, for reasons unknown.
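     That is, the same commands as in steps 13 and 14:
rocks sync users
rocks run host 'reboot'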

Congratulate yourself. That was a lot of work.

12.8.16

MPI_Bcast large dynamic char array

SysAdmins hate it when all your MPI procs access their precious disks.  The preferred procedure is to read from the disk once, using a single proc, then pass the data to all of the other procs.  In principle MPI_Bcast makes this easy.  In practice...you decide.

I needed to do this for arbitrarily large files (a gigabyte or more) containing string data.  The process is conceptually simple:

1. Use the root process (proc 0) to get the file size.
2. MPI_Bcast this file size to all procs.
3. Initialize and size a char array to contain the file data on all procs.
4. Use proc 0 to read the data file.
5. MPI_Bcast the file data to all procs.

Here is a prototype.
Save:  testmpi.cpp
Compile:  mpic++ -o t testmpi.cpp
Run:  mpirun -np 8 t yourbigfile.txt
-------
#include <fstream>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//determine the size of a file
std::ifstream::pos_type filesize(const char* filename)
{
std::ifstream in(filename, std::ifstream::ate | std::ifstream::binary);
return in.tellg();
}

//quickly reads a large .dat file into a null-terminated memory buffer
char * MyBigRead(char* DatFilePath)
{
FILE * pFile;
unsigned long long lSize;
char * buffer;
size_t result;

pFile = fopen ( DatFilePath , "rb" );
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file, plus the \0 terminator:
buffer = (char*) malloc (lSize + 1);
if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

// copy the file into the buffer and terminate the string:
result = fread (buffer,1,lSize,pFile);
if (result != lSize) {fputs ("Reading error",stderr); exit (3);}
buffer[lSize] = '\0';

fclose (pFile);
return buffer;
}

int main(int argc, char* argv[]) {

MPI_Init (&argc, &argv);
int procid; //the individual process ID
int nprocs; //the number of processes
MPI_Comm_rank (MPI_COMM_WORLD, &procid);
MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

unsigned long long f = 0; //dat file size

//get the size of the dat file using proc 0
if (procid == 0)
{
f = (unsigned long long)filesize(argv[1]);
f = f + 1; //increase by 1 to accommodate the \0 string terminator
}

//broadcast the file size to all procs
MPI_Bcast(&f, 1, MPI_UNSIGNED_LONG_LONG, 0, MPI_COMM_WORLD);

//initialize and size the char array to hold the contents of the dat file
//(calloc zeroes the buffer, so the "Before" snippet below is well-defined)
char * DatFileBuffer = (char*)calloc(f, 1);

//report before MPI_Bcast'ing the data
printf("[proc%d]Before: snippet of DatFileBuffer:>%.5s<, size:%llu\n", procid, DatFileBuffer, f);

//read the dat file from disk using proc 0
if (procid == 0)
{
char * d = MyBigRead(argv[1]);
strcpy(DatFileBuffer, d); //copy the null-terminated data into the pre-sized DatFileBuffer
free(d);
}

//broadcast the dat file contents to all procs
//(the count argument of MPI_Bcast is an int, so a single broadcast tops out
//near 2 GiB; larger files must be broadcast in chunks)
MPI_Bcast(DatFileBuffer, (int)f, MPI_CHAR, 0, MPI_COMM_WORLD);

//report after MPI_Bcast'ing the data
printf("[proc%d]After: snippet of DatFileBuffer:>%.10s<, size:%llu\n", procid, DatFileBuffer, f);

free(DatFileBuffer);
MPI_Finalize();
return 0;
}
-------

16.6.16

Convert PLINK to fastPHASE with BASH and GNU Parallel

Problem: The converter built into PLINK automatically converts nucleotide-encoded data to a 0/1 binary format when using --recode-fastphase to create a fastPHASE input file. This causes irretrievable loss of information.

Solution:  I can't find any way around this other than to make a new converter.
1. Using PLINK, recode .ped file such that each chromosome has its own .ped/.map set of files.  Something like:
./plink --file mydata --chr 1 --recode --out malc1;
./plink --file mydata --chr 2 --recode --out malc2;
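With more than a few chromosomes, a loop is easier; a sketch, assuming the 17 chromosomes used below:
for ((i=1;i<=17;i++));
  do ./plink --file mydata --chr $i --recode --out malc$i;
done;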

2. Use the following script to create fastPHASE input files. Enter your values for 'nchr', 'iroot', and 'oroot' after the function dbl().
dbl() {
i=$1;
f="$infile"".ped";
nrow=$(wc -l "$f" | awk '{print $1}');
echo "splitting row $i/$nrow";

# split lines containing genotypes, by allele, for each individual
#odd alleles
odd=$(sed -n $i"p" "$f" | tr ' ' '\n' | tail -n +7 | sed -n 'p;n' | tr '\n' ' ' | sed 's/ $//g');
#even alleles
even=$(sed -n $i"p" "$f" | tr ' ' '\n' | tail -n +7 | sed -n 'g;n;p' | tr '\n' ' ' | sed 's/ $//g');
echo "$odd" > "$i.tmp";
echo "$even" >> "$i.tmp";
}
export -f dbl;

nchr=17; #number of chromosomes
iroot="malc"; #root string of input files for each chromosome
oroot="chr"; #root string for output files, in fastPHASE format
for ((i=1;i<=$nchr;i++));
  do echo "chr"$i;
    export infile="$iroot""$i";
    h1=$(wc -l "$infile".ped | awk '{print $1}'); #n samples for fastPHASE header line 1
    h2a=$(head -1 "$infile".ped | awk '{print NF}');
    h2=$((($h2a-6)/2)); #n loci for fastPHASE header line 2
    h3a=$(awk '{print $4}' "$infile".map);
    h3=$(echo "P $h3a" | tr "\n" " "); #locus positions for fastPHASE header line 3
    s=$(awk '{print $2}' "$infile".ped);
    
    #split single line per sample plink format into two line per sample fastPHASE format
    seq 1 $h1 | parallel --env dbl --env infile dbl;

    #Reassemble into a single file:
    outfile="$oroot""$i".inp;
    echo "$h1" > "$outfile";
    echo "$h2" >> "$outfile";
    echo "$h3" >> "$outfile";
    for ((j=1;j<=$h1;j++));
      do sample=$(echo "$s" | sed -n $j"p")" ";
        echo "writing chr $i sample $sample $j/$h1";
        echo '>'"$sample" >> "$outfile";
        head -1 "$j.tmp" | sed 's/0/?/g' >> "$outfile";
        tail -n +2 "$j.tmp" | sed 's/0/?/g' >> "$outfile";
        rm "$j.tmp";
       done;
  done;
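
For reference, each file the script writes looks something like this (a made-up example with 2 samples and 4 loci; the positions are hypothetical, and missing alleles, coded 0 in PLINK, become ?):
2
4
P 1205 3502 4860 7219
>sample1
A C G T
A C G ?
>sample2
T C G T
A C A T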

3. Phase away.

20.5.16

Transpose large data matrix using BASH. II. GNU Parallel.

In a prior post, I presented a low-memory BASH solution for transposing large data matrices.  Here is a way to speed up that basic procedure using parallel processing on an HPC.

1. Generate a large data table for testing (~2GB, ~1E9 elements):
ncol=2472;
nrow=404627;
seq -s' ' 1 $ncol > m.txt;
foo=$(for ((i=1; i<=$ncol; i++));
do
   echo $(( 1 + RANDOM % 4 ));
done;);
foo=$(echo $foo | tr "\n" " ");
export nrow;
export foo;
perl -e 'for($i=0;$i<$ENV{nrow};$i++){print "$ENV{foo}\n"}' >> m.txt;

Notes: In the third line, a header is created such that the columns are labeled consecutively; these labels become important later.  Watch this step: some Linux versions add a linebreak after the header, others do not.  You want the linebreak.

2. Run on HPC using GNU Parallel:
InputFile="m.txt";
seq 1 $ncol | parallel --sshloginfile ~/machines --jobs 24 "cut -d' ' -f{} $InputFile | tr '\n' ' ' | sed 's/ $/\n/g' > ~/{}.txt; echo Col {};";

Notes: The method above works as follows. First, seq delivers a set of numbers (from 1 to the total number of columns in the input matrix) to GNU Parallel, which distributes the $ncol jobs among the nodes specified in the file ~/machines.  The option --jobs 24 runs up to 24 jobs at a time on each node.  Each job cuts a single column from the input file, transposes it into a row, and writes it to disk.  I had no luck with the GNU Parallel option --keep-order, which would presumably allow one to avoid this intermediate write step.
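
Note: without a cluster you can drop --sshloginfile, and GNU Parallel will run the same jobs on the local cores:
seq 1 $ncol | parallel --jobs 24 "cut -d' ' -f{} $InputFile | tr '\n' ' ' | sed 's/ $/\n/g' > ~/{}.txt; echo Col {};";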

3. Fuse the output files together:
> mrot.txt;
for ((i=1; i<=$ncol; i++));
do
   cat "$i.txt" >> mrot.txt;
   rm "$i.txt";
done;
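
A quick sanity check: the transpose of the (nrow+1)-row by ncol-column input should have ncol lines of nrow+1 fields each:
wc -l < mrot.txt; #expect $ncol
head -1 mrot.txt | awk '{print NF}'; #expect $nrow + 1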

1.3.16

Fun with triangular matrices and bash


val="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15";

j=1; k=0;
v=$(echo $val | wc -w); #number of elements in $val
r=$(printf "%.0f" $(echo "sqrt(2*$v+(1/4))-(1/2)" | bc -l)); #number of rows needed to accommodate $v elements (inverts the triangular-number formula v = r*(r+1)/2)

#Lower triangle, by row
for ((i=1;i<=$r;i++));
  do m=$(($j+$k));
    j=$(($j+$k)); k=$(($k+1));
    n=$(printf "%.0f" $(echo "($k+1)*($k/2)" | bc -l));
    echo $val | cut -d' ' -f$m-$n;
done;

1
2 3
4 5 6
7 8 9 10
11 12 13 14 15

#Upper triangle, by row, tab-delimited
nblanks=0;
nvals=$(($r - 1));
m=1;
mat=$(
for ((i=1;i<=$r+1;i++));
  do 
    #write blanks
    for ((j=1;j<=$nblanks;j++)); do echo -n $'X\t'; done;

    #write zero on diagonal
    echo -n $'0\t';
    
    #write values from input (the last row is diagonal-only, so skip the cut there)
    if (( $nvals >= 0 ));
      then n=$(($m + $nvals));
        echo -n $val | cut -d' ' -f$m-$n | tr " " "\t";
      else echo;
    fi;
    
    #update
    nblanks=$(($nblanks + 1));
    nvals=$(($nvals - 1));
    m=$(($n + 1));
  done;
);
echo "$mat";  

0     1     2     3     4     5
X     0     6     7     8     9
X     X     0     10    11    12
X     X     X     0     13    14
X     X     X     X     0     15
X     X     X     X     X     0

#Lower triangle, by column, comma-delimited
  #rotate the matrix
  p=$(echo "$mat" | head -1 | awk '{print NF}');
  for (( i=1; i<=$p; i++ ));
    do
     foo=$(echo "$mat" | cut -d$'\t' -f$i);
     echo $foo | tr " " "," | sed 's/X//g';
    done;

0,,,,,
1,0,,,,
2,6,0,,,
3,7,10,0,,
4,8,11,13,0,
5,9,12,14,15,0