30.9.14

OpenMPI jobs hang on Rocks 5.5 cluster

Problem:
There are many, many reasons why an OpenMPI job might fail, or hang during execution, on Rocks clusters.  An oddball one relates to the existence of a virtual network interface, "virbr0", with IP address 192.168.122.1.  This is the bridge for libvirt's "default" virtual network, and it shows up with the same address on every node, so mpirun may use it to try to pass a message between machines, rather than using your real IPs.  The good news is, you can remove it.

Solution:
1. Verify that "virbr0" exists:  /sbin/ifconfig

2. If so, make sure that you don't have other virtual networks that you actually need. There is a pretty good chance the next step will mess them up.

3. Run the following commands on each node:

virsh net-destroy default
virsh net-undefine default
/sbin/service libvirt-bin restart
/sbin/ifconfig

The last command is just to verify that "virbr0" no longer exists.  "virbr0" should not recreate itself when you reboot the cluster.
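
The check in steps 1 and 3 can be scripted rather than read by eye.  A minimal sketch follows; the function name iface_present is made up for illustration, and it assumes each interface block in the ifconfig output starts at the beginning of a line with the interface name followed by a space or a colon.

```shell
#!/bin/bash
# Sketch only: iface_present is a hypothetical helper, not a Rocks or libvirt tool.
# It reads ifconfig-style output on stdin and succeeds (exit status 0)
# if some line begins with the given interface name.
iface_present() {
  grep -Eq "^$1[:[:space:]]"
}

# Example use on a node:
#   /sbin/ifconfig | iface_present virbr0 && echo "virbr0 still exists"
```

Running that one-liner on each node after step 3 should print nothing if the bridge is gone.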


17.9.14

Repair corrupt AppleScript .scpt files stored on non-Mac media

Problem: When opened in Script Editor or AppleScript Editor, the file displays ASCII garbage instead of human-readable text.  The garbage might begin with "FasdUAS".  This occasionally occurs when the AppleScript .scpt file is saved on non-Mac media, such as a central server or a NAS device.  I don't understand why this occurs, but it has happened to me enough times (on a couple of Windows servers and on a NAS) that it seems like a real problem.  It might have something to do with incompatibility between the antiquated resource/data fork file structure of AppleScript files and hard drives that are not formatted for Mac OS.  I really don't know.

One solution: Always save AppleScripts on Mac OS formatted media.  Fat lot of help that is when the script you've been working on for days is suddenly trashed.

Another solution:  Make a copy of your corrupt file.  Open up the copied file with a hex editor.  There are many; I use Hex Fiend.  AppleScript .scpt files begin with the ASCII signature "FasdUAS".  Look for this in the ASCII panel of the hex editor.  The corresponding hex signature is "46 61 73 64 55 41 53".  Delete any text or hex before this signature.  Likewise, .scpt files terminate with the hex signature "FA DE DE AD".  Look for that termination signature in the hex panel of the editor.  Oftentimes a bunch of "00 00 00 00 00 00" will have been added after "FA DE DE AD" for some inexplicable reason.  Delete those or anything else that comes after.  Save the file.  Open in Script Editor.  It may work.  This approach can also be used to recover AppleScripts from byte-level copies of damaged hard disks made using dd or ddrescue.
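
The same trimming can be done without a hex editor.  What follows is only a sketch under some assumptions: trim_scpt is a made-up helper name, the file is small enough to hold as a hex string in a shell variable, and the part you want runs from the first "FasdUAS" to the last "FA DE DE AD" in the file.  Run it on a copy, never on the only version of the file.

```shell
#!/bin/bash
# Sketch only: trim_scpt is a hypothetical helper, not an Apple tool.
# Usage: trim_scpt copy_of_corrupt.scpt repaired.scpt
trim_scpt() {
  in_file=$1
  out_file=$2
  start_sig=" 46 61 73 64 55 41 53"   # ASCII "FasdUAS"
  end_sig=" fa de de ad"              # termination signature

  # Dump the file as lowercase hex bytes, one space before each: " 46 61 73 ..."
  hex=$(od -An -v -tx1 "$in_file" | tr -s ' \n' ' ')
  hex=" ${hex# }"

  # Give up unless the start signature is followed somewhere by the end signature.
  case $hex in
    *"$start_sig"*"$end_sig"*) ;;
    *) echo "signatures not found in $in_file" >&2; return 1 ;;
  esac

  # Drop everything before the first start signature ...
  hex=$start_sig${hex#*"$start_sig"}
  # ... and everything after the last end signature.
  hex=${hex%"$end_sig"*}$end_sig

  # Write each hex byte back out as a raw byte (via an octal escape).
  for byte in $hex; do
    printf '%b' "\\0$(printf '%03o' "0x$byte")"
  done > "$out_file"
}
```

The byte-by-byte loop is slow, which is tolerable for scripts of a few KB but not for disk images; for those, locate the signatures first and carve the region out with dd.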

16.9.14

Transpose large data matrix using BASH

Problem: Data matrix of genomewide SNP data is in the wrong orientation.  1005 individuals x 214051 SNPs = 2.1E8 string elements.  Transposing very large data matrices may overwhelm system memory if the entire matrix is loaded at once.  This leads to swapping to disk, further slowing an already time-consuming task.

One solution: A brief BASH script is used to cut consecutive columns from the data matrix.  tr converts end of line characters to commas, converting the column of text into a row.  Rows are then consecutively appended to the output file.  Memory usage is negligible; run time was 5.5 hrs.

Steps:
1. Use the following BASH script (assumes comma delimited csv file with 1005 columns):

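  #!/bin/bash

  InputFile="head.txt"
  OutputFile="outfile.txt"
  NumColumns=1005

  # Create (or empty) the output file
  > "$OutputFile"

  for (( i=1; i<=NumColumns; i++ ))
   do
    # Progress indicator
    echo "$i/$NumColumns"
    # Cut column i and turn it into one comma separated row
    # (tr also converts the final newline, so each row ends in a comma)
    cut -d',' -f"$i" "$InputFile" | tr '\n' ','  >> "$OutputFile"
    # End the row
    echo >> "$OutputFile"
   done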

2. Modify the InputFile, OutputFile, and NumColumns variables as needed.
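
Rather than typing NumColumns by hand, it could be read from the first line of the input file.  A small sketch, assuming a plain comma delimited file with no quoted fields that contain commas; count_columns is a made-up function name.

```shell
#!/bin/bash
# Sketch only: count_columns is a hypothetical helper.
# Prints the number of comma separated fields on the first line of a file.
count_columns() {
  head -n 1 "$1" | awk -F',' '{ print NF }'
}

# Example use in the script above:
#   NumColumns=$(count_columns "$InputFile")
```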

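A faster but more memory intensive solution, from the boards.  The entire matrix is held in an awk array, so it must fit in memory.  -F',' and the "," separator assume the same comma delimited file as above:
awk -F',' '
{
    # Store every field, indexed by row and column
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }    # p = largest number of columns seen
END {
    # Print each input column as one output row
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str","a[i,j]
        }
        print str
    }
}' "$InputFile" > "$OutputFile"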

Compile hybrid MPI/OpenMP C++ program on MacOS 10.9 Mavericks

Problem:  On MacOS 10.9 the default compiler for C and C++ source code is clang.  Even if you type "g++" or "gcc" you get clang.  The clang that ships with 10.9 can compile Open MPI programs but does not support OpenMP.

One solution:  Install GCC and Open MPI, redirect "mpic++" wrapper compiler from Open MPI to use g++ instead of clang.

Steps:
1.  brew install open-mpi

MPI code should now compile fine using mpic++, but OpenMP library will not be found by clang, raising the error: "fatal error: 'omp.h' file not found".


2.  brew install homebrew/versions/gcc49

Install gcc.  After you do this, typing "g++ --version" will still give you information about clang, not g++.  The symlink remains.  This means that, if you type "mpic++" to compile, the wrapper compiler will still activate clang, and the "'omp.h' file not found" error will still occur.


3.  open /usr/local/Cellar/open-mpi/1.7.4/share/openmpi/mpic++-wrapper-data.txt

Modify file "mpic++-wrapper-data.txt" to redirect MPI wrapper compiler to use g++ instead of clang++ (exact location above depends on the version you get from brew).
The default setting is "compiler=clang++".  Change this line to "compiler=g++-4.9".  It is somewhat handy that when you install gcc using brew, the file name has the version appended, "-4.9" for example.  This way you don't have to break Mavericks symlinks between gcc, g++ and clang.


4.  Open new terminal window.  Compiling with "mpic++" should now work.
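
The edit in step 3 can also be made from the command line.  A minimal sketch, assuming the wrapper data file contains a line that starts with "compiler=" as described above; set_wrapper_compiler is a made-up function name, and the path is whatever your brew install produced.

```shell
#!/bin/bash
# Sketch only: set_wrapper_compiler is a hypothetical helper.
# Usage: set_wrapper_compiler /path/to/mpic++-wrapper-data.txt g++-4.9
set_wrapper_compiler() {
  wrapper_file=$1
  new_compiler=$2
  # Keep a backup so the file installed by brew can be restored.
  cp "$wrapper_file" "$wrapper_file.bak" || return 1
  # Rewrite only the line that starts with "compiler=";
  # lines such as "compiler_env=" and "compiler_flags=" are left alone.
  # (Assumes the compiler name contains no "|" or "&" characters.)
  sed "s|^compiler=.*|compiler=$new_compiler|" "$wrapper_file.bak" > "$wrapper_file"
}
```

If you would rather not touch the file at all, the Open MPI wrapper compilers also honor the OMPI_CXX environment variable, so "export OMPI_CXX=g++-4.9" before calling mpic++ should have the same effect for that shell session.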