OpenMPI and InfiniBand


OpenMPI is installed on Combo, so in principle you can launch an MPI-enabled program with

mpirun -np NUM_OF_PROCESSES PROGRAM

but there are a few more things to keep in mind.
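As a quick sanity check that mpirun itself works, you can launch a trivial command across a few processes and confirm that each rank starts (a minimal sketch; the process count is arbitrary):

mpirun -np 4 hostname

Each of the four launched processes prints the name of the node it runs on.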

InfiniBand

By default, the OpenMPI installed under /share/apps does not use InfiniBand to communicate between nodes. To enable it, we need to pass mpirun some Modular Component Architecture (MCA) parameters [1]. Specifically, we set the MCA parameter btl (point-to-point byte movement) [2] to self,openib, which tells OpenMPI to use the "openib" and "self" BTLs:

mpirun -mca btl self,openib -np NUM_OF_PROCESSES PROGRAM
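The same MCA parameter can also be set through an environment variable instead of a command-line flag, which is convenient inside job scripts. A minimal sketch, assuming a bash shell:

export OMPI_MCA_btl=self,openib
mpirun -np NUM_OF_PROCESSES PROGRAM

OpenMPI reads any variable of the form OMPI_MCA_<parameter> at startup, so this is equivalent to passing -mca btl self,openib on the command line.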

Reported problems

Some nodes not working under PBS (solved)

During Gromacs benchmarking, jobs involving compute-0-14 were found to fail when submitted through PBS scripts, with the following error:

[compute-0-14:10577] *** An error occurred in MPI_Allreduce
[compute-0-14:10577] *** reported by process [47112135966721,47111496269824]
[compute-0-14:10577] *** on communicator MPI_COMM_WORLD
[compute-0-14:10577] *** MPI_ERR_IN_STATUS: error code in status
[compute-0-14:10577] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-0-14:10577] ***    and potentially your MPI job)

However, exactly the same command ran successfully when executed directly from the command line:

$ mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile nodefile /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-testcomp14-np2
$ cat nodefile
compute-0-14
compute-0-5
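For reference, a PBS submission of this kind of run might look like the sketch below; the job name, resource request, and use of $PBS_NODEFILE are assumptions rather than the actual script that produced the error:

#!/bin/bash
#PBS -N gmx-bench
#PBS -l nodes=2:ppn=16
#PBS -j oe

# Run from the directory the job was submitted from,
# placing one MPI rank on each of the two allocated nodes.
cd $PBS_O_WORKDIR
mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile $PBS_NODEFILE \
    /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun \
    -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-testcomp14-np2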

Solution: no cause was identified; the node simply started working again on its own.

References

  1. https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
  2. https://docs.oracle.com/cd/E19708-01/821-1319-10/mca-params.html