OpenMPI and InfiniBand

From Computational Biophysics and Materials Science Group

Revision as of 19:17, 17 May 2016

OpenMPI is installed on Combo, so in principle you can launch an MPI-enabled program with

 mpirun -np NUM_OF_PROCESSES PROGRAM

but a few more details matter.

== InfiniBand ==

By default, the OpenMPI installed under /share/apps does not use InfiniBand to communicate between nodes. To enable it, we need to pass mpirun some Modular Component Architecture (MCA) parameters [1]. Specifically, we set the MCA parameter btl (point-to-point byte movement) [2] to self,openib, which tells OpenMPI to use the "openib" and "self" BTLs:

 mpirun -mca btl self,openib -np NUM_OF_PROCESSES PROGRAM
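As an alternative to the -mca flag, OpenMPI also reads MCA parameters from environment variables of the form OMPI_MCA_&lt;param&gt;, which can be convenient in job scripts. A minimal sketch (the variable-name convention is OpenMPI's documented behavior; everything else here is illustrative):

```shell
# Set the MCA "btl" parameter via the environment instead of on the
# mpirun command line; mpirun launched from this shell inherits it.
export OMPI_MCA_btl=self,openib
echo "$OMPI_MCA_btl"
```

With this set, a plain `mpirun -np NUM_OF_PROCESSES PROGRAM` behaves like the command above.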

== Reporting problems ==

=== Inside and outside PBS ===

During benchmarking of Gromacs, compute-0-14 was found to fail when the job was submitted via a PBS script. Error:

 [compute-0-14:10577] *** An error occurred in MPI_Allreduce
 [compute-0-14:10577] *** reported by process [47112135966721,47111496269824]
 [compute-0-14:10577] *** on communicator MPI_COMM_WORLD
 [compute-0-14:10577] *** MPI_ERR_IN_STATUS: error code in status
 [compute-0-14:10577] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 [compute-0-14:10577] ***    and potentially your MPI job)
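For context, a PBS script of the kind that triggers this failure might look as follows. This is a hypothetical sketch, not the original script: the resource requests, job name, and use of $PBS_NODEFILE are assumptions.

```shell
#!/bin/bash
# Hypothetical PBS job script (illustrative only; resource requests,
# job name, and paths are assumptions, not the original script).
#PBS -N gmx-bench
#PBS -l nodes=2:ppn=16
cd "$PBS_O_WORKDIR"
mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile "$PBS_NODEFILE" \
    gmx_mpi mdrun -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-bench
```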

However, exactly the same command executes successfully when run directly from the command line:

 $ mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile nodefile /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-testcomp14-np2
 $ cat nodefile
 compute-0-14
 compute-0-5
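When a command succeeds interactively but fails under PBS, a common first step is to compare the two environments, since batch jobs often start with a different PATH or LD_LIBRARY_PATH. A minimal sketch (the two dump contents below are made-up stand-ins for the real captures):

```shell
# Capture `env | sort` once in an interactive shell and once inside the
# PBS job script, then compare; lines unique to one side are candidate
# culprits. The printf lines fake the two captures for illustration.
printf 'LD_LIBRARY_PATH=/share/apps/lib\nPATH=/usr/bin\n' > env_interactive.txt
printf 'PATH=/usr/bin\n' > env_pbs.txt
comm -3 env_interactive.txt env_pbs.txt   # prints only the differing lines
```

Here the diff would flag LD_LIBRARY_PATH as present only in the interactive shell, the sort of discrepancy worth checking first.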

== References ==

  1. https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
  2. https://docs.oracle.com/cd/E19708-01/821-1319-10/mca-params.html