OpenMPI and InfiniBand
We have OpenMPI installed on Combo. In principle, you can launch an MPI-enabled program with
mpirun -np NUM_OF_PROCESSES PROGRAM
but a few more things need to be set up first.
InfiniBand
By default, the OpenMPI installed under /share/apps does not use InfiniBand to communicate between nodes. To enable it, we need to pass mpirun some Modular Component Architecture (MCA) parameters [1]. Specifically, we set the MCA parameter btl (point-to-point byte movement) [2] to self,openib, which tells OpenMPI to use only the "openib" and "self" BTLs.
mpirun -mca btl self,openib -np NUM_OF_PROCESSES PROGRAM
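The same flag carries over to batch jobs. The submission script below is only a minimal sketch for a typical Torque/PBS setup; the job name, resource request, walltime, and PROGRAM are illustrative placeholders, not an actual script used on Combo:
 #!/bin/bash
 #PBS -N ib_test
 #PBS -l nodes=2:ppn=16
 #PBS -l walltime=01:00:00
 #PBS -j oe
 # PBS writes the list of allocated nodes to $PBS_NODEFILE
 cd $PBS_O_WORKDIR
 mpirun -mca btl self,openib -np NUM_OF_PROCESSES --hostfile $PBS_NODEFILE PROGRAM
Submit it with qsub. When OpenMPI is built with Torque support it can also discover the allocated nodes without --hostfile, but passing $PBS_NODEFILE explicitly is the safe assumption.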
Reporting problems
Inside and outside PBS
During GROMACS benchmarking, compute-0-14 was found to fail when jobs were submitted through PBS scripts. Error:
 [compute-0-14:10577] *** An error occurred in MPI_Allreduce
 [compute-0-14:10577] *** reported by process [47112135966721,47111496269824]
 [compute-0-14:10577] *** on communicator MPI_COMM_WORLD
 [compute-0-14:10577] *** MPI_ERR_IN_STATUS: error code in status
 [compute-0-14:10577] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 [compute-0-14:10577] *** and potentially your MPI job)
However, exactly the same command runs successfully when executed directly on the command line:
 $ mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile nodefile /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-testcomp14-np2
 $ cat nodefile
 compute-0-14
 compute-0-5
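A standard first check when the openib BTL works interactively but fails under a batch system is to compare the two environments. The commands below are only a hedged suggestion (PROGRAM stands for the GROMACS command above); the locked-memory limit is a common culprit in such cases, but the actual cause on compute-0-14 was not established:
 # On compute-0-14, compare the locked-memory limit seen in an interactive
 # shell with the one seen inside a PBS job; the openib BTL needs it to be
 # large (ideally unlimited) in order to register memory.
 $ ulimit -l
 # Re-run the failing job with BTL selection made verbose, so the PBS output
 # shows which BTLs OpenMPI actually chose on each node.
 $ mpirun -mca btl self,openib -mca btl_base_verbose 30 -np 2 -npernode 1 --hostfile nodefile PROGRAM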