OpenMPI and InfiniBand
We have OpenMPI installed on Combo. In principle, you can launch an MPI-enabled program with
mpirun -np NUM_OF_PROCESSES PROGRAM
but there are a few more things to take care of.
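Before launching anything, it may help to confirm which OpenMPI installation your shell actually picks up; the sketch below assumes the install under /share/apps mentioned in the next section, and my_program is a hypothetical executable name used only for illustration.

which mpirun                  # should resolve to the OpenMPI under /share/apps
mpirun --version              # prints the Open MPI version string
mpirun -np 4 ./my_program     # hypothetical example: run 4 processes of my_program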
InfiniBand
By default, the OpenMPI installed under /share/apps does not use InfiniBand to communicate between nodes. To make it do so, we need to pass mpirun some Modular Component Architecture (MCA) parameters [1]. Specifically, we set the MCA parameter btl (point-to-point byte movement) [2] to self,openib, which tells OpenMPI to use the "openib" and "self" BTLs.
mpirun -mca btl self,openib -np NUM_OF_PROCESSES PROGRAM
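As an alternative to the command-line flag, the same setting can be made through OpenMPI's MCA environment variables, and ompi_info can be used to check that the openib BTL component is actually present in the installation; the exact output depends on how OpenMPI was built.

ompi_info | grep btl              # lists the compiled-in BTL components; "openib" should appear
export OMPI_MCA_btl=self,openib   # equivalent to passing "-mca btl self,openib" to mpirun
mpirun -np NUM_OF_PROCESSES PROGRAM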
Reporting problems
Some nodes not working in PBS
During benchmarking of Gromacs, compute-0-14 was found to be inaccessible when the job was submitted through a PBS script. Error:
[compute-0-14:10577] *** An error occurred in MPI_Allreduce
[compute-0-14:10577] *** reported by process [47112135966721,47111496269824]
[compute-0-14:10577] *** on communicator MPI_COMM_WORLD
[compute-0-14:10577] *** MPI_ERR_IN_STATUS: error code in status
[compute-0-14:10577] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-0-14:10577] ***    and potentially your MPI job)
However, exactly the same command runs successfully when executed directly from the command line:
$ mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile nodefile /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun -v -ntomp 16 -pin on -s W50k.tpr -deffnm output/g-testcomp14-np2
$ cat nodefile
compute-0-14
compute-0-5
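For reference, a PBS submission script reproducing the interactive command might look like the sketch below. The job name, resource request, and walltime are assumptions; $PBS_NODEFILE and $PBS_O_WORKDIR are the standard variables PBS sets for the job's host list and submission directory.

#!/bin/bash
# Job name, node count, and walltime below are assumptions for illustration.
#PBS -N gmx-ib-test
#PBS -l nodes=2:ppn=16
#PBS -l walltime=02:00:00

# Run from the directory the job was submitted from; $PBS_NODEFILE lists the
# hosts PBS assigned to this job.
cd $PBS_O_WORKDIR
mpirun -mca btl self,openib -np 2 -npernode 1 --hostfile $PBS_NODEFILE \
    /home/kevin/opt/gromacs-5.1.2-MPI-single/bin/gmx_mpi mdrun -v -ntomp 16 -pin on \
    -s W50k.tpr -deffnm output/g-testcomp14-np2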