Issue Tracker

Share your bug reports, feature requests, and general feedback with us. Our team wants to hear from you so we can improve fluid-slurm-gcp!

Bug : OpenMPI, MVAPICH "No IB device found"

Affected Products : fluid-slurm-gcp+openhpc (v2.2.1)

Workarounds : Use mpich as the MPI provider (module load mpich)

Expected fix : Image upgrades for v2.3.x (May 2020) will provide custom builds for MVAPICH and OpenMPI rather than the ohpc packages.

Description : On fluid-slurm-gcp+openhpc, the OpenMPI and MVAPICH builds provided via module load openmpi3 or module load mvapich2 fail to launch with mpirun, resulting in the following error:

[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2145167:
system msg for write_line failure : Bad file descriptor
[mpi-compute-16-1:mpi_rank_0][MPIDI_CH3_Abort] Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490)............: 
MPID_Init(396)...................: channel initialization failed
MPIDI_CH3_Init(410)..............: rdma_get_control_parameters
rdma_get_control_parameters(1726): 
rdma_open_hca(575)...............: No IB device found
: Bad file descriptor (9)
-------------------------------------------------------------------------- 
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
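
The "No IB device found" line suggests that the MPI build is looking for an InfiniBand adapter that the compute node does not expose. The following is a minimal sketch of a pre-launch check that falls back to the mpich workaround when no such device is visible. It assumes the usual Linux sysfs location /sys/class/infiniband for listing InfiniBand devices. The helper name choose_mpi_module is hypothetical and is not part of fluid-slurm-gcp.

```shell
#!/bin/sh
# Hypothetical helper (not part of fluid-slurm-gcp): print the MPI module to load.
# Assumption: InfiniBand devices, when present, appear under /sys/class/infiniband.
choose_mpi_module() {
  ib_dir="${1:-/sys/class/infiniband}"
  # A non-empty directory means at least one IB device is visible to the kernel.
  if [ -d "$ib_dir" ] && [ -n "$(ls -A "$ib_dir" 2>/dev/null)" ]; then
    echo "mvapich2"
  else
    # No IB device: use the workaround from this report.
    echo "mpich"
  fi
}

# Usage in a job script (commented out; requires the module system):
# module load "$(choose_mpi_module)"
```

On affected images this check would be expected to select mpich. An application previously built against openmpi3 or mvapich2 will likely need to be rebuilt with the mpich compiler wrappers before running it with mpirun.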