
I’m currently testing Lustre 2.1 on the HPC system I wrote about on Google+.

Initially I stuck with the vanilla CentOS kernel on the nodes and installed the QLogic OFED bundle to get the InfiniBand and MPI packages.
This worked well. Later, though, I decided to install the Lustre kernel and packages from Whamcloud to test Lustre over InfiniBand.
I simply installed the kernel, kernel-devel and kernel-headers packages on the nodes with rpm and rebooted.
Everything seemed okay, but I noticed pretty poor performance from Lustre and later found that RDMA wasn’t working properly… yikes!

I determined that RDMA was broken by compiling and running the HPL 2.0 benchmark, as used by Top500.org.
On starting mpirun, I would receive a cute error about not being able to initialize RDMA through the mlx4 driver (the Mellanox IB driver).
This caused me to scratch my head.
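
A quicker sanity check than a full HPL run is to see whether the kernel has registered any InfiniBand devices at all. The snippet below is only a minimal sketch, assuming the standard Linux sysfs location /sys/class/infiniband; the function name list_rdma_devices is made up for illustration. An empty result hints that the IB driver modules did not load against the running kernel, though a populated list does not prove RDMA is healthy.

```python
import os


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of InfiniBand devices registered with the kernel.

    An empty list means the directory is missing or has no entries, which
    usually indicates that no IB driver is currently loaded.
    """
    try:
        return sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        # The directory only exists once the IB core modules are loaded.
        return []


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices registered; check that the IB modules loaded")
```

Running this on each node before starting mpirun gives a cheap first signal before digging into the MPI error itself.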

I laid the blame squarely on the bait and switch I performed with the kernels.
After clearing out the QLogic OFED packages and re-installing them so that the kernel modules were compiled against the Lustre kernel, everything worked as expected and RDMA seems to function once again.
<UPDATE>
However, it appears the Lustre networking kernel modules in 2.1.1 don’t like the OFED! See this post about it:
http://www.nodeofcrash.com/?p=513

The moral of the story is that the vanilla CentOS 6 kernel is not the same as the CentOS 6 Lustre kernel, so don’t assume the kernel modules will port over.
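
One way to catch this kind of mismatch early is to compare a module's vermagic string with the running kernel release. The following is a rough sketch rather than a definitive check: it assumes modinfo is on the PATH and relies on the kernel release being the first token of the vermagic string. The helper names are hypothetical, and the release strings in the comments are made-up examples, not taken from my nodes.

```python
import platform
import subprocess


def vermagic_release(vermagic):
    """Return the kernel release a module was built for.

    The vermagic string looks like
    '2.6.32-220.el6.x86_64 SMP mod_unload modversions' (illustrative value);
    the release is its first whitespace-separated token.
    """
    parts = vermagic.split()
    return parts[0] if parts else ""


def module_matches_kernel(vermagic, running_release):
    """True if the module was built for exactly the given kernel release."""
    return vermagic_release(vermagic) == running_release


def read_vermagic(module):
    """Ask modinfo for a module's vermagic string.

    Raises CalledProcessError if modinfo cannot find the module for the
    running kernel, which is itself a hint that it was never rebuilt.
    """
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def check_module(module):
    """Compare one module (e.g. 'mlx4_ib') against the running kernel."""
    return module_matches_kernel(read_vermagic(module), platform.release())
```

For example, calling check_module("mlx4_ib") on a node after swapping kernels should return True only if the module was built for the kernel that is actually booted.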

 
