I’m currently testing Lustre 2.1 on the HPC system I spoke about here on Google+.
Initially I stuck with the vanilla CentOS kernel on the nodes and installed the QLogic OFED bundle to get the InfiniBand and MPI packages.
This worked well. Later, though, I decided to install the Lustre kernel and packages from Whamcloud to test Lustre over InfiniBand.
I simply dropped the kernel, kernel-devel and kernel-headers packages onto the nodes using rpm and rebooted.
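To be concrete, the swap looked roughly like this — the rpm file names below are placeholders, so match them to whatever versions Whamcloud actually ships:

```shell
# Install the Whamcloud-patched kernel packages (file names are
# hypothetical -- substitute the rpms you downloaded):
rpm -ivh kernel-2.6.32-*lustre*.x86_64.rpm \
         kernel-devel-2.6.32-*lustre*.x86_64.rpm \
         kernel-headers-2.6.32-*lustre*.x86_64.rpm
reboot

# After the node comes back, confirm the patched kernel is running:
uname -r | grep lustre && echo "Lustre kernel active"
```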
Everything seemed okay at first, but I noticed pretty poor performance from Lustre and later found that RDMA wasn’t working properly… yikes!
The way I determined RDMA was broken was by compiling and running the HPL 2.0 benchmark, as used by Top500.org.
On starting mpirun, I received an error about being unable to initialize RDMA through the mlx4 driver (the Mellanox InfiniBand driver).
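In hindsight, a couple of much quicker checks than a full HPL run would have flagged this. These utilities ship with the OFED bundle (infiniband-diags and perftest), so this is a sketch of how I’d sanity-check RDMA first:

```shell
# Does the verbs layer see the HCA, and is a port up?
ibv_devinfo | grep -q PORT_ACTIVE && echo "RDMA-capable port up"

# Driver's view of port state and link rate:
ibstat

# Point-to-point RDMA write bandwidth between two nodes (start the
# server side first, then point the client at it):
#   node1$ ib_write_bw
#   node2$ ib_write_bw node1
```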
This caused me to scratch my head.
I laid the blame squarely on the bait-and-switch I had performed with the kernels.
After clearing out the QLogic OFED packages and reinstalling the bundle so that it compiled its kernel modules against the Lustre kernel, everything worked as expected and RDMA functioned once again.
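Roughly, the fix went like this — the installer name and package set vary between OFED releases, so treat it as a sketch rather than an exact recipe:

```shell
# See what the QLogic OFED bundle installed:
rpm -qa | grep -i ofed

# ...erase those packages, then rerun the bundle's install script
# from its unpacked directory so the modules are compiled against
# the kernel that is now running (script name varies by release).

# Afterwards, verify the rebuilt IB module was built for the
# Lustre kernel:
modinfo mlx4_ib | grep vermagic   # should show the *lustre* kernel string
```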
However, it appears the Lustre networking (LNET) kernel modules in 2.1.1 don’t like this OFED! See this post about it.
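For reference, Lustre over InfiniBand goes through the ko2iblnd LNET driver, which is selected with a modprobe option along these lines (the interface name ib0 is an assumption about your node config):

```
# /etc/modprobe.d/lustre.conf
options lnet networks=o2ib0(ib0)
```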
The moral of the story: the CentOS 6 vanilla kernel ≠ the CentOS 6 Lustre kernel, so don’t assume kernel modules built against one will port over to the other.