I’m working on automating and upgrading the Corvus High Performance Computer at my employer eRSA / Uni of Adelaide.
It’s currently running Novell SLES 10, so I’m working on getting it up to SLES 11, which is coming along quite nicely. Part of my upgrade plan is to implement a better installation method along with automating the configuration of the compute nodes in the cluster.
Corvus has 75 compute nodes and a single head node. The compute nodes are SGI Altix XE310s, which are largely a generic Supermicro chassis with an Intel server thrown inside. The trick is that a single 1U chassis has two physical nodes in it, making it an ultra-dense solution. Each node has a standard Intel BMC which supports IPMI.
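To put numbers on that density, here is a rough Python sketch of how 75 nodes would pair up at two per 1U chassis. The `node01`-style names and the assumption that nodes fill chassis in numeric order are purely illustrative; they are not Corvus's real naming scheme or rack layout.

```python
COMPUTE_NODES = 75      # from the post
NODES_PER_CHASSIS = 2   # from the post: two physical nodes per 1U chassis


def chassis_layout(node_count, per_chassis, prefix="node"):
    """Group hypothetical node names by the chassis they would share."""
    width = len(str(node_count))  # zero-pad so names sort naturally
    names = [f"{prefix}{i:0{width}d}" for i in range(1, node_count + 1)]
    # Slice the flat name list into consecutive groups of `per_chassis`.
    return [names[i:i + per_chassis] for i in range(0, node_count, per_chassis)]


layout = chassis_layout(COMPUTE_NODES, NODES_PER_CHASSIS)
print(len(layout), "chassis")
print("first:", layout[0], "last:", layout[-1])
```

Under those assumptions an odd node count leaves the final chassis half populated, which is worth keeping in mind when grouping nodes for power control or maintenance.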
The existing automated installation system for the nodes is the Scali Cluster Manager. Scali has been gobbled up by Platform Computing, which seems to mean big bucks to upgrade our Scali license. As a result I’ve replaced the Scali Cluster Manager with xCAT, the open source solution from IBM.
xCAT is really good and handles the scaling up of the cluster really well. The included post-install scripts cover most requirements, and writing my own has been fairly straightforward. So now that I have a standardised way to install the nodes, I need a way to automate the configuration.
To automate the configuration I’m going to be using Puppet. I created a small test system using some VirtualBox guests and it worked great out of the box. I’ve bought this book from Apress, which has only recently been released. It’s really good and I’d highly recommend it. http://apress.com/book/view/1590599780
In the end I should have an excellent and scalable way to install and manage the nodes.