Category: Linux


Puppet, xCAT and a big HPC Cluster


I’m working on automating and upgrading the Corvus High Performance Computer at my employer eRSA / Uni of Adelaide.
It’s currently running Novell SLES 10, so I’m working on getting it up to SLES 11, which is coming along quite nicely. Part of my upgrade plan is to implement a better installation method along with automating the configuration of the compute nodes in the cluster.

Corvus has 75 compute nodes and a single head node. The compute nodes are SGI Altix XE310s, which are largely a generic Supermicro chassis with an Intel server thrown inside. The trick is that a single 1U chassis has two physical nodes in it, making it an ultra dense solution. Each node has a standard Intel BMC which supports IPMI.

The existing automated installation system for the nodes is the Scali Cluster Manager. Scali has been gobbled up by Platform Computing, which seems to mean big bucks to upgrade our Scali license. As a result I’ve replaced the existing Scali Cluster Manager with the open source solution from IBM called xCAT.
xCAT is really good and handles the scaling up of the cluster really well. The included post-install scripts cover most requirements, and writing my own has been fairly straightforward. So now that I have a standardised way to install the nodes, I need a way to automate the configuration.

To automate the configuration I’m going to be using Puppet. I created a small test system using some VirtualBox guests and it worked great out of the box. I’ve bought this book from Apress, which has only recently been released. It’s really good and I’d highly recommend it. http://apress.com/book/view/1590599780

In the end I should have an excellent and scalable way to install and manage the nodes.


I’ve had a MobileMe account with Apple for over a year now. I signed up to it shortly after getting my iPhone. I love using the service, even though it had a bit of a rocky start when it was launched. My me.com email address is solid, and I love the way they have implemented the gallery. It beats the living heck out of the PHP-based Gallery2 I was living with (plus I don’t have to manage it and update the code :p). The calendar works great and syncs up with my iPhone.

All in all it covers everything I need to live online. However, I was partly annoyed that the only way I could access my me.com iDisk was via a web interface, when I would rather mount it as a disk. Not being able to mount my iDisk prompted me to sign up for a Jungle Disk account from Rackspace, which has great support for Linux, Windows etc. Turns out that much has changed in the land of the iDisk, and with the release of Mac OS X 10.6, iDisk support was bundled into the OS. The MobileMe widget / applet for Windows also does the same thing (although how the heck you configure it is beyond me) and allows you to mount the iDisk. What about Linux….!

However after digging around the net, I’ve found a blog post from Chris Danielson here http://tinyurl.com/2anfbpk about mounting an iDisk using the fuse davfs2 module.
I could give this man a carton of beers for the post :)

So I feel a bit silly not doing more research before jumping onto Jungle Disk, as my iDisk is accessible the way I want it, after all. It’s only costing me about $2.50 a month for the Jungle Disk account, but my me.com disk is already paid for and I’m not using it….
I also had a look at Dropbox, which seems like a great service, but appears to be overkill for my needs at the moment.
I really only use my online storage as a backup and an easy way to get to my documents etc when I’m out and about on my iPhone.

So while I have a Jungle Disk, I think I may drop it and stick to my me.com iDisk and save a whole $2 :p

lm-sensors w83627ehf problem on CentOS 5


I’ve got an ASUS P5KPL-CM motherboard that I’ve installed into a new server at work that will perform tape backups.
I wanted to install lm_sensors to monitor the temperature of the CPU and system.
I installed the standard CentOS 5.2 release of lm_sensors, which at the time of writing was version 2.10.0-3.1. I found however that due to the “newness” of the board, this version of lm_sensors was too old.

I removed the CentOS lm_sensors package and wandered over to the lm_sensors site. I downloaded lm_sensors 3.0.2, here’s the link: http://dl.lm-sensors.org/lm-sensors/releases/lm_sensors-3.0.2.tar.bz2

Happily it compiled and installed fine using the standard install instructions, extract, make, make install. I ran sensors-detect and everything seemed to run ok.
The sensors-detect summary looked like this:

Now follows a summary of the probes I have just done.
Just press ENTER to continue:

Driver `w83627ehf' (should be inserted):
Detects correctly:
* ISA bus, address 0x290
Chip `Winbond W83627DHG Super IO Sensors' (confidence: 9)

Driver `coretemp' (should be inserted):
Detects correctly:
* Chip `Intel Core family thermal sensor' (confidence: 9)
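
If you want to pull the driver names out of a summary like that (say, to feed them to modprobe), a few lines of Python will do. This is only a rough sketch: the function name and the sample text are mine, and it assumes the "Driver ... (should be inserted)" wording shown above, which may differ between lm_sensors versions.

```python
import re

# Matches lines such as: Driver `w83627ehf' (should be inserted):
# Assumption: the wording is the same as in the summary quoted in this post.
DRIVER_LINE = re.compile(r"^Driver [`']([\w-]+)[`'] \(should be inserted\)")


def drivers_to_load(summary):
    """Return the driver names marked 'should be inserted', in order."""
    drivers = []
    for line in summary.splitlines():
        match = DRIVER_LINE.match(line.strip())
        if match:
            drivers.append(match.group(1))
    return drivers


# Sample text copied from the summary in this post.
SUMMARY = """\
Driver `w83627ehf' (should be inserted):
Detects correctly:
* ISA bus, address 0x290
Chip `Winbond W83627DHG Super IO Sensors' (confidence: 9)

Driver `coretemp' (should be inserted):
Detects correctly:
* Chip `Intel Core family thermal sensor' (confidence: 9)
"""

if __name__ == "__main__":
    for name in drivers_to_load(SUMMARY):
        print("modprobe " + name)
```

Knowing which drivers to load is only half the story though, as the next step shows.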

However on running sensors it came back with:
Starting lm_sensors: No sensors found!
Make sure you loaded all the kernel drivers you need.
Try sensors-detect to find out which these are.

But… BUT, you said you were happy!
I went looking through /lib/modules to try and find what it was loading, specifically the w83627ehf module. The specific directory is /lib/modules/<kernel version>/kernel/drivers/hwmon. On running insmod w83627ehf.ko from that directory, it returns:

insmod: error inserting 'w83627ehf.ko': -1 No such device

After looking through lots of sites, I found this on the lm-sensors mailing-list:
http://lists.lm-sensors.org/pipermail/lm-sensors/2006-December/018519.html

Basically, it seems Winbond released a revised version of the chip, so even though the CentOS 5.2 w83627ehf.ko module has the right name, the guts of the kernel module don’t support the revised chip. I’m running 2.6.18-92.1.10.el5 (latest CentOS 5 kernel at the moment). Turns out 2.6.21 has the updated w83627ehf kernel module that fixes this problem.
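
A quick way to check whether a box is likely to hit this is to compare its kernel release against 2.6.21. The sketch below is my own helper, not anything from lm_sensors. It only looks at the upstream part of the version string, so treat the answer as a hint: distro kernels sometimes backport drivers, and then the number alone tells you nothing.

```python
import platform
import re

# Assumption taken from this post: the updated w83627ehf driver is in 2.6.21.
FIXED_IN = (2, 6, 21)


def base_version(release):
    """Turn '2.6.18-92.1.10.el5' into (2, 6, 18), ignoring the distro suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def needs_newer_driver(release, fixed_in=FIXED_IN):
    """True if the upstream kernel version is older than the fixed one."""
    return base_version(release) < fixed_in


if __name__ == "__main__":
    running = platform.release()
    print(running, "needs newer driver:", needs_newer_driver(running))
```

For the kernel mentioned above, needs_newer_driver("2.6.18-92.1.10.el5") comes back True, which matches what I saw.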

So I went to kernel.org and downloaded the 2.6.21 source. Opened it up and copied out the w83627ehf.c file. You also need the lm75.h header file. I compiled the module and it inserted cleanly into 2.6.18-92.1.10. Ran sensors and bang, it worked :)
How’s that for kernel hacking…

If you just want the compiled kernel module that works on 2.6.18-92.1.10 X86_64, download this archive: http://www.nodeofcrash.com/download/w83627ehf-centos5.tar.gz

There is also a kit tarball you can use to compile your own module for other kernels.
Download this archive: http://www.nodeofcrash.com/download/w83627ehf-centos5-kit.tar.gz

I hope this saves someone a headache :)


Thank the lord for backups!


I had a situation today where one of the engineers needed me to recover some files he had deleted by accident.
I have a one day old rsync backup on a hard disk as part of the backup system. However, it turns out the files were removed a week ago… yikes.

Got the tapes out and fired up Amanda’s recovery tool (amrecover).
On running amrecover and issuing the “extract” command, amrecover failed with “unable to open socket : Success”.
I ran amcleanup and then the amrecover again and everything came right. Extracted the files and got a free coffee from the engineer involved :)

It’s good to know I have Amanda looking after me :)

Argh Hardware Upgrade Pains….


Roughly two weeks ago I upgraded my home PC kit.
Bought myself a Core 2 Duo 3.16GHz, 4GB RAM and a Gigabyte P45-DSP3 motherboard.
I was quite surprised to find that after changing from AMD to Intel architecture, Vista actually booted up.
Ubuntu 8.04 Hardy Heron on the other hand became completely FUBAR. I found that with the newer Intel chipsets you need to run the hard drive controllers in AHCI mode. Just a setting change in the BIOS.
However, changing to AHCI mode from normal mode messes up Vista.

So I decided it was time to re-install. My Vista was needing a clean anyway. Install went fine and I got everything up and running. Took an image (with Paragon Drive Backup) of the clean installed Vista. It should save time with the next re-install.

With Vista up and running, I got the Ubuntu disk out. Everything looked ok, but after completing the install and rebooting, things got strange. Booting would get as far as the grub screen and, on selecting the kernel, complain about being unable to mount the boot partition. I tried reinstalling Ubuntu, but same problem. Next I got my trusty Knoppix disk out. Booted that up and performed a re-install of grub. Everything looked ok, but on a reboot, same issue with being unable to mount /boot.
Fiddled with the grub boot parameters but still no joy.

Frustrated, I thought I would try another distro to see if it had the same issue. I got my Fedora 8 disk out, but decided it might be a bit old for the motherboard and CPU I was using, so I booted the Fedora 9 installer. Unfortunately the GUI installer locks up after selecting the language. Switching between the terminals in anaconda (Alt+F1, Alt+F2 etc) revealed nothing useful.
I’m guessing it’s my graphics card causing the issue. I’ve got an Nvidia 9600GT.
I could have tried the text installer, but I left Fedora 9 alone.

Next I tried Fedora 8. After fiddling with the installer kernel options, I got it going.
However, it locks up at the same place as Fedora 9, oddly enough.

Lastly, out of desperation, I got the openSUSE 10.1 disk out. Suse came through, as the mighty YaST sorted out all my install issues.
Booted up Suse, but encountered a problem with the dual Gigabit network cards on the motherboard. They were being detected, however I couldn’t get DHCP to work and static IP addresses didn’t help either. I couldn’t ping anything on my network.

At this point I had to go to sleep so I left my Linux troubles for the time being. A couple of days later I decided my network troubles must be Suse related, so I had another go at getting Ubuntu back up.
After re-installing Ubuntu, I cracked the Grub problem. It appears that what the BIOS is reporting as the boot order of the drives is not the same as what Grub is picking up. Changing the root (hd2,0) to root (hd0,0) brought my Ubuntu up.
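
For the record, the same change can be scripted instead of edited by hand. This is just a sketch with a made-up menu entry (the kernel file name and root device in the sample are placeholders, not copied from my machine). It only rewrites the text you hand it, so writing the result back to the Grub menu file is up to you.

```python
import re


def remap_grub_root(menu_text, old_disk, new_disk):
    """Rewrite 'root (hdX,N)' lines from old_disk to new_disk, keeping N."""
    pattern = re.compile(
        r"^([ \t]*root[ \t]+\(hd)%d(,\d+\))" % old_disk, re.MULTILINE
    )
    return pattern.sub(
        lambda m: "%s%d%s" % (m.group(1), new_disk, m.group(2)), menu_text
    )


# Hypothetical menu entry, for illustration only.
MENU = """\
title Ubuntu (example entry)
root (hd2,0)
kernel /vmlinuz-example root=/dev/sda1 ro quiet
initrd /initrd.img-example
"""

if __name__ == "__main__":
    print(remap_grub_root(MENU, 2, 0))
```

Only lines that start with the root command are touched, so the root= option on the kernel line is left alone.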

However Ubuntu seems to suffer from the same network card trouble as Suse :(
I guess there is going to be some more work needed to get this sorted….
