I’ve been testing the ZFS Linux kernel module on a spare machine at work to get an idea of how stable it is.
After two weeks of using it to back up my main desktop, I’ve found it to be rock solid, with good performance.
As a result I’ve taken the plunge and upgraded the main backup server from zfs-fuse to the ZFS kernel module. Read and write performance have improved dramatically. I also took the opportunity to upgrade the pool from version 23 to version 28.
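For anyone doing the same, the pool upgrade is a one-liner (the pool name zpool here matches the dedup example below; substitute your own). Bear in mind it’s one-way: once upgraded to version 28, the pool can no longer be imported by older implementations such as zfs-fuse.

zpool upgrade -v      # list the pool versions this implementation supports
zpool upgrade zpool   # upgrade the named pool to the latest supported version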
The backup ZFS filesystem contains about 3TB of data, with about 600 snapshots. The machine in question has two quad-core Xeons and 16GB of RAM. The pool sits on 3 x 2TB hard disks in a raidz vdev.
Today I looked into backing up 7 Windows virtual guests to this ZFS filesystem using rsync, taking daily snapshots. However, each machine is 50GB, which totals 350GB, and most of that is just the Windows core. As the machines are all cloned from a gold image, I decided to enable dedup on the dataset. The command is:
zfs set dedup=on zpool/dataset
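A rough sketch of what one nightly run could look like (the hostname, image path, and dataset layout are made up for illustration). The --inplace flag makes rsync rewrite changed blocks inside the existing image file instead of building a new copy and renaming it, which plays much better with snapshots and dedup on large VM images:

rsync -a --inplace --partial vmhost:/vm-images/win-guest-01.img /zpool/dataset/win-guest-01/
zfs snapshot zpool/dataset@$(date +%Y-%m-%d)

Each snapshot then only holds the blocks that actually changed since the previous run.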
I had thought it would make a reasonable difference to disk usage, but the result rather blew me away. Instead of 350GB, the rsync backups used only 196GB. That’s a whopping 154GB saved through dedup.
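You can check the savings yourself: zpool list shows a DEDUP ratio column, or you can query the property directly (again assuming the pool is called zpool):

zpool list
zpool get dedupratio zpool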
I’ve decided to enable dedup as a test on my Linux backups to see what happens.
Remember that dedup operates at the zpool level: even when you enable it on only one dataset, the dedup table ZFS maintains is shared across the whole zpool.
One caveat is that enabling dedup will consume more CPU and memory, but on a machine with spare cores and plenty of RAM, saving that much disk space is worth it!
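If you want to keep an eye on what the dedup table is costing you, zpool status -D prints DDT statistics, including entry counts and the on-disk and in-core size per entry, from which you can estimate the RAM it needs:

zpool status -D zpool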