Setting up a ZFS filesystem on Linux

When installing a new Linux system, the default filesystem is typically ext4 or XFS, which some recent distributions have adopted as their default. But when you want to store a large amount of data, you may want something more robust. The ZFS filesystem is becoming all the rage, and for good reason. Originally developed by Sun Microsystems, ZFS provides a lot of cool features:

  • Data integrity: ZFS transparently checksums your data to make sure it does not degrade over time, whether from phantom writes, spikes in the hardware current, or silent data corruption.
  • Built-in RAID: ZFS provides several RAID types without the need for a hardware RAID card.
  • Large capacity: With ZFS pools, you can combine multiple hard drives into a single logical volume.
  • Efficiency: ZFS allows you to add dedicated cache and log disks to speed up reads and synchronous writes, respectively.

Installing ZFS

The first thing to do in order to use ZFS is to install the kernel module. You can find out whether it is already present and loaded on your system by listing the loaded modules:

lsmod | grep zfs

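If nothing comes up, you can add the ZFS repository and then install the package. The commands below are for an EL7-based system such as RHEL or CentOS 7 and need to be run as root: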

yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
yum install zfs

After rebooting, you can list modules again and it should be loaded. If not, try loading it manually with modprobe zfs and that should do the trick.
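The check-then-load step can be wrapped in a small helper. The following is only a sketch: has_module is a hypothetical function name, and it assumes the usual lsmod layout in which the module name is the first column. Unlike a plain grep, it matches the module name exactly, so modules that merely list zfs in their "Used by" column are not counted.

```shell
# has_module NAME
# Reads an lsmod-style listing on stdin and succeeds only if a module
# whose name (first column) is exactly NAME appears in it.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on a real system (commented out so the sketch has no side effects):
# lsmod | has_module zfs || sudo modprobe zfs
```

The exact match matters because helper modules loaded alongside ZFS mention zfs in their own lsmod lines.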

Note: In some distributions, the ZFS features are provided by a user-space daemon called zfs-fuse instead of a kernel module. To enable ZFS support in that case, all you have to do is start the service and enable it at boot:

systemctl start zfs-fuse
systemctl enable zfs-fuse

Creating a zPool

In our experiment, we will create a pool called storage out of 3 physical 2TB hard drives, using the RAID-Z1 setting:

zpool create storage raidz /dev/sdb /dev/sdc /dev/sdd

That’s all it takes. You can now see your new pool, along with its size and health, by running the zpool list command.
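If you want to check the pool from a script rather than by eye, zpool list has a scripted mode (-H, which drops the header and separates fields with tabs) and lets you choose columns with -o. The helper below is a hedged sketch: unhealthy_pools is a hypothetical name, and it simply filters that tab-separated output for pools whose health is anything other than ONLINE.

```shell
# unhealthy_pools
# Reads "name<TAB>health" lines on stdin (the format produced by
# `zpool list -H -o name,health`) and prints the names of pools
# that are not ONLINE.
unhealthy_pools() {
    awk -F'\t' '$2 != "ONLINE" { print $1 }'
}

# Usage on a real system (commented out so the sketch has no side effects):
# zpool list -H -o name,health | unhealthy_pools
```

An empty result means every pool reported ONLINE.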

There are several types of pools you can make:

  • disk: Listing disks without any keyword just stripes the data across them, the equivalent of RAID-0, with no redundancy.
  • mirror: This is just a mirror of disks. It’s the equivalent of RAID-1.
  • raidz: This is RAID-Z1, which uses 1 disk's worth of space for parity and survives the loss of 1 disk.
  • raidz2: This is RAID-Z2, which uses 2 disks' worth of space for parity and survives the loss of 2 disks.
  • raidz3: This is RAID-Z3, which uses 3 disks' worth of space for parity and survives the loss of 3 disks.

You can also have hybrids, where a single pool is made up of several groups of disks. This is useful if you want to add additional disks to an existing pool.
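To make the syntax for each pool type concrete, here is a small dry-run sketch. zpool_cmd is a hypothetical helper that only prints the command it would run, because zpool create overwrites whatever is on the listed disks. The device names beyond /dev/sdb, /dev/sdc and /dev/sdd are assumptions for illustration, and "stripe" is just this sketch's label for the keyword-less form.

```shell
# zpool_cmd ACTION POOL LAYOUT DISK...
# ACTION is "create" or "add"; LAYOUT is stripe, mirror, raidz, raidz2 or raidz3.
# Prints the zpool command instead of executing it.
zpool_cmd() {
    action="$1"; pool="$2"; layout="$3"; shift 3
    case "$action" in
        create|add) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    case "$layout" in
        # A plain stripe takes no keyword: the disks are simply listed.
        stripe) echo "zpool $action $pool $*" ;;
        mirror|raidz|raidz2|raidz3) echo "zpool $action $pool $layout $*" ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

zpool_cmd create storage stripe /dev/sdb /dev/sdc
zpool_cmd create storage mirror /dev/sdb /dev/sdc
zpool_cmd create storage raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# Hybrid case: grow the existing pool with a second raidz group.
zpool_cmd add storage raidz /dev/sde /dev/sdf /dev/sdg
```

Once you are happy with the printed command, you can run it by hand as root.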

Since we used RAID-Z1 on 3 disks of 2TB each, for a total of 6TB of raw capacity, we lose the space of 1 disk to parity and should end up with roughly 4TB of usable space. Indeed, the df command shows that the new 4TB volume has been mounted, by default at /storage.
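The same arithmetic works for any single RAID-Z group: usable space is roughly (number of disks minus parity disks) times the disk size. The function below is a rough sketch with a hypothetical name, and it ignores filesystem metadata and padding overhead, so real numbers will come out somewhat lower.

```shell
# raidz_usable_tb DISKS SIZE_TB PARITY
# Approximate usable capacity, in TB, of one RAID-Z group.
# PARITY is 1 for raidz, 2 for raidz2, 3 for raidz3.
raidz_usable_tb() {
    disks="$1"; size_tb="$2"; parity="$3"
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity disks" >&2
        return 1
    fi
    echo $(( (disks - parity) * size_tb ))
}

raidz_usable_tb 3 2 1   # the pool from this article: prints 4
raidz_usable_tb 6 2 2   # a hypothetical six-disk raidz2: prints 8
```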