CentOS root on ZFS

I have bought some Infiniband cards to link the fileserver and the hypervisor together. I chose Infiniband over 10GbE because it is cheaper and offers me more bandwidth, but more on that in a later post. I have traditionally run FreeBSD for my fileserver because it has had ZFS for quite a while and I want to have a proven implementation of my filesystem on my fileserver.

However, when running iSCSI over Infiniband, there are two options: IP over Infiniband (IPoIB) or an iSCSI extension called iSER. The first method imposes considerable overhead; somebody reported 20× the bandwidth and double the IOPS when running with iSER instead of IPoIB. The problem: iSER targets are not (yet) supported on FreeBSD 11, although acting as an initiator works. So I needed to switch to ZFS on Linux, as LIO supports iSER targets.

As I use CentOS on all the other servers, I wanted to run my fileserver on CentOS as well. I really liked the Root on ZFS install that is integrated with FreeBSD, so I wanted to replicate it with CentOS 7. The original tutorial did not really work out for me and I liked the Ubuntu one a bit better, so I cherry-picked from both and here is the result.

My root zpool is called zroot and is a mirror over two SSDs. Start by installing CentOS to another medium and installing zfs and zfs-dracut as described in the guide. I opted to use MBR and to ditch the UEFI stuff. As I wanted to create a swap partition on each SSD, I created an 8 GB partition at the end and used the rest for ZFS.
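
The sector arithmetic behind such a layout can be sketched roughly as follows. This is only an illustration: the function name layout_sectors is made up, 512-byte sectors are assumed, and the disk size of 234441648 sectors is an assumption derived from the end of the last partition (start + size) in the table below.

```shell
#!/usr/bin/env bash
# Sketch: one ZFS partition followed by a swap partition at the end of the disk.
# All values are 512-byte sectors. layout_sectors is a hypothetical helper.

# layout_sectors TOTAL_SECTORS SWAP_GIB [FIRST_START]
# Prints: "<zfs_start> <zfs_size> <swap_start> <swap_size>"
layout_sectors() {
  local total=$1 swap_gib=$2 first=${3:-2048}
  # GiB -> sectors: bytes divided by the assumed 512-byte sector size
  local swap_size=$(( swap_gib * 1024 * 1024 * 1024 / 512 ))
  # Swap occupies the last sectors of the disk
  local swap_start=$(( total - swap_size ))
  # ZFS gets everything between the first usable sector and the swap
  local zfs_size=$(( swap_start - first ))
  echo "$first $zfs_size $swap_start $swap_size"
}

# Assumed disk size: 234441648 sectors
layout_sectors 234441648 8
# prints: 2048 217662384 217664432 16777216
```

My actual table differs from this result by a single sector, which makes no practical difference.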

  1. My partition table:
    sfdisk -d /dev/sda
    # partition table of /dev/sda
    unit: sectors
    /dev/sda1 : start=     2048, size=217662383, Id=bf, bootable
    /dev/sda2 : start=        0, size=        0, Id= 0
    /dev/sda3 : start=        0, size=        0, Id= 0
    /dev/sda4 : start=217664431, size= 16777217, Id=82
  2. First create the zpool:
    zpool create -d \
      -o feature@async_destroy=enabled \
      -o feature@empty_bpobj=enabled \
      -o feature@lz4_compress=enabled \
      -o ashift=12 -O atime=off -O canmount=off \
      -O mountpoint=/ -R /media -O compression=lz4 \
      zroot mirror /dev/sda1 /dev/sdb1

    Note: It is important to create the pool with -d, otherwise grub2-probe cannot detect the filesystem correctly as ZFS.

  3. Check that the pool was created:
    zpool status zroot
      pool: zroot
     state: ONLINE
    status: Some supported features are not enabled on the pool. The pool can
      still be used, but some features are unavailable.
    action: Enable all features using 'zpool upgrade'. Once this is done,
      the pool may no longer be accessible by software that does not support
      the features. See zpool-features(5) for details.
      scan: none requested
      NAME            STATE     READ WRITE CKSUM
      zroot           ONLINE       0     0     0
        mirror-0      ONLINE       0     0     0
          sda1        ONLINE       0     0     0
          sdb1        ONLINE       0     0     0
    errors: No known data errors
  4. Lazy variant to get good permanent device names:
    zpool export zroot
    zpool import -o altroot=/media -d /dev/disk/by-path/ zroot
  5. Let's create and mount some ZFS filesystems for different parts of the system:
    zfs create -o canmount=off -o mountpoint=none zroot/ROOT
    zfs create -o mountpoint=/ zroot/ROOT/centos
    zfs create -o setuid=off zroot/home
    zfs create -o mountpoint=/root zroot/home/root
    zfs create -o canmount=off -o setuid=off -o exec=off zroot/var
    zfs create zroot/var/cache
    zfs create zroot/var/log
    zfs create zroot/var/spool
    zfs create -o exec=on zroot/var/tmp
    zfs create zroot/var/mail
    zfs mount -a
    chmod 1777 /media/var/tmp
  6. Everything that is not stored in one of these filesystems lives in zroot/ROOT/centos. This makes going back and forth between snapshots easier, as all the logfiles and such are kept. If someone wants to try out another root, they can simply switch the mountpoints and be good to go.
  7. Now it is time to copy the system from the original install to the zroot.
    mount --bind / /mnt/
    rsync -avPX /mnt/. /media/.
    # Next two steps only if you have /boot and /home on a different partition
    rsync -avPX /boot/. /media/boot/
    rsync -avPX /home/. /media/home/.
    umount /mnt
  8. The final adaptations happen inside the new system, but first prepare the chroot environment.
    mount --rbind /dev /media/dev/
    mount --rbind /sys /media/sys/
    mount --rbind /proc /media/proc/
    chroot /media
  9. Edit fstab, remove all entries and add swap partitions.
    vi /etc/fstab
  10. Update the grub config in /etc/default/grub and add/update the following lines:
    GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rhgb quiet boot=zfs rpool=zroot bootfs=zroot/ROOT/centos"
    GRUB_PRELOAD_MODULES="part_msdos zfs"
  11. Apply an ugly hack, because apparently the grub utilities cannot handle persistent device names very well:
    pushd /dev
    ln -s /dev/disk/by-path/* . # Use the right directory for your zpool device names
    popd
  12. Rebuild and install grub on the root SSDs:
    grub2-mkconfig -o /boot/grub2/grub.cfg
    grub2-install /dev/sda
    grub2-install /dev/sdb
  13. Enable the zfs-mount service; otherwise CentOS will only mount the zroot/ROOT/centos filesystem.
    systemctl enable zfs-mount.service
  14. Update dracut config in /etc/dracut.conf
  15. Rebuild initramfs
    dracut -f -v /boot/initramfs-$(uname -r).img $(uname -r)
  16. Do anything else you want, then reboot and hope for the best. If things go south, boot from the original medium and troubleshoot, or start over by destroying the zpool.
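
For the fstab edit in step 9, the swap entries could be generated roughly like this. It is only a sketch: swap_fstab_lines is a made-up helper, and /dev/sdb4 assumes the second SSD carries the same partition layout as /dev/sda. The partitions are also assumed to have been initialised with mkswap beforehand.

```shell
#!/usr/bin/env bash
# Sketch: print one fstab swap entry per device given on the command line.
# swap_fstab_lines is a hypothetical helper; it only prints and changes nothing.

swap_fstab_lines() {
  local dev
  for dev in "$@"; do
    # fields: device, mountpoint, type, options, dump, pass
    printf '%s none swap sw 0 0\n' "$dev"
  done
}

# /dev/sda4 is the swap partition from step 1; /dev/sdb4 is an assumption
swap_fstab_lines /dev/sda4 /dev/sdb4
```

Inside the chroot, the output can be redirected to /etc/fstab once the old entries are gone. A persistent name from /dev/disk/by-path/ may be preferable to the sdX names here.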