ZFS on Linux
Contents
READ FIRST
Some considerations when working with ZFS
- ZFS uses vdevs and not physical disks.
- Be careful about how you add new disks to the array. No random adding and removing of disks (the exceptions being upgrading disks or replacing a failed disk)
- ZFS is very powerful, be mindful of what you are going to do and plan it out!
- After a vdev is created, it can never be removed, and you cannot add disks into it.
Example:
        NAME        STATE     READ WRITE CKSUM
        pool4tb     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
raidz1-0 is a vdev. To add more disks (other than hot spares) you must create a second vdev. In this case we are running two drives in a mirror-like two-disk raidz1, so it would be best to add a second pair of drives as another vdev.
        NAME        STATE     READ WRITE CKSUM
        pool4tb     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
Now data will be striped across both vdevs.
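As a rough planning aid, usable capacity is the sum over vdevs of (disks - parity) x smallest disk in the vdev. A minimal sketch of that arithmetic, assuming hypothetical 4TB disks for the two-vdev layout above (ZFS metadata overhead ignored):

```shell
# Rough usable capacity of one raidz vdev:
# (disks - parity) * size of its smallest disk.
usable_tb() {
    # usage: usable_tb DISKS PARITY DISK_TB
    echo $(( ($1 - $2) * $3 ))
}

# The two-vdev layout above, with hypothetical 4TB disks:
v1=$(usable_tb 2 1 4)   # raidz1-0: sdb + sdd
v2=$(usable_tb 2 1 4)   # raidz1-1: sde + sdf
echo "usable: $(( v1 + v2 )) TB"
```

The same function covers wider vdevs, e.g. `usable_tb 5 2 2` for a five-disk raidz2 of 2TB drives.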
ZFS on Linux Installation
It has been reported that when installing zfs and its dependencies at the same time, the kernel modules will not get created. Below are the current steps I found to work when installing ZFS.
yum -y install epel-release
Make sure the system is completely up to date.
yum -y update
reboot
After reboot
yum -y install kernel-devel
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install spl
If everything was done right, the following command will take a while (depending on hardware)
yum -y install zfs-dkms
yum -y install zfs
/sbin/modprobe zfs
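After the modprobe, `lsmod` should list the zfs module if the dkms build succeeded. A small sanity check, written as a function over lsmod-style text so the parsing can be exercised anywhere (the sample line below is made up):

```shell
# Succeed if the zfs module appears in lsmod-style input on stdin.
zfs_loaded() {
    awk '$1 == "zfs" { found = 1 } END { exit !found }'
}

# Real use:  lsmod | zfs_loaded && echo "zfs module loaded"
printf 'Module  Size  Used by\nzfs  3564608  6\n' | zfs_loaded && echo "zfs module loaded"
```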
Fedora 28
[1] The instructions from the zfsonlinux.org site are correct, except for enabling the repo before installing. Even issuing "dnf --set-enable zfs.repo" would result in failure; I had to edit the repo file directly (/etc/yum.repos.d/zfs.repo) to enable it. Not a big deal, but something good to know.
Create ZFS Pool
At this point you can create your pool. Most of the time we will be interested in a RAIDZ configuration. Depending on how much parity you are interested in, use raidz (an alias for raidz1), raidz2, or raidz3.
zpool create <name of pool> raidz <disk1> <disk2> <etc>
NOTE: By default this will create a mount point of "/<name of pool>"
To add a spare drive
zpool add <name of pool> spare <disk>
Make sure to enable automatic rebuild when a drive fails, especially when using hot spares.
zpool set autoreplace=on <name of pool>
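It is worth confirming the property actually took: `zpool get autoreplace <name of pool>` should report the value `on`. A small helper that checks `zpool get`-style output, fed a canned sample here so it runs without a pool:

```shell
# Succeed if 'zpool get autoreplace <pool>' output on stdin shows value 'on'.
autoreplace_on() {
    awk '$2 == "autoreplace" { ok = ($3 == "on") } END { exit !ok }'
}

# Real use:  zpool get autoreplace pool4tb | autoreplace_on || echo "autoreplace NOT set"
printf 'NAME     PROPERTY     VALUE  SOURCE\npool4tb  autoreplace  on     local\n' \
    | autoreplace_on && echo "autoreplace is on"
```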
I ran into the following tip that helps with managing the disks. Creating a label for each disk would have saved me time in the past[2] (note that glabel is FreeBSD-specific):
# glabel label rex1 ada0
# glabel label rex2 ada1
# glabel label rex3 ada2
# zpool create rex raidz1 label/rex1 label/rex2 label/rex3
Create ZFS Volumes
zfs create <name of pool>/<Volume Name>
zfs set mountpoint=<mount point> <name of pool>/<Volume Name>
Example:
zfs create pool4tb/archive
mkdir /archive
zfs set mountpoint=/archive pool4tb/archive
Additional Options
To enable compression
zfs set compression=lz4 <name of pool>
To increase the number of copies of a file on a dataset
zfs set copies=<1|2|3> <name of pool>/<dataset>
To have the pool auto-expand
zpool set autoexpand=on <name of pool>
- Encryption
http://www.makethenmakeinstall.com/2014/10/zfs-on-linux-with-luks-encrypted-disks/
EXAMPLE
1x 2TB HDD: sdb
4x 1TB HDDs: sdc sdd sde sdf

Using the above drives it is possible to create a variety of deployments. In this example we will create a RAID5-like configuration that spans across three 2TB devices. We start by creating the pools and adding the drives.

[root@nas ~]# zpool create -f set1 raidz /dev/sdc /dev/sdd
[root@nas ~]# zpool create -f set2 raidz /dev/sde /dev/sdf
[root@nas ~]# zpool status
  pool: set1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set1        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: set2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set2        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

[root@nas ~]# zfs create -V 1.50T set1/vdev1
[root@nas ~]# zfs create -V 1.50T set2/vdev1
[root@nas ~]# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
set1        1.55T   214G  57.5K  /set1
set1/vdev1  1.55T  1.76T    36K  -
set2        1.55T   214G  57.5K  /set2
set2/vdev2  1.55T  1.76T    36K  -
[root@nas ~]# ls /dev/
<condensed output> zd0 zd16
[root@nas ~]# zpool create -f data raidz1 /dev/sdb /dev/zd0 /dev/zd16
[root@nas ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
data  4.47T   896K  4.47T         -    0%   0%  1.00x  ONLINE  -
set1  1.81T   742K  1.81T         -    0%   0%  1.00x  ONLINE  -
set2  1.81T   429K  1.81T         -    0%   0%  1.00x  ONLINE  -
[root@nas ~]# df -lh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.6G   32G   5% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G  8.5M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda1       497M  200M  298M  41% /boot
tmpfs           775M     0  775M   0% /run/user/0
set1            214G  128K  214G   1% /set1
set2            214G  128K  214G   1% /set2
data            2.9T  256K  2.9T   1% /data
As you can see there is a LOT of wasted space using this method: where we should have ~4TB of usable space, we end up with ~3TB. This was only an example; the better option is to create multiple independent datasets.
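The arithmetic behind those numbers, as I understand it: the outer raidz1 sizes every member down to its smallest device (the ~1.5TB zvols), so usable space is (members - parity) x smallest member. A sketch, with GB figures as rough approximations:

```shell
# Outer pool: raidz1 over sdb (~2000GB) and two ~1500GB zvols.
members=3
parity=1
smallest=1500
usable=$(( (members - parity) * smallest ))

raw=$(( 2000 + 4 * 1000 ))   # total raw disk: 1x2TB + 4x1TB
echo "raw: ${raw} GB, usable: ~${usable} GB (vs ~4000 GB hoped for)"
```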
ZFS Send
Example of using ZFS send to replicate snapshots from a local pool to a local external drive.
nohup zfs send -R tank/datastore@auto-20180629.0000-2w | zfs recv -F backuppool/backup &
Incremental [3]
zfs send -R -i tank/datastore@auto-20180630.0000-2w tank/datastore@auto-20180701.0000-2w | zfs recv -F backuppool/backup
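Typing the two newest snapshot names by hand is error-prone. A sketch that builds the incremental command from a snapshot listing; here it is fed canned names, while real use would pipe in `zfs list -H -t snapshot -o name -s creation tank/datastore` (the pool/dataset names follow the example above):

```shell
# Read snapshot names (oldest first, one per line) and print the
# incremental send command for the newest pair.
incr_send_cmd() {
    awk '{ prev = cur; cur = $1 }
         END { if (prev != "") printf "zfs send -R -i %s %s | zfs recv -F backuppool/backup\n", prev, cur }'
}

printf '%s\n' \
    tank/datastore@auto-20180629.0000-2w \
    tank/datastore@auto-20180630.0000-2w \
    tank/datastore@auto-20180701.0000-2w | incr_send_cmd
```

This only prints the command; reviewing it before piping to a shell seems prudent for anything touching a backup pool.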
Troubleshooting
[4]There is a cache file that is used for mounting ZFS at boot. Make sure to run the following if ZFS is not importing on boot.
[root@nas ~]# systemctl enable zfs-import-cache.service
Hot Spare
I had an issue where I created a raidz2 pool without spares (which is fine for this deployment). A drive failed, and I installed a replacement as a spare using the FreeNAS gui (this one was not ZFSoL). I was then stuck with a perpetually degraded pool.
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/625287ff-7c6b-11e8-a699-002590fde644    ONLINE       0     0     0
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            spare-9                                       DEGRADED     0     0     0
              17637264324123775223                        OFFLINE      0     0     0  was /dev/gptid/e55c7104-5d4d-11e8-aaf6-002590fde644
              gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
            gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
        logs
          gptid/bdafc060-6ccc-11e8-8357-002590fde644      ONLINE       0     0     0
        spares
          227308045836062793                              INUSE     was /dev/gptid/051c5d74-612e-11e8-8357-002590fde644

errors: No known data errors
But had I RTFM[5], I would have known to detach the failed drive that I had previously taken offline.
root@freenas:~ # zpool detach tank 17637264324123775223
root@freenas:~ # zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
            gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
        logs
          gptid/bdafc060-6ccc-11e8-8357-002590fde644    ONLINE       0     0     0

errors: No known data errors
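The argument to `zpool detach` is the numeric GUID that `zpool status` shows for the OFFLINE member. A helper that extracts such GUIDs from status output (a canned two-line sample here; real use would be `zpool status tank | offline_guids`):

```shell
# Print numeric GUIDs of OFFLINE members from 'zpool status' output on stdin.
offline_guids() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "OFFLINE" { print $1 }'
}

printf '    spare-9                 DEGRADED  0 0 0\n      17637264324123775223  OFFLINE   0 0 0\n' \
    | offline_guids
```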
ZFS Not Mounting After Reboot
For some reason my system stopped mounting my ZFS volumes at boot. For a year I would manually mount as needed (a reboot was rare), but now I have found the issue[6].
systemctl enable zfs-import.target
FreeNAS Specific
Replace drive using CLI
Recently I installed a FreeNAS server as part of a consulting gig, but the refurbished drives that came with the server started to fail and needed replacement. I had to do this remotely without the GUI due to limited VPN connectivity, which posed an issue with obtaining the gptid of the replacement drive. Until now I have relied on the GUI to provision the gptid and import the disk. My previous examples also show that I normally use the entire disk instead of using partitions on the disk.
The following is what I did to obtain a gptid for the drive.[7]
- First I obtained the drive information. Had a local tech provide me the SN.
- Ran a script I wrote to pull SN from drives listed in /dev to obtain the correct device (/dev/da13)
- At this point I created the GPT partition on the disk using the steps from the reference above.
gpart create -s gpt da13
gpart add -t freebsd-ufs da13
- Then I checked to see if the disk showed up with a label.
glabel list | grep da13
- At which point I could find the label in the full list
glabel list
- Then started the replacement of the failed disk that I previously took offline.
zpool replace tank 17805351018045823548 gptid/625287ff-7c6b-11e8-a699-002590fde644

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 30 09:43:44 2018
        1.58T scanned at 647M/s, 840G issued at 335M/s, 1.58T total
        72.7G resilvered, 51.81% done, 0 days 00:39:48 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            replacing-5                                   OFFLINE      0     0     0
              17805351018045823548                        OFFLINE      0     0     0  was /dev/gptid/db20f312-5d4d-11e8-aaf6-002590fde644
              gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0  (resilvering)
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
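The serial-matching script mentioned in step 2 can be approximated by scanning each device's identity output for the serial. A sketch, assuming `smartctl -i`-style text and a hypothetical serial number; the device loop is FreeBSD-flavored and left commented out, while the check itself runs on a canned line:

```shell
# Succeed if identity text on stdin contains the given serial number.
matches_serial() {
    grep -q "^Serial Number:[[:space:]]*$1[[:space:]]*\$"
}

# Real use (hypothetical serial, FreeBSD device names):
#   for d in /dev/da*; do
#       smartctl -i "$d" | matches_serial WD-WCC4E1234567 && echo "$d"
#   done
printf 'Serial Number:    WD-WCC4E1234567\n' | matches_serial WD-WCC4E1234567 && echo match
```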
- ↑ https://github.com/zfsonlinux/zfs/wiki/Fedora
- ↑ https://forums.freebsd.org/threads/how-to-recover-degraded-zpool.28084/
- ↑ https://docs.oracle.com/cd/E19253-01/819-5461/gbchx/index.html
- ↑ http://serverfault.com/questions/732184/zfs-datasets-dissappear-on-reboot
- ↑ https://docs.oracle.com/cd/E19253-01/819-5461/gcvdi/index.html
- ↑ https://serverfault.com/questions/914173/zfs-datasets-no-longer-automatically-mount-on-reboot-after-system-upgrade
- ↑ https://mikebeach.org/2014/03/01/how-to-format-a-disk-gpt-in-freenas/