ZFS on Linux

From Michael's Information Zone

READ FIRST

Some considerations when working with ZFS

  • ZFS builds pools from vdevs (virtual devices), not directly from physical disks.
  • Be careful about how you add new disks to the array. Do not add or remove disks casually; the exceptions are upgrading disks or replacing a failed disk.
  • ZFS is very powerful; be mindful of what you are going to do and plan it out!
  • After a vdev is created, it can never be removed and you cannot add disks to it.

Example:

	NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0

raidz1-0 is a vdev. To add more disks (other than hot spares) you must create a second vdev. In this case the vdev is a two-drive raidz1 (redundancy comparable to a mirrored pair), so it would be best to add a second pair of drives with the same layout.

	NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdf     ONLINE       0     0     0

Now data will be striped across both vdevs.
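
A minimal sketch of how the second vdev in the listing above could be added, assuming the pool and device names from the example (pool4tb, /dev/sde, /dev/sdf). The -n flag asks zpool add to print the layout that would result without changing the pool. Since a vdev cannot be removed afterwards, previewing first is a cheap safeguard; drop -n only once the preview looks right.

```shell
#!/bin/sh
# Assumed values taken from the example listing above; adjust for your system.
POOL=pool4tb
NEW_DISKS="/dev/sde /dev/sdf"

# -n = dry run: show the layout that would result, change nothing.
CMD="zpool add -n $POOL raidz1 $NEW_DISKS"
echo "$CMD"

# Only attempt the preview when zpool is installed and the pool exists.
if command -v zpool >/dev/null 2>&1 && zpool list "$POOL" >/dev/null 2>&1; then
    $CMD
fi
```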

ZFS on Linux Installation

It has been reported that when zfs and its dependencies are installed at the same time, the kernel modules do not get built. Below are the steps I currently find to work when installing ZFS.

yum -y install epel-release

Make sure the system is completely up to date.

yum -y update
reboot

After reboot

yum -y install kernel-devel
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install spl

If everything was done right, the zfs-dkms install below will take a while (depending on hardware) because DKMS builds the kernel modules.

yum -y install zfs-dkms
yum -y install zfs
/sbin/modprobe zfs
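
Before creating a pool it can help to confirm that the module really loaded. The following is a small sketch, not part of the original steps: zfs_module_loaded is a hypothetical helper name, and it simply looks for a "zfs" entry in /proc/modules.

```shell
#!/bin/sh
# zfs_module_loaded [FILE]: succeed if a module named "zfs" is listed in FILE.
# FILE defaults to /proc/modules, the kernel's list of loaded modules.
zfs_module_loaded() {
    awk '$1 == "zfs" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

if zfs_module_loaded 2>/dev/null; then
    echo "zfs module is loaded"
else
    echo "zfs module is NOT loaded; check 'dkms status' and re-run modprobe"
fi
```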

At this point you can create your pool. Most of the time we will be interested in a RAID-Z configuration. Depending on how much parity you are interested in, use raidz (an alias for raidz1), raidz2, or raidz3.

zpool create <name of pool> raidz <disk1> <disk2> <etc>

NOTE: By default this will create a mount point of "/<name of pool>"
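
To make the parity choice concrete, here is a rough sketch of what each raidz level costs. The helper names raidz_parity and raidz_usable_disks are made up for illustration; the numbers only count whole disks and ignore metadata overhead.

```shell
#!/bin/sh
# raidz_parity TYPE: print how many drive failures a single vdev of TYPE survives.
raidz_parity() {
    case "$1" in
        raidz|raidz1) echo 1 ;;
        raidz2)       echo 2 ;;
        raidz3)       echo 3 ;;
        *) echo "unknown vdev type: $1" >&2; return 1 ;;
    esac
}

# raidz_usable_disks TYPE NDISKS: disks' worth of space left for data.
raidz_usable_disks() {
    p=$(raidz_parity "$1") || return 1
    [ "$2" -gt "$p" ] || { echo "need more than $p disks for $1" >&2; return 1; }
    echo $(( $2 - p ))
}

raidz_usable_disks raidz2 6    # prints 4
```

In other words, a six-disk raidz2 vdev survives two drive failures and leaves roughly four disks' worth of space for data.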

To add a spare drive

zpool add <name of pool> spare <disk>

Make sure to enable automatic rebuild when a drive fails, especially when using hot spares.

zpool set autoreplace=on <name of pool>

Create ZFS Volumes

zfs create <name of pool>/<Volume Name>
zfs set mountpoint=<mount point> <name of pool>/<Volume Name>

Example:

zfs create pool4tb/archive
mkdir /archive
zfs set mountpoint=/archive pool4tb/archive

Additional Options

To enable compression

zfs set compression=lz4 <name of pool>

To increase the number of copies of a file on a dataset

zfs set copies=<1,2,3> <name of pool>/<Volume Name>

To have the pool auto-expand

zpool set autoexpand=on <name of pool>

For disk encryption, see this guide to ZFS on Linux with LUKS-encrypted disks:

http://www.makethenmakeinstall.com/2014/10/zfs-on-linux-with-luks-encrypted-disks/

EXAMPLE

1x2TB HDD sdb
4x1TB HDDs sdc sdd sde sdf

Using the above drives it is possible to create a variety of deployments. In this example we will create a RAID5-like configuration that spans across three 2TB devices.

We start by creating two pools and adding the drives.
[root@nas ~]# zpool create -f set1 raidz /dev/sdc /dev/sdd
[root@nas ~]# zpool create -f set2 raidz /dev/sde /dev/sdf
[root@nas ~]# zpool status
  pool: set1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set1        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: set2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set2        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

[root@nas ~]# zfs create -V 1.50T set1/vdev1
[root@nas ~]# zfs create -V 1.50T set2/vdev2
[root@nas ~]# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
set1        1.55T   214G  57.5K  /set1
set1/vdev1  1.55T  1.76T    36K  -
set2        1.55T   214G  57.5K  /set2
set2/vdev2  1.55T  1.76T    36K  -

[root@nas ~]# ls /dev/
<condensed output>
zd0
zd16

[root@nas ~]# zpool create -f data raidz1 /dev/sdb /dev/zd0 /dev/zd16
[root@nas ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data  4.47T   896K  4.47T         -     0%     0%  1.00x  ONLINE  -
set1  1.81T   742K  1.81T         -     0%     0%  1.00x  ONLINE  -
set2  1.81T   429K  1.81T         -     0%     0%  1.00x  ONLINE  -

[root@nas ~]# df -lh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.6G   32G   5% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G  8.5M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda1       497M  200M  298M  41% /boot
tmpfs           775M     0  775M   0% /run/user/0
set1            214G  128K  214G   1% /set1
set2            214G  128K  214G   1% /set2
data            2.9T  256K  2.9T   1% /data

As you can see, there is a LOT of wasted space using this method. Where we should have ~4TB of usable space, we end up with ~3TB. This was only an example; the better option is to create multiple independent datasets.
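
One way to see where the missing terabyte goes: the data pool is a raidz1 over one 2TB disk and two 1.50T zvols, and a raidz vdev treats every member as if it were the size of the smallest one. The sketch below works through that arithmetic; raidz1_estimate is a hypothetical helper, and the result is only a rough estimate that ignores metadata overhead and the TB/TiB difference.

```shell
#!/bin/sh
# raidz1_estimate SIZE...: rough usable space of one raidz1 vdev, in the same
# unit as the inputs. Every member is treated as if it were the smallest one,
# and one member's worth of space goes to parity.
raidz1_estimate() {
    printf '%s\n' "$@" | awk '
        NR == 1 || $1 < min { min = $1 }
        END { printf "%.1f\n", (NR - 1) * min }'
}

raidz1_estimate 2 1.5 1.5   # sdb plus the two 1.50T zvols: prints 3.0
raidz1_estimate 2 2 2       # three full 2TB members: prints 4.0
```

The first result lines up with the 2.9T that df reports for /data. The remaining raw space sits in set1 and set2, whose own raidz layer adds a second round of overhead underneath the zvols.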

ZFS Send

Example of using ZFS send to replicate snapshots from a local pool to a local external drive.

nohup zfs send -R tank/datastore@auto-20180629.0000-2w | zfs recv -F backuppool/backup &
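
Before starting a long transfer like the one above, a dry run can show what would be sent. This is a sketch reusing the snapshot name from the example: zfs send accepts -n (dry run) and -v (verbose), which together report the snapshots and the estimated stream size without producing any data.

```shell
#!/bin/sh
# Snapshot name taken from the example above; adjust for your pool.
SNAP='tank/datastore@auto-20180629.0000-2w'

# -n = dry run, -v = verbose: report what would be sent and the estimated
# size, without generating a stream.
PREVIEW="zfs send -n -v -R $SNAP"
echo "$PREVIEW"

# Run the preview only when zfs is installed and the snapshot exists.
if command -v zfs >/dev/null 2>&1 && zfs list -t snapshot "$SNAP" >/dev/null 2>&1; then
    $PREVIEW
fi
```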

Troubleshooting

There is a cache file that is used for importing and mounting ZFS pools at boot.[1] If ZFS is not importing on boot, make sure to run the following.

[root@nas ~]# systemctl enable zfs-import-cache.service
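
If the service is enabled and the pool still does not import, the cache file itself may be missing. The sketch below assumes the default ZFS on Linux location /etc/zfs/zpool.cache and uses pool4tb as a placeholder pool name; cache_present is a hypothetical helper. Setting the cachefile property on a pool causes the file to be written again.

```shell
#!/bin/sh
# Default cache file location on ZFS on Linux; POOL is a placeholder name.
CACHE=${CACHE:-/etc/zfs/zpool.cache}
POOL=${POOL:-pool4tb}

# cache_present FILE: succeed if FILE exists and is not empty.
cache_present() {
    [ -s "$1" ]
}

if cache_present "$CACHE"; then
    echo "cache file found: $CACHE"
else
    echo "cache file missing or empty: $CACHE"
    echo "to regenerate it: zpool set cachefile=$CACHE $POOL"
fi
```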

Hot Spare

I had an issue where I created a raidz2 pool without spares (which is fine for this deployment). A drive failed, and I installed a replacement as a spare using the FreeNAS GUI (this system was not ZFS on Linux). I was then stuck with a perpetually degraded pool.

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644    ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    spare-9                                       DEGRADED     0     0     0
	      17637264324123775223                        OFFLINE      0     0     0  was /dev/gptid/e55c7104-5d4d-11e8-aaf6-002590fde644
	      gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644      ONLINE       0     0     0
	spares
	  227308045836062793                              INUSE     was /dev/gptid/051c5d74-612e-11e8-8357-002590fde644

errors: No known data errors

Had I read the manual[2], I would have known to detach the failed drive that I had previously taken offline.

root@freenas:~ # zpool detach tank 17637264324123775223
root@freenas:~ # zpool status

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644    ONLINE       0     0     0

errors: No known data errors
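
On a pool with this many members, the one unhealthy entry is easy to miss. As a sketch, the hypothetical helper below filters zpool status text down to the config entries that are not ONLINE; it makes no attempt to handle every state ZFS can report.

```shell
#!/bin/sh
# unhealthy_devices: read `zpool status` text on stdin and print the name and
# state of every config entry whose state is a known non-ONLINE value.
# The pool-level "state:" summary line is skipped.
unhealthy_devices() {
    awk '$1 != "state:" &&
         $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}

# Against a live system (skipped when zpool is not installed):
if command -v zpool >/dev/null 2>&1; then
    zpool status | unhealthy_devices
fi
```

Fed the degraded listing above, this prints tank, raidz2-0, spare-9 and the offline member 17637264324123775223, which is the name to pass to zpool detach. For a quick pool-level health check, zpool status -x reports only pools that have problems.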

FreeNAS Specific

Replace drive using CLI

Recently I installed a FreeNAS server as part of a consulting gig, but the refurbished drives that came with the server started to fail and needed replacement. I had to do this remotely without the GUI due to limited VPN connectivity, which made it difficult to obtain the gptid of the replacement drive. Up until now I have relied on the GUI to provision the gptid and import the disk. My previous examples also show that I normally use the entire disk instead of using partitions on the disk.

The following is what I did to obtain a gptid for the drive.[3]

  • First I obtained the drive information. Had a local tech provide me the SN.
  • Ran a script I wrote to pull SN from drives listed in /dev to obtain the correct device (/dev/da13)
  • At this point I created the GPT partition on the disk using the steps from the reference above.
gpart create -s gpt da13
gpart add -t freebsd-ufs da13
  • Then I checked to see if the disk showed up with a label.
glabel list | grep da13
  • At that point I could find the label in the full list.
glabel list
  • Then started the replacement of the failed disk that I previously took offline.
zpool replace tank 17805351018045823548 gptid/625287ff-7c6b-11e8-a699-002590fde644

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 30 09:43:44 2018
	1.58T scanned at 647M/s, 840G issued at 335M/s, 1.58T total
	72.7G resilvered, 51.81% done, 0 days 00:39:48 to go
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    replacing-5                                   OFFLINE      0     0     0
	      17805351018045823548                        OFFLINE      0     0     0  was /dev/gptid/db20f312-5d4d-11e8-aaf6-002590fde644
	      gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0  (resilvering)
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
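
The serial-number lookup and the label search in the steps above can be sketched as two small filters. These are hypothetical helpers, not the script mentioned earlier: they assume the usual FreeBSD output of geom disk list (a "Geom name:" line followed later by an "ident:" line holding the serial) and of glabel status (label name in the first column, component in the third).

```shell
#!/bin/sh
# device_for_serial SERIAL: read `geom disk list` text on stdin and print the
# geom (device) name whose "ident:" field equals SERIAL.
device_for_serial() {
    awk -v want="$1" '
        $1 == "Geom" && $2 == "name:" { dev = $3 }
        $1 == "ident:" && $2 == want  { print dev }'
}

# gptid_for_device DEV: read `glabel status` text on stdin and print the
# gptid label that sits on a partition of DEV (for example da13p1).
gptid_for_device() {
    awk -v dev="$1" '
        $1 ~ /^gptid\// && $3 ~ ("^" dev "p[0-9]+$") { print $1 }'
}

# Hypothetical usage on the FreeNAS host (the serial number is a placeholder):
#   geom disk list | device_for_serial <serial number>
#   glabel status  | gptid_for_device da13
```

The gptid printed by the second filter is the value to pass to zpool replace as the new device.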