ZFS on Linux


READ FIRST

Some considerations when working with ZFS

  • ZFS pools are built from vdevs, not directly from physical disks.
  • Be careful about how you add new disks to the pool. Do not add or remove disks on a whim (the exceptions being a disk upgrade or a disk failure).
  • ZFS is very powerful; be mindful of what you are going to do and plan it out!
  • After a vdev is created, it can never be removed and you cannot add disks into it.

Example:

NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0

raidz1-0 is a vdev. To add more disks (other than hot spares) you must create a second vdev. In this case we are running two mirrored drives, so it would be best to add a second pair of mirrored drives.

NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdf     ONLINE       0     0     0

Now data will be striped across both vdevs.
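
A minimal sketch of the command that produces that second vdev, reusing the pool and device names from the example above:

zpool add pool4tb raidz1 /dev/sde /dev/sdf   # adds the raidz1-1 vdev shown above
zpool status pool4tb                         # both vdevs now appear under the pool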

ZFS on Linux Installation

CentOS 7

It has been reported that when installing zfs and its dependencies at the same time, the kernel modules will not get created. Below are the steps I currently find to work when installing ZFS.

yum -y install epel-release

Make sure the system is completely up to date.

yum -y update
reboot

After reboot

yum -y install kernel-devel
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install spl

If everything was done right, the following command will take a while (depending on hardware)

yum -y install zfs-dkms
yum -y install zfs
/sbin/modprobe zfs
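
To confirm the module actually built and loads, a quick sanity check (not part of the original steps):

lsmod | grep zfs    # the zfs and spl modules should be listed
dkms status         # spl and zfs should show as installed for the running kernel
zpool status        # should report "no pools available" rather than a module error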

Fedora 28

[1] The instructions from the zfsonlinux.org site are correct, except for enabling the repo before installing. Even issuing "dnf --set-enable zfs.repo" would result in failure; I had to edit the repo file directly (/etc/yum.repos.d/zfs.repo) to enable it. Not a big deal, but good to know.
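
A hedged sketch of that manual edit, assuming the repo file ships with enabled=0 in its [zfs] section:

sed -i 's/^enabled=0/enabled=1/' /etc/yum.repos.d/zfs.repo   # flips every enabled=0 in the file; or edit by hand
dnf repolist enabled | grep -i zfs                           # the zfs repo should now be listed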

Create ZFS Pool

At this point you can create your pool. Most of the time we will be interested in a RAIDZ configuration. Depending on how much parity you are interested in, use raidz (an alias for raidz1), raidz2, or raidz3.

zpool create <name of pool> raidz <disk1> <disk2> <etc>

NOTE: By default this will create a mount point of "/<name of pool>"

To add a spare drive

zpool add <name of pool> spare <disk>

Make sure to enable automatic rebuild when a drive fails, especially when using hot spares.

zpool set autoreplace=on <name of pool>
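
Putting the above together, a hypothetical end-to-end example (pool and device names are made up for illustration):

zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
zpool add tank spare /dev/sdh
zpool set autoreplace=on tank
zpool status tank            # the spare shows up under "spares" as AVAIL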



I ran into the following, which helps with managing the disks. Creating a label for each disk would have saved me time in the past.[2]

# glabel label rex1 ada0
# glabel label rex2 ada1
# glabel label rex3 ada2
# zpool create rex raidz1 label/rex1 label/rex2 label/rex3

Create ZFS Volumes

zfs create <name of pool>/<Volume Name>
zfs set mountpoint=<mount point> <name of pool>/<Volume Name>

Example:

zfs create pool4tb/archive
mkdir /archive
zfs set mountpoint=/archive pool4tb/archive
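
To confirm where the dataset landed, a quick check using the example names above:

zfs get mountpoint pool4tb/archive   # should report /archive
df -h /archive                       # the dataset should now be mounted there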

Additional Options

To enable compression

zfs set compression=lz4 <name of pool>

To increase the number of copies of a file on a dataset

zfs set copies=<1|2|3> <name of pool>/<dataset>

To have the pool auto-expand

zpool set autoexpand=on <name of pool>
  • Encryption

http://www.makethenmakeinstall.com/2014/10/zfs-on-linux-with-luks-encrypted-disks/
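
To read back which of these properties are actually in effect (using the example pool name; adjust as needed):

zfs get compression,copies pool4tb          # dataset-level properties
zpool get autoexpand,autoreplace pool4tb    # pool-level properties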

EXAMPLE

1x2TB HDD sdb
4x1TB HDDs sdc sdd sde sdf

Using the above drives it is possible to create a variety of deployments. In this example we will create a RAID5-like configuration that spans three 2TB devices.


We start by creating the pools and adding the drives.
[root@nas ~]# zpool create -f set1 raidz /dev/sdc /dev/sdd
[root@nas ~]# zpool create -f set2 raidz /dev/sde /dev/sdf
[root@nas ~]# zpool status
  pool: set1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set1        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: set2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set2        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

[root@nas ~]# zfs create -V 1.50T set1/vdev1
[root@nas ~]# zfs create -V 1.50T set2/vdev1
[root@nas ~]# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
set1        1.55T   214G  57.5K  /set1
set1/vdev1  1.55T  1.76T    36K  -
set2        1.55T   214G  57.5K  /set2
set2/vdev2  1.55T  1.76T    36K  -

[root@nas ~]# ls /dev/
<condensed output>
zd0
zd16

[root@nas ~]# zpool create -f data raidz1 /dev/sdb /dev/zd0 /dev/zd16
[root@nas ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data  4.47T   896K  4.47T         -     0%     0%  1.00x  ONLINE  -
set1  1.81T   742K  1.81T         -     0%     0%  1.00x  ONLINE  -
set2  1.81T   429K  1.81T         -     0%     0%  1.00x  ONLINE  -

[root@nas ~]# df -lh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.6G   32G   5% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G  8.5M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda1       497M  200M  298M  41% /boot
tmpfs           775M     0  775M   0% /run/user/0
set1            214G  128K  214G   1% /set1
set2            214G  128K  214G   1% /set2
data            2.9T  256K  2.9T   1% /data

As you can see there is a LOT of wasted space using this method. Where we should have ~4TB of usable space, we end up with ~3TB. This was only an example; the better option is to create multiple independent datasets.

ZFS Send

Example of using ZFS send to replicate snapshots from a local pool to a local external drive.

nohup zfs send -R tank/datastore@auto-20180629.0000-2w | zfs recv -F backuppool/backup &

Incremental [3]

zfs send -R -i tank/datastore@auto-20180630.0000-2w tank/datastore@auto-20180701.0000-2w | zfs recv -F backuppool/backup

ssh[4]

nohup zfs send tank/datastore@auto-20180629.0000-2w | ssh root@somehost 'zfs receive backuppool/datastore@auto-20180629.0000-2w'
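
The examples above assume the snapshots already exist (mine come from an automatic snapshot schedule). A minimal sketch of creating one by hand, with a made-up snapshot name:

zfs snapshot -r tank/datastore@manual-20180702   # -r also snapshots child datasets
zfs list -t snapshot -r tank/datastore           # confirm it exists before sending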

Troubleshooting

Auto import pool at boot

[5] There is a cache file that is used for importing ZFS pools at boot. Make sure to run the following if ZFS is not importing on boot.

[root@nas ~]# systemctl enable zfs-import-cache.service
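
If the cache file itself is missing or stale, it can be regenerated by pointing the pool at it again (an extra step, not from the reference; adjust the pool name):

zpool set cachefile=/etc/zfs/zpool.cache <name of pool>   # rewrites /etc/zfs/zpool.cache
systemctl enable zfs-import-cache.service zfs-mount.service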

Kernel Module Failure After Upgrade

I ran the standard yum upgrade process on my home CentOS 7 server. After a reboot, ZFS failed, stating that the module was not loaded and that I should load it. However, modprobe would fail.

[root@nas ~]# modprobe zfs
modprobe: ERROR: could not insert 'zfs': Invalid argument

Checking dmesg

[root@nas ~]#  grep zfs /var/log/dmesg*
/var/log/dmesg.old:[    3.445947] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    3.445950] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.103167] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.103172] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.154686] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.154691] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.273800] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.273804] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.377193] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.377200] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[   92.649735] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[   92.649739] zfs: Unknown symbol vn_getattr (err -22)

I found a post about this[6], and it mentioned checking the dkms status. Below is what I found.

[root@nas ~]# dkms status
spl, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
[root@nas ~]# rpm -qa | grep kernel
kernel-3.10.0-862.11.6.el7.x86_64
kernel-tools-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
kernel-tools-libs-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.9.1.el7.x86_64
kernel-headers-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.6.3.el7.x86_64
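
The eventual fix for this class of problem is rebuilding the modules against the running kernel and reloading; the post does not spell it out, so treat the following as a hedged sketch (versions taken from the dkms status output above):

dkms remove zfs/0.7.12 --all && dkms remove spl/0.7.12 --all   # drop the mismatched builds
dkms install spl/0.7.12 && dkms install zfs/0.7.12             # rebuild for the running kernel
modprobe zfs && lsmod | grep zfs                               # the module should now load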

Set Hot Spare as replacement device

I had an issue where I created a raidz2 pool without spares (which is fine for this deployment). A drive failed, and I installed a replacement as a spare using the FreeNAS GUI (this one was not ZFSoL). I was then stuck with a perpetually degraded pool.

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644    ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    spare-9                                       DEGRADED     0     0     0
	      17637264324123775223                        OFFLINE      0     0     0  was /dev/gptid/e55c7104-5d4d-11e8-aaf6-002590fde644
	      gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644      ONLINE       0     0     0
	spares
	  227308045836062793                              INUSE     was /dev/gptid/051c5d74-612e-11e8-8357-002590fde644

errors: No known data errors

But had I RTFM'd[7], I would have known to detach the failed drive that I had previously taken offline.

root@freenas:~ # zpool detach tank 17637264324123775223
root@freenas:~ # zpool status

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644    ONLINE       0     0     0

errors: No known data errors

ZFS Not Mounting After Reboot

For some reason my system stopped mounting my ZFS volumes at boot. For a year I would manually mount as needed (a reboot was rare), but now I have found the issue.[8]

systemctl enable zfs-import.target
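
If that alone does not do it, the full set of ZFS systemd units can be enabled as well (a belt-and-braces sketch using the unit names shipped with ZFS on Linux):

systemctl enable zfs-import-cache.service zfs-mount.service zfs-import.target zfs.target
systemctl list-unit-files | grep zfs   # verify they now show as enabled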

FreeNAS Specific

Replace drive using CLI

Recently I installed a FreeNAS server as part of a consulting gig, but the refurbished drives that came with the server started to fail and needed replacement. I had to do this remotely without the GUI due to limited VPN connectivity, which posed an issue with obtaining the gptid of the replacement drive. Up until now I have relied on the GUI to provision the gptid and import the disk. My previous examples also show that I normally use the entire disk instead of partitions on the disk.

The following is what I did to obtain a gptid for the drive.[9]

  • First I obtained the drive information; a local tech provided me the SN.
  • Ran a script I wrote to pull the SN from each drive listed in /dev in order to find the correct device (/dev/da13). (An illustrative version of such a loop appears after the status output below.)
  • At this point I created the GPT partition on the disk using the steps from the reference above.
gpart create -s gpt da13
gpart add -t freebsd-ufs da13
  • Then I checked to see if the disk showed up with a label.
glabel list | grep da13
  • At which point I found the label in the full list
glabel list
  • Then I started the replacement of the failed disk that I had previously taken offline.
zpool replace tank 17805351018045823548 gptid/625287ff-7c6b-11e8-a699-002590fde644

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 30 09:43:44 2018
	1.58T scanned at 647M/s, 840G issued at 335M/s, 1.58T total
	72.7G resilvered, 51.81% done, 0 days 00:39:48 to go
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    replacing-5                                   OFFLINE      0     0     0
	      17805351018045823548                        OFFLINE      0     0     0  was /dev/gptid/db20f312-5d4d-11e8-aaf6-002590fde644
	      gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0  (resilvering)
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0

SLOG Disk performance

ADATA SU800 128GB

=== START OF INFORMATION SECTION ===
Device Model:     ADATA SU800
Serial Number:    ---
LU WWN Device Id: 5 707c18 300038465
Firmware Version: Q0922FS
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 17 07:49:26 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
	512         	# sectorsize
	128035676160	# mediasize in bytes (119G)
	250069680   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	15566       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA ADATA SU800	# Disk descr.
	2J2720018797        	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:    781.3 usec/IO =      0.6 Mbytes/s
	   1 kbytes:    784.3 usec/IO =      1.2 Mbytes/s
	   2 kbytes:    800.7 usec/IO =      2.4 Mbytes/s
	   4 kbytes:    805.7 usec/IO =      4.8 Mbytes/s
	   8 kbytes:    795.7 usec/IO =      9.8 Mbytes/s
	  16 kbytes:    806.0 usec/IO =     19.4 Mbytes/s
	  32 kbytes:    787.7 usec/IO =     39.7 Mbytes/s
	  64 kbytes:    944.2 usec/IO =     66.2 Mbytes/s
	 128 kbytes:   1353.6 usec/IO =     92.3 Mbytes/s
	 256 kbytes:   2001.1 usec/IO =    124.9 Mbytes/s
	 512 kbytes:   3185.4 usec/IO =    157.0 Mbytes/s
	1024 kbytes:   5407.7 usec/IO =    184.9 Mbytes/s
	2048 kbytes:   7622.4 usec/IO =    262.4 Mbytes/s
	4096 kbytes:  12125.0 usec/IO =    329.9 Mbytes/s
	8192 kbytes:  21478.9 usec/IO =    372.5 Mbytes/s
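
If the latency numbers are acceptable, the disk can be attached to a pool as a dedicated log device. A sketch, using the pool name from the earlier examples (FreeNAS itself would normally partition the disk and use a gptid rather than the raw device):

zpool add tank log da21   # or: zpool add tank log gptid/<partition id>
zpool status tank         # the device appears under the "logs" section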