Difference between revisions of "ZFS on Linux"

From Michael's Information Zone

Latest revision as of 12:12, 21 October 2019

READ FIRST

Some considerations when working with ZFS

  • ZFS uses vdevs and not physical disks.
  • Be careful about how you add new disks to the array. Do not add and remove disks at random (the exceptions are upgrading disks or replacing a failed disk).
  • ZFS is very powerful; be mindful of what you are going to do and plan it out!
  • After a vdev is created, it cannot be removed and you cannot add disks to it.

Example:

NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0

raidz1-0 is a vdev. To add more disks (other than hot spares) you must create a second vdev. In this case the vdev holds a pair of drives, so it is best to add a second vdev with a matching pair.

NAME        STATE     READ WRITE CKSUM
	pool4tb     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdf     ONLINE       0     0     0

Now data will be striped across both vdevs.
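
As a minimal sketch, a second vdev like the one in the status output above would typically be created with zpool add. The helper below (print_add_vdev, a hypothetical name) only prints the command so it can be reviewed first. The pool and disk names come from the example, and raidz is assumed because the new vdev is shown as raidz1-1.

```shell
# Dry run: build the zpool command that adds a second vdev, without executing it.
# print_add_vdev is a hypothetical helper, not part of ZFS.
print_add_vdev() {
    pool=$1
    vdev_type=$2
    shift 2
    echo "zpool add $pool $vdev_type $*"
}

# Pool and disk names from the example above.
print_add_vdev pool4tb raidz sde sdf
```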

ZFS on Linux Installation

CentOS 7

It has been reported that when installing zfs and its dependencies at the same time, the kernel modules do not get built. Below are the steps I found to work when installing ZFS.

yum -y install epel-release

Make sure the system is completely up to date.

yum -y update
reboot -h

After reboot

yum -y install kernel-devel
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install spl

If everything was done right, the following command will take a while (depending on hardware)

yum -y install zfs-dkms
yum -y install zfs
/sbin/modprobe zfs

Fedora 28

[1] The instructions from the zfsonlinux.org site are correct, except for enabling the repo before installing. Even issuing "dnf --set-enable zfs.repo" resulted in failure, so I had to edit the repo file directly (/etc/yum.repos.d/zfs.repo) to enable it. Not a big deal, but good to know.

Create ZFS Pool

At this point you can create your pool. Most of the time we will be interested in a RAID-Z configuration. Depending on how much parity you are interested in, use raidz (an alias for raidz1), raidz2, or raidz3.

zpool create <name of pool> raidz <disk1> <disk2> <etc>

NOTE: By default this will create a mount point of "/<name of pool>"

To add a spare drive

zpool add <name of pool> spare <disk>

Make sure to enable automatic rebuild when a drive fails, especially when using hot spares.

zpool set autoreplace=on <name of pool>



I ran into the following that would help with managing the disks. Creating a label for each disk would have saved me time in the past[2]

# glabel label rex1 ada0
# glabel label rex2 ada1
# glabel label rex3 ada2
# zpool create rex raidz1 label/rex1 label/rex2 label/rex3

Create ZFS Volumes

zfs create <name of pool>/<Volume Name>
zfs set mountpoint=<mount point> <name of pool>/<Volume Name>

Example:

zfs create pool4tb/archive
mkdir /archive
zfs set mountpoint=/archive pool4tb/archive

Additional Options

To enable compression

zfs set compression=lz4 <name of pool>

To increase the number of copies of a file on a dataset

zfs set copies=<1|2|3> <name of pool>/<Volume Name>

To have the pool auto-expand

zpool set autoexpand=on <name of pool>

  • Encryption

http://www.makethenmakeinstall.com/2014/10/zfs-on-linux-with-luks-encrypted-disks/

EXAMPLE

1x2TB HDD sdb
4x1TB HDDs sdc sdd sde sdf

Using the above drives it is possible to create a variety of deployments. In this example we will create a RAID5 like configuration that spans across three 2TB devices.


We start by creating two pools and adding the drives.
[root@nas ~]# zpool create -f set1 raidz /dev/sdc /dev/sdd
[root@nas ~]# zpool create -f set2 raidz /dev/sde /dev/sdf
[root@nas ~]# zpool status
  pool: set1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set1        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: set2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set2        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

[root@nas ~]# zfs create -V 1.50T set1/vdev1
[root@nas ~]# zfs create -V 1.50T set2/vdev2
[root@nas ~]# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
set1        1.55T   214G  57.5K  /set1
set1/vdev1  1.55T  1.76T    36K  -
set2        1.55T   214G  57.5K  /set2
set2/vdev2  1.55T  1.76T    36K  -

[root@nas ~]# ls /dev/
<condensed output>
zd0
zd16

[root@nas ~]# zpool create -f data raidz1 /dev/sdb /dev/zd0 /dev/zd16
[root@nas ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data  4.47T   896K  4.47T         -     0%     0%  1.00x  ONLINE  -
set1  1.81T   742K  1.81T         -     0%     0%  1.00x  ONLINE  -
set2  1.81T   429K  1.81T         -     0%     0%  1.00x  ONLINE  -

[root@nas ~]# df -lh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.6G   32G   5% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G  8.5M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda1       497M  200M  298M  41% /boot
tmpfs           775M     0  775M   0% /run/user/0
set1            214G  128K  214G   1% /set1
set2            214G  128K  214G   1% /set2
data            2.9T  256K  2.9T   1% /data

As you can see, there is a LOT of wasted space using this method: where we should have ~4TB of usable space, we end up with ~3TB. This was only an example; the better option is to create multiple independent datasets.
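
The ~3TB figure can be checked with a rough estimate. This sketch assumes the usual rule of thumb that a raidz vdev offers about (members - parity) times the size of its smallest member, and it ignores metadata and padding overhead. The function name is hypothetical.

```shell
# Rough raidz capacity estimate: (members - parity) * smallest member size.
# Ignores metadata and padding overhead, so real numbers come out a bit lower.
raidz_usable_tb() {
    members=$1   # number of devices in the vdev
    parity=$2    # 1 for raidz1, 2 for raidz2, 3 for raidz3
    size_tb=$3   # size of the smallest member, in TB
    awk -v n="$members" -v p="$parity" -v s="$size_tb" \
        'BEGIN { printf "%.2f\n", (n - p) * s }'
}

# The "data" pool above: three members, limited by the 1.50T zvols.
raidz_usable_tb 3 1 1.50
```

The result is in line with the 2.9T that df reports for /data.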

ZFS Send

Example of using ZFS send to replicate snapshots from a local pool to a local external drive.

nohup zfs send -R tank/datastore@auto-20180629.0000-2w | zfs recv -F backuppool/backup &

Incremental [3]

zfs send -R -i tank/datastore@auto-20180630.0000-2w tank/datastore@auto-20180701.0000-2w | zfs recv -F backuppool/backup

ssh[4]

nohup zfs send tank/datastore@auto-20180629.0000-2w | ssh root@somehost 'zfs receive backuppool/datastore@auto-20180629.0000-2w'
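
Snapshot names are long and easy to mistype, so it can help to generate the incremental command from its parts. This is only a sketch: print_incremental_send is a hypothetical helper that prints the pipeline for review rather than running it.

```shell
# Dry run: print the incremental replication pipeline for two snapshots.
# print_incremental_send is a hypothetical helper; nothing is executed.
print_incremental_send() {
    dataset=$1    # e.g. tank/datastore
    from_snap=$2  # snapshot already present on the target
    to_snap=$3    # newer snapshot to send
    target=$4     # receiving dataset
    echo "zfs send -R -i ${dataset}@${from_snap} ${dataset}@${to_snap} | zfs recv -F ${target}"
}

print_incremental_send tank/datastore auto-20180630.0000-2w auto-20180701.0000-2w backuppool/backup
```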

Troubleshooting

Auto import pool at boot

[5] ZFS uses a cache file to import pools at boot. If your pools are not being imported on boot, make sure the following service is enabled.

[root@nas ~]# systemctl enable zfs-import-cache.service

Kernel Module Failure After Upgrade

I ran the standard yum upgrade process on my home CentOS 7 server. After a reboot ZFS failed stating the module was not loaded and I should load it. However, modprobe would fail.

[root@nas ~]# modprobe zfs
modprobe: ERROR: could not insert 'zfs': Invalid argument

Checking dmesg

[root@nas ~]#  grep zfs /var/log/dmesg*
/var/log/dmesg.old:[    3.445947] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    3.445950] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.103167] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.103172] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.154686] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.154691] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.273800] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.273804] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.377193] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.377200] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[   92.649735] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[   92.649739] zfs: Unknown symbol vn_getattr (err -22)

I found a post about this[6], and it suggested checking the dkms status. Below is what I found.

[root@nas ~]# dkms status
spl, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
[root@nas ~]# rpm -qa | grep kernel
kernel-3.10.0-862.11.6.el7.x86_64
kernel-tools-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
kernel-tools-libs-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.9.1.el7.x86_64
kernel-headers-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.6.3.el7.x86_64
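
The dkms output above is hard to read at a glance. As a sketch, the filter below lists only the modules that dkms flags as differing between the built and installed copy. The function name is hypothetical, and what to do with the flagged modules (for example rebuilding them for the running kernel) is an assumption that these notes do not confirm.

```shell
# List DKMS modules whose installed copy differs from the built one.
# Reads `dkms status` output on stdin; prints "module/version kernel" per hit.
flag_stale_dkms() {
    awk -F', ' '/Diff between built and installed module/ { print $1 "/" $2 " " $3 }'
}

# Sample line shortened from the output above; on a real system use:
#   dkms status | flag_stale_dkms
printf '%s\n' 'zfs, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!)' |
    flag_stale_dkms
```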

Set Hot Spare as replacement device

I had an issue where I created a raidz2 pool without spares (which is fine for this deployment). A drive failed, and I installed a replacement as a spare using the FreeNAS gui (this one was not ZFSoL). I was then stuck with a perpetually degraded pool.

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644    ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    spare-9                                       DEGRADED     0     0     0
	      17637264324123775223                        OFFLINE      0     0     0  was /dev/gptid/e55c7104-5d4d-11e8-aaf6-002590fde644
	      gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644      ONLINE       0     0     0
	spares
	  227308045836062793                              INUSE     was /dev/gptid/051c5d74-612e-11e8-8357-002590fde644

errors: No known data errors

Had I read the manual[7], I would have known to detach the failed drive that I had previously taken offline.

root@freenas:~ # zpool detach tank 17637264324123775223
root@freenas:~ # zpool status

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	    gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
	    gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
	logs
	  gptid/bdafc060-6ccc-11e8-8357-002590fde644    ONLINE       0     0     0

errors: No known data errors

ZFS Not Mounting After Reboot

For some reason my system stopped mounting my ZFS volumes at boot. For a year I mounted them manually as needed (a reboot was rare), but I have now found the fix.[8]

systemctl enable zfs-import.target

FreeNAS Specific

Recover from KDB Panic

[9] During a zfs receive operation, the server crashed and during boot would kernel panic. In order to get online again I had to do the following.

  • Shut down the server.
  • Remove the data disks.
  • Boot into FreeNAS.
  • Insert the data disks (minus any that might be dying. In this case it was a raidz2 and one drive was failing)
  • Decrypt the data disks using the command below (replace with your own values)
geli attach -p -k /data/geli/d144b46a-3567-427b-85a3-7db93fe6e170.key /dev/gptid/71c7afd0-9fbf-11e7-a9c9-d05099c312a7
  • Import the pool, ideally read-only at first.
zpool import -o readonly=on DR-ARCHIVE
  • If successful, export the pool and re-import it, this time read/write and with the proper mount point.
zpool import -R /mnt DR-ARCHIVE
  • Run a scrub.

NOTE : I am still recovering my pool. I do not know if it will boot properly.
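
The steps above name an export and a scrub without showing the commands. The sketch below prints the whole sequence for review. It assumes the standard zpool export and zpool scrub subcommands, and print_recovery_plan is a hypothetical helper that does not touch the pool.

```shell
# Dry run of the recovery sequence: prints each command instead of running it.
# print_recovery_plan is a hypothetical helper; review the output before use.
print_recovery_plan() {
    pool=$1
    altroot=$2
    echo "zpool import -o readonly=on $pool"   # first import, read-only
    echo "zpool export $pool"                  # release the read-only import
    echo "zpool import -R $altroot $pool"      # re-import read/write under altroot
    echo "zpool scrub $pool"                   # check the data after the crash
}

print_recovery_plan DR-ARCHIVE /mnt
```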

Replace drive using CLI

Recently I installed a FreeNAS server as part of a consulting gig, but the refurbished drives that came with the server started to fail and needed replacement. I had to do this remotely without the GUI due to limited VPN connectivity, which made it difficult to obtain the gptid of the replacement drive. Up until now I have relied on the GUI to provision the gptid and import the disk. My previous examples also show that I normally use the entire disk instead of partitions on the disk.

The following is what I did to obtain a gptid for the drive.[10]

  • First I obtained the drive information. Had a local tech provide me the SN.
  • Ran a script I wrote to pull SN from drives listed in /dev to obtain the correct device (/dev/da13)
  • At this point I created the GPT partition on the disk using the steps from the reference above.
gpart create -s gpt da13
gpart add -t freebsd-ufs da13
  • Then I checked to see if the disk showed up with a label.
glabel list | grep da13
  • At that point I could find the label in the full list.
glabel list
  • Then started the replacement of the failed disk that I previously took offline.
zpool replace tank 17805351018045823548 gptid/625287ff-7c6b-11e8-a699-002590fde644

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 30 09:43:44 2018
	1.58T scanned at 647M/s, 840G issued at 335M/s, 1.58T total
	72.7G resilvered, 51.81% done, 0 days 00:39:48 to go
config:

	NAME                                              STATE     READ WRITE CKSUM
	tank                                              DEGRADED     0     0     0
	  raidz2-0                                        DEGRADED     0     0     0
	    gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    replacing-5                                   OFFLINE      0     0     0
	      17805351018045823548                        OFFLINE      0     0     0  was /dev/gptid/db20f312-5d4d-11e8-aaf6-002590fde644
	      gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0  (resilvering)
	    gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
	    gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
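
The serial-number script mentioned in the steps above is not shown. The sketch below is one way such a lookup could work, assuming smartctl -i output in the format shown elsewhere on this page. The function name and the sample serial are hypothetical.

```shell
# Extract the serial number from `smartctl -i` output supplied on stdin.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Hypothetical sample; on a real system:
#   smartctl -i /dev/da13 | serial_from_smartctl
printf 'Device Model:     EXAMPLE MODEL\nSerial Number:    EXAMPLE123\n' | serial_from_smartctl
```

Comparing this value against the serial read off the drive label identifies the matching device.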

Clear drives that were not properly exported

[11] Use this when you exported a pool and plan to reuse its disks, but did not mark the disks to be cleared. ZFS keeps labels at both the start and the end of each disk, so both ends are zeroed.

  • For each disk, run the following, then reboot.
dd if=/dev/zero of=/dev/da23 bs=1m count=1
dd if=/dev/zero of=/dev/da23 bs=1m oseek=`diskinfo da23 | awk '{print int($3 / (1024*1024)) - 4;}'`
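
The oseek expression in the second dd command is easier to follow as plain arithmetic. This sketch (wipe_tail_oseek is a hypothetical name) reproduces the same calculation from a media size in bytes.

```shell
# Offset (in 1 MiB blocks) where the second dd starts: 4 MiB before the end of the disk.
# Mirrors the awk expression above: int(bytes / 1 MiB) - 4.
wipe_tail_oseek() {
    media_bytes=$1   # "mediasize in bytes" as reported by diskinfo
    echo $(( media_bytes / 1048576 - 4 ))
}

# Media size of the 500GB da23 disk listed later on this page.
wipe_tail_oseek 500107862016
```

With bs=1m, dd then writes zeros from that block to the end of the device and stops when it runs out of space.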

ZIL Disk performance

ADATA SU800 128GB

=== START OF INFORMATION SECTION ===
Device Model:     ADATA SU800
Serial Number:    ---
LU WWN Device Id: 5 707c18 300038465
Firmware Version: Q0922FS
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 17 07:49:26 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
	512         	# sectorsize
	128035676160	# mediasize in bytes (119G)
	250069680   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	15566       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA ADATA SU800	# Disk descr.
	---        	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:    781.3 usec/IO =      0.6 Mbytes/s
	   1 kbytes:    784.3 usec/IO =      1.2 Mbytes/s
	   2 kbytes:    800.7 usec/IO =      2.4 Mbytes/s
	   4 kbytes:    805.7 usec/IO =      4.8 Mbytes/s
	   8 kbytes:    795.7 usec/IO =      9.8 Mbytes/s
	  16 kbytes:    806.0 usec/IO =     19.4 Mbytes/s
	  32 kbytes:    787.7 usec/IO =     39.7 Mbytes/s
	  64 kbytes:    944.2 usec/IO =     66.2 Mbytes/s
	 128 kbytes:   1353.6 usec/IO =     92.3 Mbytes/s
	 256 kbytes:   2001.1 usec/IO =    124.9 Mbytes/s
	 512 kbytes:   3185.4 usec/IO =    157.0 Mbytes/s
	1024 kbytes:   5407.7 usec/IO =    184.9 Mbytes/s
	2048 kbytes:   7622.4 usec/IO =    262.4 Mbytes/s
	4096 kbytes:  12125.0 usec/IO =    329.9 Mbytes/s
	8192 kbytes:  21478.9 usec/IO =    372.5 Mbytes/s
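
The two columns in these tables are related: the throughput figure follows from the per-IO latency. A small sketch of the conversion, with a hypothetical function name:

```shell
# Convert one diskinfo -S line into throughput: (kbytes / 1024) / (usec / 1e6).
sync_write_mbps() {
    kbytes=$1    # write size in kbytes
    usec=$2      # latency per IO in microseconds
    awk -v k="$kbytes" -v u="$usec" 'BEGIN { printf "%.1f\n", (k / 1024) / (u / 1000000) }'
}

# 4 kbytes at 805.7 usec/IO, from the ADATA SU800 table above.
sync_write_mbps 4 805.7
```

For small synchronous writes the latency column is the one to compare between drives, since throughput at a given size is derived directly from it.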

Innodisk 3MG2-P (FreeNAS L2ARC)

This is the official FreeNAS L2ARC SSD sold on Amazon by ixSystems. Please note that this was not intended to be a ZIL disk, but I had it on hand so why not test it?

=== START OF INFORMATION SECTION ===
Model Family:     Innodisk 3IE2/3ME2/3MG2/3SE2 SSDs
Device Model:     2.5" SATA SSD 3MG2-P
Serial Number:    ---
Firmware Version: M150821
User Capacity:    124,034,899,968 bytes [124 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 17 08:09:09 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
	512         	# sectorsize
	124034899968	# mediasize in bytes (116G)
	242255664   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	15079       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA 2.5" SATA SSD 3M	# Disk descr.
	---     	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:   1449.3 usec/IO =      0.3 Mbytes/s
	   1 kbytes:   1458.5 usec/IO =      0.7 Mbytes/s
	   2 kbytes:   1477.6 usec/IO =      1.3 Mbytes/s
	   4 kbytes:   1492.7 usec/IO =      2.6 Mbytes/s
	   8 kbytes:   1471.4 usec/IO =      5.3 Mbytes/s
	  16 kbytes:   1503.7 usec/IO =     10.4 Mbytes/s
	  32 kbytes:   1554.2 usec/IO =     20.1 Mbytes/s
	  64 kbytes:   1711.3 usec/IO =     36.5 Mbytes/s
	 128 kbytes:   2101.6 usec/IO =     59.5 Mbytes/s
	 256 kbytes:   2535.3 usec/IO =     98.6 Mbytes/s
	 512 kbytes:   3598.5 usec/IO =    138.9 Mbytes/s
	1024 kbytes:   5856.2 usec/IO =    170.8 Mbytes/s
	2048 kbytes:   8262.6 usec/IO =    242.1 Mbytes/s
	4096 kbytes:  13505.4 usec/IO =    296.2 Mbytes/s
	8192 kbytes:  23919.1 usec/IO =    334.5 Mbytes/s

Intel 730 DC

The key here is that though this SSD has a slower interface, it doesn't matter because the throughput on small file writes is so high.

=== START OF INFORMATION SECTION ===
Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BA200G3T
Serial Number:    ---
LU WWN Device Id: 5 5cd2e4 04b605189
Add. Product Id:  DELL(tm)
Firmware Version: 5DV1DL05
User Capacity:    200,049,647,616 bytes [200 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Oct 17 09:30:18 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
	512         	# sectorsize
	200049647616	# mediasize in bytes (186G)
	390721968   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	24321       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA INTEL SSDSC2BA20	# Disk descr.
	---     	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:    328.0 usec/IO =      1.5 Mbytes/s
	   1 kbytes:    310.2 usec/IO =      3.1 Mbytes/s
	   2 kbytes:    269.4 usec/IO =      7.2 Mbytes/s
	   4 kbytes:    182.9 usec/IO =     21.4 Mbytes/s
	   8 kbytes:    205.6 usec/IO =     38.0 Mbytes/s
	  16 kbytes:    254.2 usec/IO =     61.5 Mbytes/s
	  32 kbytes:    318.0 usec/IO =     98.3 Mbytes/s
	  64 kbytes:    444.8 usec/IO =    140.5 Mbytes/s
	 128 kbytes:    712.1 usec/IO =    175.5 Mbytes/s
	 256 kbytes:   1271.0 usec/IO =    196.7 Mbytes/s
	 512 kbytes:   2293.7 usec/IO =    218.0 Mbytes/s
	1024 kbytes:   4376.1 usec/IO =    228.5 Mbytes/s
	2048 kbytes:   8388.3 usec/IO =    238.4 Mbytes/s
	4096 kbytes:  16462.6 usec/IO =    243.0 Mbytes/s
	8192 kbytes:  32684.5 usec/IO =    244.8 Mbytes/s
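
The two columns in the diskinfo output above are tied together: the Mbytes/s figure is just the block size divided by the time per write. A minimal sketch of that arithmetic, assuming "Mbytes" here means 2^20 bytes (this reproduces the reported figures but is not stated in the output); the function name is only for illustration.

```python
def mbytes_per_second(block_kbytes, usec_per_io):
    """Throughput implied by one synchronous write of block_kbytes taking usec_per_io."""
    bytes_per_io = block_kbytes * 1024          # kbytes -> bytes
    seconds_per_io = usec_per_io / 1_000_000    # microseconds -> seconds
    # Assumption: 1 Mbyte = 1024 * 1024 bytes
    return bytes_per_io / seconds_per_io / (1024 * 1024)


# Values copied from the Intel output above
print(round(mbytes_per_second(4, 182.9), 1))       # 21.4
print(round(mbytes_per_second(8192, 32684.5), 1))  # 244.8
```

For small writes the usec/IO column is the number to watch, since each synchronous write has to complete before the next one is acknowledged.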

Samsung 860 EVO 500GB

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 860 EVO 500GB
Serial Number:    ---
LU WWN Device Id: 5 002538 e4034d87b
Firmware Version: RVT01B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 21 08:18:54 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da23
/dev/da23
	512         	# sectorsize
	500107862016	# mediasize in bytes (466G)
	976773168   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	60801       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA Samsung SSD 860	# Disk descr.
	---     	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:    715.0 usec/IO =      0.7 Mbytes/s
	   1 kbytes:    719.1 usec/IO =      1.4 Mbytes/s
	   2 kbytes:    722.0 usec/IO =      2.7 Mbytes/s
	   4 kbytes:    692.9 usec/IO =      5.6 Mbytes/s
	   8 kbytes:    720.3 usec/IO =     10.8 Mbytes/s
	  16 kbytes:    730.7 usec/IO =     21.4 Mbytes/s
	  32 kbytes:    865.7 usec/IO =     36.1 Mbytes/s
	  64 kbytes:    927.5 usec/IO =     67.4 Mbytes/s
	 128 kbytes:   1286.1 usec/IO =     97.2 Mbytes/s
	 256 kbytes:   1206.2 usec/IO =    207.3 Mbytes/s
	 512 kbytes:   1808.5 usec/IO =    276.5 Mbytes/s
	1024 kbytes:   2955.0 usec/IO =    338.4 Mbytes/s
	2048 kbytes:   5197.0 usec/IO =    384.8 Mbytes/s
	4096 kbytes:   9625.1 usec/IO =    415.6 Mbytes/s
	8192 kbytes:  26001.2 usec/IO =    307.7 Mbytes/s

Seagate IronWolf 110 480GB

=== START OF INFORMATION SECTION ===
Device Model:     ZA480NM10001
Serial Number:    ---
LU WWN Device Id: 5 000c50 03ea0daeb
Firmware Version: SF44011J
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 21 08:07:46 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da20
/dev/da20
	512         	# sectorsize
	480103981056	# mediasize in bytes (447G)
	937703088   	# mediasize in sectors
	4096        	# stripesize
	0           	# stripeoffset
	58369       	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA ZA480NM10001	# Disk descr.
	---     	# Disk ident.
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM
	Not_Zoned   	# Zone Mode

Synchronous random writes:
	 0.5 kbytes:   3465.6 usec/IO =      0.1 Mbytes/s
	   1 kbytes:   3428.0 usec/IO =      0.3 Mbytes/s
	   2 kbytes:   3465.8 usec/IO =      0.6 Mbytes/s
	   4 kbytes:   3348.7 usec/IO =      1.2 Mbytes/s
	   8 kbytes:   3372.6 usec/IO =      2.3 Mbytes/s
	  16 kbytes:   3418.3 usec/IO =      4.6 Mbytes/s
	  32 kbytes:   3589.6 usec/IO =      8.7 Mbytes/s
	  64 kbytes:   3494.9 usec/IO =     17.9 Mbytes/s
	 128 kbytes:   3630.0 usec/IO =     34.4 Mbytes/s
	 256 kbytes:   3916.0 usec/IO =     63.8 Mbytes/s
	 512 kbytes:   4478.3 usec/IO =    111.6 Mbytes/s
	1024 kbytes:   5559.3 usec/IO =    179.9 Mbytes/s
	2048 kbytes:   7746.3 usec/IO =    258.2 Mbytes/s
	4096 kbytes:  12259.7 usec/IO =    326.3 Mbytes/s
	8192 kbytes:  20970.7 usec/IO =    381.5 Mbytes/s
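
To compare drives at a single block size, the result lines can be pulled apart with a small parser. This is only a sketch: the regular expression and the function name are my own, and the sample lines are the 4 kbyte rows copied from the Intel, Samsung and Seagate outputs above.

```python
import re

# Matches lines such as "   4 kbytes:    182.9 usec/IO =     21.4 Mbytes/s"
LINE_RE = re.compile(
    r"^\s*([\d.]+)\s+kbytes:\s+([\d.]+)\s+usec/IO\s+=\s+([\d.]+)\s+Mbytes/s"
)


def parse_sync_writes(text):
    """Return {block_size_kbytes: (usec_per_io, mbytes_per_s)} for matching lines."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            size, usec, mbps = (float(g) for g in match.groups())
            results[size] = (usec, mbps)
    return results


# 4 kbyte rows copied from the three outputs above
samples = {
    "Intel 730 DC": "\t   4 kbytes:    182.9 usec/IO =     21.4 Mbytes/s",
    "Samsung 860 EVO 500GB": "\t   4 kbytes:    692.9 usec/IO =      5.6 Mbytes/s",
    "Seagate IronWolf 110 480GB": "\t   4 kbytes:   3348.7 usec/IO =      1.2 Mbytes/s",
}

latency_4k = {name: parse_sync_writes(text)[4.0][0] for name, text in samples.items()}

# Lowest latency first
for name, usec in sorted(latency_4k.items(), key=lambda item: item[1]):
    print(f"{name}: {usec} usec/IO")
```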

Benchmarks

This is a simple test: starting from the login screen of a Windows 10 VM, I reboot the VM and time it with a stopwatch.
NOTE : Running with a low-latency SSD for the ZIL does not improve performance enough to make this viable for heavy writes with several VMs. Trying to clone 13 VMs brought the system to a crawl; a reboot took 15 minutes while the clone process was running.

Environment

  • 10Gb link between ESXi and FreeNAS
  • NFS
  • 3 x raidz2 vdevs, each with six 1TB 7200rpm HDDs.
  • 384GB RAM on FreeNAS. The entire VM disk should be cached, making this a write-only test.


Results

  • Sync Disabled : 43 seconds
  • Intel 730 DC 186GB : 66 seconds
  • ADATA SU800 119GB : 83 seconds
  • Samsung 860 EVO 500GB : 95 seconds
  • Innodisk 3MG2-P 116GB : 112 seconds
  • Seagate IronWolf 110 480GB : 163 seconds
  • No ZIL SSD : 330 seconds
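
To put the results above in proportion, each time can be expressed relative to the two extremes (sync disabled and no ZIL SSD). A minimal sketch; the times are copied from the list above, and the variable and function names are only for illustration.

```python
# Reboot times in seconds, copied from the results list above
reboot_seconds = {
    "Sync Disabled": 43,
    "Intel 730 DC 186GB": 66,
    "ADATA SU800 119GB": 83,
    "Samsung 860 EVO 500GB": 95,
    "Innodisk 3MG2-P 116GB": 112,
    "Seagate IronWolf 110 480GB": 163,
    "No ZIL SSD": 330,
}

baseline = reboot_seconds["Sync Disabled"]
no_zil_ssd = reboot_seconds["No ZIL SSD"]


def relative(seconds):
    """Return (slowdown vs sync disabled, speedup vs no ZIL SSD)."""
    return seconds / baseline, no_zil_ssd / seconds


for name, seconds in reboot_seconds.items():
    slowdown, speedup = relative(seconds)
    print(f"{name:28s} {seconds:4d}s  {slowdown:4.2f}x sync disabled  {speedup:4.2f}x faster than no ZIL SSD")
```

By this measure the Intel 730 DC reboot is about 1.5 times slower than with sync disabled and 5 times faster than with no ZIL SSD.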