ZFS on Linux
READ FIRST
Some considerations when working with ZFS
- ZFS uses vdevs and not physical disks.
- Be careful about how you add new disks to the pool. No random adding and removing of disks (the exceptions being upgrading disks or replacing a failed disk).
- ZFS is very powerful, be mindful of what you are going to do and plan it out!
- After a vdev is created, it can never be removed and you cannot add disks into it.
Example:
        NAME        STATE     READ WRITE CKSUM
        pool4tb     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
raidz1-0 is a vdev. To add more disks (other than hot spares) you must create a second vdev. In this case we are running two mirrored drives, so it would be best to add a second pair of mirrored drives.
        NAME        STATE     READ WRITE CKSUM
        pool4tb     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
Now data will be striped across both vdevs.
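For reference, the second vdev in the example above would be added with a command along these lines (a sketch reusing the pool and disk names from the example; adjust to your own devices):
zpool add pool4tb raidz sde sdf    # adds the raidz1-1 vdev shown above
zpool status pool4tb               # confirm the new vdev is listed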
ZFS on Linux Installation
CentOS 7
It has been reported that when installing zfs and its dependencies at the same time, the kernel modules will not get created. Below are the current steps I found to work when installing ZFS.
yum -y install epel-release
Make sure the system is completely up to date.
yum -y update
reboot -h
After reboot
yum -y install kernel-devel
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install spl
If everything was done right, the following command will take a while (depending on hardware)
yum -y install zfs-dkms
yum -y install zfs
/sbin/modprobe zfs
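A quick sanity check to run afterwards (my own addition, not part of the original steps):
lsmod | grep zfs     # the zfs and spl kernel modules should be listed
zpool status         # reports "no pools available" on a fresh install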
Fedora 28
[1] The instructions from the zfsonlinux.org site are correct, except for enabling the repo before installing. Even issuing "dnf --set-enable zfs.repo" resulted in failure; I had to edit the repo file directly (/etc/yum.repos.d/zfs.repo) to enable it. Not a big deal, but good to know.
Create ZFS Pool
At this point you can create your pool. Most of the time we will be interested in a RAIDZ configuration. Depending on how much parity you're interested in, use raidz, raidz1, raidz2, or raidz3.
zpool create <name of pool> raidz <disk1> <disk2> <etc>
NOTE: By default this will create a mount point of "/<name of pool>"
To add a spare drive
zpool add <name of pool> spare <disk>
Make sure to enable automatic rebuild when a drive fails, especially when using hot spares.
zpool set autoreplace=on <name of pool>
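To verify the property took effect (a quick check; the pool name "tank" is a placeholder):
zpool get autoreplace tank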
I ran across the following, which helps with managing the disks; creating a label for each disk would have saved me time in the past.[2]
# glabel label rex1 ada0
# glabel label rex2 ada1
# glabel label rex3 ada2
# zpool create rex raidz1 label/rex1 label/rex2 label/rex3
Create ZFS Volumes
zfs create <name of pool>/<Volume Name>
zfs set mountpoint=<mount point> <name of pool>/<Volume Name>
Example:
zfs create pool4tb/archive
mkdir /archive
zfs set mountpoint=/archive pool4tb/archive
Additional Options
To enable compression
zfs set compression=lz4 <name of pool>
To increase the number of copies of a file on a dataset
zfs set copies=<1,2,3> <name of pool>/<Volume Name>
To have the pool auto-expand
zpool set autoexpand=on <name of pool>
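The properties above can be verified with zfs get / zpool get; a short sketch assuming a pool named tank:
zfs get compression,copies tank
zpool get autoexpand tank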
- Encryption
http://www.makethenmakeinstall.com/2014/10/zfs-on-linux-with-luks-encrypted-disks/
EXAMPLE
1 x 2TB HDD
    sdb
4 x 1TB HDDs
    sdc
    sdd
    sde
    sdf

Using the above drives it is possible to create a variety of deployments. In this example we will create a RAID5-like configuration that spans across three 2TB devices. We start by creating the pools and adding the drives.

[root@nas ~]# zpool create -f set1 raidz /dev/sdc /dev/sdd
[root@nas ~]# zpool create -f set2 raidz /dev/sde /dev/sdf
[root@nas ~]# zpool status
  pool: set1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set1        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: set2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        set2        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

[root@nas ~]# zfs create -V 1.50T set1/vdev1
[root@nas ~]# zfs create -V 1.50T set2/vdev1
[root@nas ~]# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
set1        1.55T   214G  57.5K  /set1
set1/vdev1  1.55T  1.76T    36K  -
set2        1.55T   214G  57.5K  /set2
set2/vdev2  1.55T  1.76T    36K  -

[root@nas ~]# ls /dev/
<condensed output>
zd0  zd16

[root@nas ~]# zpool create -f data raidz1 /dev/sdb /dev/zd0 /dev/zd16
[root@nas ~]# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data  4.47T   896K  4.47T         -     0%     0%  1.00x  ONLINE  -
set1  1.81T   742K  1.81T         -     0%     0%  1.00x  ONLINE  -
set2  1.81T   429K  1.81T         -     0%     0%  1.00x  ONLINE  -

[root@nas ~]# df -lh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.6G   32G   5% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G  8.5M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda1       497M  200M  298M  41% /boot
tmpfs           775M     0  775M   0% /run/user/0
set1            214G  128K  214G   1% /set1
set2            214G  128K  214G   1% /set2
data            2.9T  256K  2.9T   1% /data
As you can see there is a LOT of wasted space using this method. Where we should have ~4TB of usable space, we end up with ~3TB. This was only an example; the better option is to create multiple independent datasets.
ZFS Send
Example of using ZFS send to replicate snapshots from a local pool to a local external drive.
nohup zfs send -R tank/datastore@auto-20180629.0000-2w | zfs recv -F backuppool/backup &
Incremental [3]
zfs send -R -i tank/datastore@auto-20180630.0000-2w tank/datastore@auto-20180701.0000-2w | zfs recv -F backuppool/backup
Over ssh[4]
nohup zfs send tank/datastore@auto-20180629.0000-2w | ssh root@somehost 'zfs receive backuppool/datastore@auto-20180629.0000-2w'
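To pick the snapshot names for an incremental send over ssh, listing the snapshots first helps; a sketch reusing the dataset, snapshot, and host names from the examples above:
zfs list -t snapshot -r tank/datastore    # list available snapshots on the source
zfs send -i tank/datastore@auto-20180630.0000-2w tank/datastore@auto-20180701.0000-2w | ssh root@somehost 'zfs receive -F backuppool/datastore'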
Troubleshooting
Auto import pool at boot
[5]There is a cache file that is used for mounting ZFS at boot. Make sure to run the following if ZFS is not importing on boot.
[root@nas ~]# systemctl enable zfs-import-cache.service
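If the cache file itself is missing or stale, it can be regenerated before enabling the service (an extra step I would add; the pool name "tank" is a placeholder):
zpool set cachefile=/etc/zfs/zpool.cache tank
systemctl enable zfs-import-cache.service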
Kernel Module Failure After Upgrade
I ran the standard yum upgrade process on my home CentOS 7 server. After a reboot, ZFS failed, stating the module was not loaded and that I should load it. However, modprobe would fail.
[root@nas ~]# modprobe zfs
modprobe: ERROR: could not insert 'zfs': Invalid argument
Checking dmesg
[root@nas ~]# grep zfs /var/log/dmesg*
/var/log/dmesg.old:[    3.445947] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    3.445950] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.103167] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.103172] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.154686] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.154691] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.273800] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.273804] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[    5.377193] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[    5.377200] zfs: Unknown symbol vn_getattr (err -22)
/var/log/dmesg.old:[   92.649735] zfs: disagrees about version of symbol vn_getattr
/var/log/dmesg.old:[   92.649739] zfs: Unknown symbol vn_getattr (err -22)
I found a post about this[6], and it mentioned to check the dkms status. Below is what I found.
[root@nas ~]# dkms status
spl, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-862.14.4.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
[root@nas ~]# rpm -qa | grep kernel
kernel-3.10.0-862.11.6.el7.x86_64
kernel-tools-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
kernel-tools-libs-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.9.1.el7.x86_64
kernel-headers-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.6.3.el7.x86_64
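The warnings above suggest the built modules no longer match what is installed. Rebuilding them through dkms against the running kernel is one way to resolve this; a sketch using the version numbers from the output above (I have not re-verified this exact sequence):
dkms uninstall -m zfs -v 0.7.12    # remove the mismatched zfs module first (it depends on spl)
dkms uninstall -m spl -v 0.7.12
dkms install -m spl -v 0.7.12      # rebuild against the currently running kernel
dkms install -m zfs -v 0.7.12
modprobe zfs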
Set Hot Spare as replacement device
I had an issue where I created a raidz2 pool without spares (which is fine for this deployment). A drive failed, and I installed a replacement as a spare using the FreeNAS GUI (this system was FreeNAS, not ZFS on Linux). I was then stuck with a perpetually degraded pool.
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Online the device using 'zpool online' or replace the device with 'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/625287ff-7c6b-11e8-a699-002590fde644    ONLINE       0     0     0
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            spare-9                                       DEGRADED     0     0     0
              17637264324123775223                        OFFLINE      0     0     0  was /dev/gptid/e55c7104-5d4d-11e8-aaf6-002590fde644
              gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
            gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
        logs
          gptid/bdafc060-6ccc-11e8-8357-002590fde644      ONLINE       0     0     0
        spares
          227308045836062793                              INUSE     was /dev/gptid/051c5d74-612e-11e8-8357-002590fde644

errors: No known data errors
But if I had RTFM[7], I would have known to detach the failed drive that I had previously taken offline.
root@freenas:~ # zpool detach tank 17637264324123775223
root@freenas:~ # zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:55:01 with 0 errors on Sat Jun 30 12:16:59 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
            gptid/051c5d74-612e-11e8-8357-002590fde644  ONLINE       0     0     0
            gptid/e837b4dd-5d4d-11e8-aaf6-002590fde644  ONLINE       0     0     0
        logs
          gptid/bdafc060-6ccc-11e8-8357-002590fde644    ONLINE       0     0     0

errors: No known data errors
ZFS Not Mounting After Reboot
For some reason my system stopped mounting my ZFS volumes at boot. For a year I would mount them manually as needed (a reboot was rare), but now I have found the issue[8].
systemctl enable zfs-import.target
FreeNAS Specific
Recover from KDB Panic
[9] During a zfs receive operation the server crashed and would then kernel panic during boot. In order to get online again I had to do the following.
- Shut down the server.
- Remove the data disks.
- Boot into FreeNAS.
- Insert the data disks (minus any that might be dying; in this case the pool was a raidz2 and one drive was failing).
- Decrypt the data disks using the command below (replace with your own values)
geli attach -p -k /data/geli/d144b46a-3567-427b-85a3-7db93fe6e170.key /dev/gptid/71c7afd0-9fbf-11e7-a9c9-d05099c312a7
- Import the pool, ideally read-only at first.
zpool import -o readonly=on DR-ARCHIVE
- If successful, export the pool and re-import it, this time read/write and with the proper mount point.
zpool import -R /mnt DR-ARCHIVE
- Run a scrub (see the sketch after this list).
NOTE : I am still recovering my pool. I do not know if it will boot properly.
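Before switching the pool to read/write, a few checks are worth running (a sketch; the pool name comes from the steps above):
zpool status -v DR-ARCHIVE    # confirm all vdevs show the expected state and no data errors
zpool scrub DR-ARCHIVE        # the scrub from the last step
zpool status DR-ARCHIVE       # re-run periodically to watch scrub progress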
Replace drive using CLI
Recently I installed a FreeNAS server as part of a consulting gig, but the refurbished drives that came with the server started to fail and needed replacement. I had to do this remotely without the GUI due to limited VPN connectivity, which posed an issue with obtaining the gptid of the replacement drive. Up until now I have relied on the GUI to provision the gptid and import the disk. My previous examples also show that I normally use the entire disk instead of partitions on the disk.
The following is what I did to obtain a gptid for the drive.[10]
- First I obtained the drive information; I had a local tech provide me the serial number.
- Ran a script I wrote to pull serial numbers from the drives listed in /dev and find the correct device (/dev/da13); a sketch of the idea follows.
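Something along these lines will map serial numbers to device nodes (a rough sketch of the idea, not the exact script I used):
#!/bin/sh
# Print the serial number smartctl reports for each whole-disk da* device
for d in /dev/da*; do
    case "$d" in
        *p[0-9]*) continue ;;    # skip partition nodes such as /dev/da13p1
    esac
    sn=$(smartctl -i "$d" | awk '/Serial Number/ {print $3}')
    echo "$d $sn"
done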
- At this point I created the GPT partition on the disk using the steps from the reference above.
gpart create -s gpt da13
gpart add -t freebsd-ufs da13
- Then I checked to see if the disk showed up with a label.
glabel list | grep da13
- At which point I could find the label in the full list.
glabel list
- Then started the replacement of the failed disk that I previously took offline.
zpool replace tank 17805351018045823548 gptid/625287ff-7c6b-11e8-a699-002590fde644

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 30 09:43:44 2018
        1.58T scanned at 647M/s, 840G issued at 335M/s, 1.58T total
        72.7G resilvered, 51.81% done, 0 days 00:39:48 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/ca363e73-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/cca8828b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d1b86990-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d51049fe-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/d804819b-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            replacing-5                                   OFFLINE      0     0     0
              17805351018045823548                        OFFLINE      0     0     0  was /dev/gptid/db20f312-5d4d-11e8-aaf6-002590fde644
              gptid/625287ff-7c6b-11e8-a699-002590fde644  ONLINE       0     0     0  (resilvering)
            gptid/dda24b58-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e11d1f00-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
            gptid/e39e8936-5d4d-11e8-aaf6-002590fde644    ONLINE       0     0     0
Clear drives that were not properly exported
[11] For when you exported a pool, planned on reusing the disks, but did not mark the disks to be cleared.
- For each disk, run the following, then reboot (a loop over several disks is sketched after the commands).
dd if=/dev/zero of=/dev/da23 bs=1m count=1
dd if=/dev/zero of=/dev/da23 bs=1m oseek=`diskinfo da23 | awk '{print int($3 / (1024*1024)) - 4;}'`
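Since this has to be repeated for every disk, a small loop helps (the device names below are examples; double-check them before pointing dd at anything):
for d in da20 da21 da22 da23; do
    dd if=/dev/zero of=/dev/${d} bs=1m count=1
    dd if=/dev/zero of=/dev/${d} bs=1m oseek=`diskinfo ${d} | awk '{print int($3 / (1024*1024)) - 4;}'`
done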
ZIL Disk performance
ADATA SU800 128GB
=== START OF INFORMATION SECTION ===
Device Model:     ADATA SU800
Serial Number:    ---
LU WWN Device Id: 5 707c18 300038465
Firmware Version: Q0922FS
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 17 07:49:26 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
        512             # sectorsize
        128035676160    # mediasize in bytes (119G)
        250069680       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        15566           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        ATA ADATA SU800 # Disk descr.
        ---             # Disk ident.
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM
        Not_Zoned       # Zone Mode

Synchronous random writes:
         0.5 kbytes:    781.3 usec/IO =      0.6 Mbytes/s
           1 kbytes:    784.3 usec/IO =      1.2 Mbytes/s
           2 kbytes:    800.7 usec/IO =      2.4 Mbytes/s
           4 kbytes:    805.7 usec/IO =      4.8 Mbytes/s
           8 kbytes:    795.7 usec/IO =      9.8 Mbytes/s
          16 kbytes:    806.0 usec/IO =     19.4 Mbytes/s
          32 kbytes:    787.7 usec/IO =     39.7 Mbytes/s
          64 kbytes:    944.2 usec/IO =     66.2 Mbytes/s
         128 kbytes:   1353.6 usec/IO =     92.3 Mbytes/s
         256 kbytes:   2001.1 usec/IO =    124.9 Mbytes/s
         512 kbytes:   3185.4 usec/IO =    157.0 Mbytes/s
        1024 kbytes:   5407.7 usec/IO =    184.9 Mbytes/s
        2048 kbytes:   7622.4 usec/IO =    262.4 Mbytes/s
        4096 kbytes:  12125.0 usec/IO =    329.9 Mbytes/s
        8192 kbytes:  21478.9 usec/IO =    372.5 Mbytes/s
Innodisk 3MG2-P (FreeNAS L2ARC)
This is the official FreeNAS L2ARC SSD sold on Amazon by ixSystems. Please note that this was not intended to be a ZIL disk, but I had it on hand so why not test it?
=== START OF INFORMATION SECTION ===
Model Family:     Innodisk 3IE2/3ME2/3MG2/3SE2 SSDs
Device Model:     2.5" SATA SSD 3MG2-P
Serial Number:    ---
Firmware Version: M150821
User Capacity:    124,034,899,968 bytes [124 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 17 08:09:09 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
        512                  # sectorsize
        124034899968         # mediasize in bytes (116G)
        242255664            # mediasize in sectors
        0                    # stripesize
        0                    # stripeoffset
        15079                # Cylinders according to firmware.
        255                  # Heads according to firmware.
        63                   # Sectors according to firmware.
        ATA 2.5" SATA SSD 3M # Disk descr.
        ---                  # Disk ident.
        Yes                  # TRIM/UNMAP support
        0                    # Rotation rate in RPM
        Not_Zoned            # Zone Mode

Synchronous random writes:
         0.5 kbytes:   1449.3 usec/IO =      0.3 Mbytes/s
           1 kbytes:   1458.5 usec/IO =      0.7 Mbytes/s
           2 kbytes:   1477.6 usec/IO =      1.3 Mbytes/s
           4 kbytes:   1492.7 usec/IO =      2.6 Mbytes/s
           8 kbytes:   1471.4 usec/IO =      5.3 Mbytes/s
          16 kbytes:   1503.7 usec/IO =     10.4 Mbytes/s
          32 kbytes:   1554.2 usec/IO =     20.1 Mbytes/s
          64 kbytes:   1711.3 usec/IO =     36.5 Mbytes/s
         128 kbytes:   2101.6 usec/IO =     59.5 Mbytes/s
         256 kbytes:   2535.3 usec/IO =     98.6 Mbytes/s
         512 kbytes:   3598.5 usec/IO =    138.9 Mbytes/s
        1024 kbytes:   5856.2 usec/IO =    170.8 Mbytes/s
        2048 kbytes:   8262.6 usec/IO =    242.1 Mbytes/s
        4096 kbytes:  13505.4 usec/IO =    296.2 Mbytes/s
        8192 kbytes:  23919.1 usec/IO =    334.5 Mbytes/s
Intel 730 DC
The key here is that even though this SSD has a slower interface (SATA 3.0 Gb/s), it doesn't matter, because its throughput on small synchronous writes is so high.
=== START OF INFORMATION SECTION ===
Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BA200G3T
Serial Number:    ---
LU WWN Device Id: 5 5cd2e4 04b605189
Add. Product Id:  DELL(tm)
Firmware Version: 5DV1DL05
User Capacity:    200,049,647,616 bytes [200 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Oct 17 09:30:18 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da21
/dev/da21
        512                  # sectorsize
        200049647616         # mediasize in bytes (186G)
        390721968            # mediasize in sectors
        0                    # stripesize
        0                    # stripeoffset
        24321                # Cylinders according to firmware.
        255                  # Heads according to firmware.
        63                   # Sectors according to firmware.
        ATA INTEL SSDSC2BA20 # Disk descr.
        ---                  # Disk ident.
        Yes                  # TRIM/UNMAP support
        0                    # Rotation rate in RPM
        Not_Zoned            # Zone Mode

Synchronous random writes:
         0.5 kbytes:    328.0 usec/IO =      1.5 Mbytes/s
           1 kbytes:    310.2 usec/IO =      3.1 Mbytes/s
           2 kbytes:    269.4 usec/IO =      7.2 Mbytes/s
           4 kbytes:    182.9 usec/IO =     21.4 Mbytes/s
           8 kbytes:    205.6 usec/IO =     38.0 Mbytes/s
          16 kbytes:    254.2 usec/IO =     61.5 Mbytes/s
          32 kbytes:    318.0 usec/IO =     98.3 Mbytes/s
          64 kbytes:    444.8 usec/IO =    140.5 Mbytes/s
         128 kbytes:    712.1 usec/IO =    175.5 Mbytes/s
         256 kbytes:   1271.0 usec/IO =    196.7 Mbytes/s
         512 kbytes:   2293.7 usec/IO =    218.0 Mbytes/s
        1024 kbytes:   4376.1 usec/IO =    228.5 Mbytes/s
        2048 kbytes:   8388.3 usec/IO =    238.4 Mbytes/s
        4096 kbytes:  16462.6 usec/IO =    243.0 Mbytes/s
        8192 kbytes:  32684.5 usec/IO =    244.8 Mbytes/s
Samsung 860 EVO 500GB
=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 860 EVO 500GB
Serial Number:    ---
LU WWN Device Id: 5 002538 e4034d87b
Firmware Version: RVT01B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 21 08:18:54 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da23
/dev/da23
        512                 # sectorsize
        500107862016        # mediasize in bytes (466G)
        976773168           # mediasize in sectors
        0                   # stripesize
        0                   # stripeoffset
        60801               # Cylinders according to firmware.
        255                 # Heads according to firmware.
        63                  # Sectors according to firmware.
        ATA Samsung SSD 860 # Disk descr.
        ---                 # Disk ident.
        Yes                 # TRIM/UNMAP support
        0                   # Rotation rate in RPM
        Not_Zoned           # Zone Mode

Synchronous random writes:
         0.5 kbytes:    715.0 usec/IO =      0.7 Mbytes/s
           1 kbytes:    719.1 usec/IO =      1.4 Mbytes/s
           2 kbytes:    722.0 usec/IO =      2.7 Mbytes/s
           4 kbytes:    692.9 usec/IO =      5.6 Mbytes/s
           8 kbytes:    720.3 usec/IO =     10.8 Mbytes/s
          16 kbytes:    730.7 usec/IO =     21.4 Mbytes/s
          32 kbytes:    865.7 usec/IO =     36.1 Mbytes/s
          64 kbytes:    927.5 usec/IO =     67.4 Mbytes/s
         128 kbytes:   1286.1 usec/IO =     97.2 Mbytes/s
         256 kbytes:   1206.2 usec/IO =    207.3 Mbytes/s
         512 kbytes:   1808.5 usec/IO =    276.5 Mbytes/s
        1024 kbytes:   2955.0 usec/IO =    338.4 Mbytes/s
        2048 kbytes:   5197.0 usec/IO =    384.8 Mbytes/s
        4096 kbytes:   9625.1 usec/IO =    415.6 Mbytes/s
        8192 kbytes:  26001.2 usec/IO =    307.7 Mbytes/s
Seagate IronWolf 110 480GB
=== START OF INFORMATION SECTION ===
Device Model:     ZA480NM10001
Serial Number:    ---
LU WWN Device Id: 5 000c50 03ea0daeb
Firmware Version: SF44011J
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 21 08:07:46 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@freenas[~]# diskinfo -wS /dev/da20
/dev/da20
        512              # sectorsize
        480103981056     # mediasize in bytes (447G)
        937703088        # mediasize in sectors
        4096             # stripesize
        0                # stripeoffset
        58369            # Cylinders according to firmware.
        255              # Heads according to firmware.
        63               # Sectors according to firmware.
        ATA ZA480NM10001 # Disk descr.
        ---              # Disk ident.
        Yes              # TRIM/UNMAP support
        0                # Rotation rate in RPM
        Not_Zoned        # Zone Mode

Synchronous random writes:
         0.5 kbytes:   3465.6 usec/IO =      0.1 Mbytes/s
           1 kbytes:   3428.0 usec/IO =      0.3 Mbytes/s
           2 kbytes:   3465.8 usec/IO =      0.6 Mbytes/s
           4 kbytes:   3348.7 usec/IO =      1.2 Mbytes/s
           8 kbytes:   3372.6 usec/IO =      2.3 Mbytes/s
          16 kbytes:   3418.3 usec/IO =      4.6 Mbytes/s
          32 kbytes:   3589.6 usec/IO =      8.7 Mbytes/s
          64 kbytes:   3494.9 usec/IO =     17.9 Mbytes/s
         128 kbytes:   3630.0 usec/IO =     34.4 Mbytes/s
         256 kbytes:   3916.0 usec/IO =     63.8 Mbytes/s
         512 kbytes:   4478.3 usec/IO =    111.6 Mbytes/s
        1024 kbytes:   5559.3 usec/IO =    179.9 Mbytes/s
        2048 kbytes:   7746.3 usec/IO =    258.2 Mbytes/s
        4096 kbytes:  12259.7 usec/IO =    326.3 Mbytes/s
        8192 kbytes:  20970.7 usec/IO =    381.5 Mbytes/s
Benchmarks
This is a simple test: I have a Windows 10 VM; starting from the login screen, I reboot the VM and time it with a stopwatch.
NOTE: Running with a low-latency SSD for the ZIL does not improve performance enough to make this viable for heavy write loads across several VMs. Trying to clone 13 VMs brought the system to a crawl; a reboot took 15 minutes while the clone process was running.
Environment
- 10Gb networking between ESXi and FreeNAS
- NFS
- 3 x raidz2 vdevs with six 1TB 7200rpm HDDs each.
- 384GB of RAM on FreeNAS. The entire VM disk should be cached, making this a write-only test.
Results
- Sync Disabled : 43 seconds
- Intel 730 DC 186GB : 66 seconds
- ADATA SU800 119GB : 83 seconds
- Samsung 860 EVO 500GB : 95 seconds
- Innodisk 3MG2-P 116GB : 112 seconds
- Seagate IronWolf 110 480GB : 163 seconds
- No ZIL SSD : 330 seconds
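For context, this is roughly how the configurations above differ (a sketch; the pool, dataset, and device names are placeholders):
zpool add tank log da21          # attach an SSD as a dedicated log (SLOG) device for a test run
zpool remove tank da21           # log devices can be removed again between tests
zfs set sync=disabled tank/vms   # the "Sync Disabled" row: all writes treated as asynchronous (test only)
zfs set sync=standard tank/vms   # restore the default behavior afterwards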
References
1. https://github.com/zfsonlinux/zfs/wiki/Fedora
2. https://forums.freebsd.org/threads/how-to-recover-degraded-zpool.28084/
3. https://docs.oracle.com/cd/E19253-01/819-5461/gbchx/index.html
4. https://128bit.io/2010/07/23/fun-with-zfs-send-and-receive/
5. http://serverfault.com/questions/732184/zfs-datasets-dissappear-on-reboot
6. https://github.com/zfsonlinux/zfs/issues/1155
7. https://docs.oracle.com/cd/E19253-01/819-5461/gcvdi/index.html
8. https://serverfault.com/questions/914173/zfs-datasets-no-longer-automatically-mount-on-reboot-after-system-upgrade
9. https://www.ixsystems.com/community/threads/kdb-panic-on-pool-import.79693/
10. https://mikebeach.org/2014/03/01/how-to-format-a-disk-gpt-in-freenas/
11. https://www.ixsystems.com/community/threads/error-creating-pool-problem-with-gpart.70629/