Lessons From a Gmirror Failure
23 July, 2013, 08:59 pm in "BSD"
FreeBSD has some great tools for RAID and disk management in modern versions. One of these great tools is gmirror. I have been using gmirror since about FreeBSD 6.2. I switched to software mirroring at that time from dedicated hardware RAID controllers after a bad experience: I lost a controller, could not get an exact replacement, and found that my client's RAID array had become so much useless, unreadable data.
The beauty of gmirror is that it is controller independent, and if you lose a controller or disk the system will keep running. If a hardware failure means you have to move the disks, or maybe just one of the disks, to another system, it should just work. It usually does, as there is no dependence on specific controllers with their own special low-level secret formatting magic preventing use of a different controller. Often even controllers from the same vendor do not work with disks prepared on a different model. I have been bitten more than once by Dell boxes that should have been the same!
I can demonstrate the value of gmirror with an incident at a client a few years ago. The IT director called me about his Chicago mail server. It seems that his Chicago tech had just called to tell him that smoke was coming from one of the disk drives on the mail server. I had set the box up a few years prior using FreeBSD 6.x with a pair of gmirrored drives that were hot swap. I told him to pull the smoking drive and replace it with the same or larger size disk and call me back. When they called back I logged into the box and ran the following commands:
# gmirror forget gm0
# gmirror insert gm0 ad2
# gmirror rebuild gm0 ad2
In short order they were back to running on both disks with zero downtime.
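To confirm that a rebuild like this has finished, gmirror status reports the state of the mirror and of each component. Below is a minimal sketch of such a check. The function name mirror_is_complete is made up for illustration, and the sample text is an assumed example of the usual three-column status layout, not output captured from the box in this story.

```sh
#!/bin/sh
# Hypothetical helper: read `gmirror status` text on stdin and succeed only
# if the named mirror is listed with a status of COMPLETE.
mirror_is_complete() {
    awk -v name="mirror/$1" '
        $1 == name { found = 1; ok = ($2 == "COMPLETE") }
        END { exit !(found && ok) }
    '
}

# On a live system this would be: gmirror status gm0 | mirror_is_complete gm0
# The text below is an assumed illustration of a mirror that is mid-rebuild.
sample_rebuilding='      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (ACTIVE)
                      ad2 (SYNCHRONIZING, 42%)'

if printf '%s\n' "$sample_rebuilding" | mirror_is_complete gm0; then
    echo "gm0 is complete"
else
    echo "gm0 is still degraded or rebuilding"
fi
```

Run against the sample text, the check takes the second branch. A loop around the live command with a sleep in between would be one way to wait for the rebuild before logging off.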
The above is typical of my experience with gmirror on many systems at many locations.
Overall I have been a happy gmirror user for years, but just like RAID 5, gmirror has its own "hole". I discovered this hole on my own main system.
In reading my nightly email from my various systems I noted that my FreeBSD 7.x machine had lost one of its disks in a gmirror pair. No problem, I thought. I shut down the box (no hot swap) and pulled the bad disk. I had an identical replacement disk on the shelf, so I inserted it and after rebooting ran the same series of commands as noted above. Just as the rebuild got into the 98% range I started seeing hard read errors on the system console. Eventually, after all retries were exhausted, I saw that the gmirror rebuild had failed. I tried several more times to rebuild the mirror and at the same time consulted online resources. It seems that since the remirror process is a bit for bit disk copy, just as if we were doing
dd if=/dev/ad0 of=/dev/ad1
it will always fail if there are read errors on the source disk. My conclusion was that a part of the disk that I was not using yet had lost some of its rust and would be impossible to read for a rebuild. I was caught in the gmirror hole. My data was all there and intact, but the only way to recover would be to dump the entire disk with something like tar in single user mode, then boot from a live CD, format a new disk, and restore from the backup. I was looking at an extended downtime while engaging in this process. I could bite the bullet and accept the downtime, but I wanted to find a better way. I consulted with others long in the tooth with FreeBSD and gmirror, and each person who had run into this situation said the same thing: dump, new disk, restore, second new disk, gmirror setup again. About this time I was kicking myself for not running a 3 disk mirror instead of keeping the spare on the shelf.
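The dump and restore route can be sketched as follows. The paths are stand-ins: in a real recovery SRC would be the mounted filesystem on the surviving disk and DEST the freshly formatted replacement. Here both are scratch directories so the sketch can be run safely anywhere.

```sh
#!/bin/sh
# Sketch only: SRC, DEST and ARCHIVE are scratch paths standing in for the
# old filesystem, the newly formatted disk, and the backup file.
WORK=$(mktemp -d)
SRC="$WORK/old_disk"
DEST="$WORK/new_disk"
ARCHIVE="$WORK/backup.tar"
mkdir -p "$SRC/home/user" "$DEST"
echo "important data" > "$SRC/home/user/notes.txt"

# Dump: archive the tree file by file, preserving permissions (-p).
# tar reads only the blocks that belong to files, so a bad spot in unused
# space (as was suspected here) is never touched.
tar -cpf "$ARCHIVE" -C "$SRC" .

# Restore: unpack the archive onto the new filesystem.
tar -xpf "$ARCHIVE" -C "$DEST"

cat "$DEST/home/user/notes.txt"
```

This is the reason a file-level copy can succeed where the block-level rebuild cannot, provided the unreadable sectors really do sit outside any file.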
My failed box held my home directory and a FreeBSD jail that acted as my primary DNS server as well as my mail and web server. I did not want the DNS/mail/web service to be offline for too long, so I decided to spin off a tar of the entire disk just in case and set up a regular rsync to another box so I could restore easily if the last disk died. In the meantime I looked for a solution that would limit my downtime to a few moments, not hours.
The solution came in the form of a newly configured FreeBSD 9.1 box running a zfs RAID-Z pool across 3 disks. I had been charged with recycling a Dell 1750 with three 75G SCSI disks that was pulled from a client. The client wanted to make sure I destroyed their data before the computer went to anyone else. I grabbed the very fine FreeBSD install image called mfsbsd from http://mfsbsd.vx.sk. Martin is a FreeBSD team member and his zfs install disks do a better job of installing FreeBSD than the current official install images.
I will leave much of the detail of the operation for another article, but the basics were:
o Install FreeBSD 9.1 with zfs RAID-Z
o Install my must haves (bash, screen, nload, openntpd, ezjail)
o Build a custom kernel (yeah, I still do this)
o Configure base system to be jail friendly (e.g., bind ssh to a single IP)
o Set up ezjail following the docs
o Set up a test jail
o Hammer on the system with disk and network I/O for days to prove the system
o Configure a jail to hold the DNS/web/mail server
o Rsync the jail from the old 7.x box to the 9.1 box
o Test the new jail on a different IP address
o Shut down the old jail and the new jail
o Rsync the old jail to the new jail again
o Switch the IP address of the new jail
o Start new jail
o Done, with a total downtime of less than 5 minutes!
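The cutover at the end of this list comes down to two rsync passes, with the jail stopped only for the second one. A minimal sketch follows. The host name, jail name and paths are hypothetical, and the ezjail-admin stop/start calls assume an ezjail version that provides those subcommands. The script only prints the commands unless RUN is emptied.

```sh
#!/bin/sh
# Sketch: every name below (host, jail, paths) is hypothetical.
OLD_HOST="old7x.example.org"
JAIL_NAME="mailjail"
OLD_ROOT="/usr/jails/$JAIL_NAME/"
NEW_ROOT="/usr/jails/$JAIL_NAME/"

# RUN=echo prints each command instead of executing it.
# Change this to RUN= to run the commands for real.
RUN=echo

sync_jail() {
    # -a archive mode, -H keep hard links, --numeric-ids do not remap
    # uid/gid by name, --delete remove files that vanished from the source.
    $RUN rsync -aH --numeric-ids --delete "root@$OLD_HOST:$OLD_ROOT" "$NEW_ROOT"
}

sync_jail                                                 # first pass, old jail still serving
$RUN ssh "root@$OLD_HOST" ezjail-admin stop "$JAIL_NAME"  # downtime starts here
sync_jail                                                 # second pass copies only the changes
# (switch the new jail to the production IP address at this point)
$RUN ezjail-admin start "$JAIL_NAME"                      # downtime ends
```

Because the first pass moves the bulk of the data while the old jail is still up, the second pass has little left to copy, which is what keeps the outage down to minutes.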
So using jails saved me from having an extended downtime. Now that my jails are on a system running zfs, it is possible to take zfs snapshots, send those to another system for safekeeping, and quickly bring them up in case of hardware failure.
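A snapshot and send of a jail might look like the sketch below. The dataset names, the backup host, and the assumption that the jail lives in its own dataset are all illustrative. The commands are printed rather than executed, so the sketch is safe to try on a machine without zfs.

```sh
#!/bin/sh
# Sketch: dataset, pool and host names are hypothetical.
DATASET="tank/jails/mailjail"
BACKUP_HOST="backup.example.org"
BACKUP_DATASET="backup/jails/mailjail"

snapshot_name() {
    # Build a snapshot name such as tank/jails/mailjail@2013-07-23
    printf '%s@%s\n' "$DATASET" "$1"
}

SNAP=$(snapshot_name "$(date +%Y-%m-%d)")

# Printed, not run: drop the echo wrappers to execute on a real system.
echo "zfs snapshot $SNAP"
echo "zfs send $SNAP | ssh root@$BACKUP_HOST zfs receive -F $BACKUP_DATASET"
```

Later runs could use an incremental send (zfs send -i with the previous snapshot) so that only the changes since the last copy cross the network.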
Will I use gmirror in the future? Probably, but it will be on systems that are too small to support zfs. If I have the choice on a smaller system to run with 3 disks, I will probably go for graid3 instead of gmirror. That one extra disk will of course give more disk space to the total array, but my biggest hope is that with more disks I can avoid the silent two-disk failure that I ran into with gmirror.
For now zfs will be my default if the box has at least 1G of RAM. The 1750 has been up for 17 days acting as my mail/DNS/web server and also as a build machine for various things which I am testing in other jails. Contrary to what some folks have stated, it would appear that zfs can run nicely on boxes with less than 4G of RAM.