gleblanc (at) cu-portland.edu
This is a FAQ for the Linux-RAID mailing list, hosted on vger.kernel.org. vger.rutgers.edu is gone, so don't bother looking for it. This FAQ is intended as a supplement to the existing Linux-RAID HOWTO, to cover questions that keep occurring on the mailing list. PLEASE read this document before you post to the list.
My favorite archives are at http://www.geocrawler.com/lists/3/Linux/57/0/.
Other archives are available at http://marc.theaimsgroup.com/?l=linux-raid&r=1&w=2
Another archive site is http://firstname.lastname@example.org/
The latest version of this FAQ will be available from the LDP website at http://www.LinuxDoc.org/FAQ/. As soon as I get my server at home fixed I'll make it available there as well.
Well, obviously this list covers RAID in relation to Linux. Most of the discussions are related to the RAID code that's been built into the Linux kernel. There are also a few discussions on getting hardware-based RAID controllers working using Linux as the operating system. Any and all of these discussions are valid for this list.
Well, the short answer is, it depends. Some distributions are using the RAID 0.90 patches, while others leave the kernel with the older md code. Unfortunately, I don't have a list of which distributions have which kernels. If you'd like to maintain such a list, please email me <email@example.com> as well as the linux-raid mailing list.
If you download a 2.2.x kernel from ftp.kernel.org, then you will need to patch your kernel.
That depends on which kernel series you're using. If you're using the 2.4.x kernels, then you've already got the latest RAID code that's available. If you're running 2.2.x, see the following instructions on how to find out.
The easiest way is to check what's in /proc/mdstat. Here's a sample from a 2.2.x kernel, with the RAID patches applied.
The "Personalities" line in your kernel may not look exactly like the above if you have RAID compiled as modules. Most distributions will have RAID compiled as modules to save space on the boot diskette. If you're not using any RAID sets, then you will probably see a blank space at the end of the "Personalities" line. Don't worry, that just means that the RAID modules aren't loaded yet.
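If you only want to look at the "Personalities" line, a small shell function can pull it out of /proc/mdstat for you. This is a minimal sketch: the function name raid_personalities is made up for this example, and it assumes /proc/mdstat is readable on your system.

```shell
# raid_personalities [FILE]
# Print the "Personalities" line from FILE (default: /proc/mdstat).
# The function name is hypothetical, chosen only for this example.
raid_personalities() {
    grep '^Personalities' "${1:-/proc/mdstat}"
}
```

Run raid_personalities with no arguments to read /proc/mdstat. If nothing follows the colon, the RAID modules probably aren't loaded yet.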
Here's a sample from a 2.2.x kernel, without the RAID patches applied.
The patches for the 2.2.x kernels up to, and including, 2.2.13 are available from ftp.kernel.org. Use the kernel patch that most closely matches your kernel revision. For example, the 2.2.11 patch can also be used on 2.2.12 and 2.2.13.
The patches for 2.2.14 and later kernels are at http://people.redhat.com/mingo/raid-patches/. Use the patch that matches your kernel exactly; so far these patches haven't worked on other kernel revisions. Please retrieve the patch with something like wget, curl, or lftp, as that's easier on the server than using a client like Netscape. Downloading patches with Lynx has been unsuccessful for me; wget may be the easiest way.
First, unpack the kernel into some directory; people generally use /usr/src/linux. Change to this directory, and type patch -p1 < /path/to/raid-version.patch.
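If you'd like a safety net, the patch step can be wrapped so that nothing is modified unless the whole patch applies cleanly. The sketch below is only one way to do it: the function name apply_raid_patch is made up for this example, and it assumes your patch program supports the --dry-run option (GNU patch does).

```shell
# apply_raid_patch SRC_DIR PATCH_FILE
# Apply PATCH_FILE to the kernel tree in SRC_DIR with "patch -p1", but only
# after a dry run succeeds. PATCH_FILE must be an absolute path, because the
# function changes into SRC_DIR first.
# Assumption: patch supports --dry-run (GNU patch does).
apply_raid_patch() {
    src_dir=$1
    patch_file=$2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$src_dir" || exit 1
        # First pass: check that every hunk applies, without touching files.
        patch -p1 --dry-run < "$patch_file" > /dev/null || exit 1
        # Second pass: really apply the patch.
        patch -p1 < "$patch_file"
    )
}

# Example call, using the paths mentioned above:
#   apply_raid_patch /usr/src/linux /path/to/raid-version.patch
```

If the dry run fails, the function returns a non-zero status and leaves the source tree alone, which usually means the patch doesn't match your kernel revision.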
Software RAID works with any block device in the Linux kernel. This includes IDE and SCSI drives, as well as most hardware RAID controllers. There are no different patches for IDE drives vs. SCSI drives.
3.1. Why are the RAIDtools at http://people.redhat.com/mingo/raid-patches/ labeled dangerous, and if they're dangerous, should I use them?
The tools are labeled dangerous because the RAID code isn't part of the "stable" Linux kernel.
The tools found at the above URL are the latest and greatest. You should use these tools with the kernel patches from the same location.
No, the dangerous tools available from http://people.redhat.com/mingo/raid-patches/ are the most current tools to use. Everyone using RAID with the patches at the above location should be using these dangerous tools.
A couple of things should indicate when a disk has failed. There should be quite a few messages in /var/log/messages reporting errors accessing that device, which is a good sign that something is wrong.
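To pick those messages out of a busy log, you can filter on the device name. This is a rough sketch, not a definitive check: the function name disk_log_lines is made up for this example, and the words it searches for ("error" and "fail") are a guess, since the exact message text depends on your drivers.

```shell
# disk_log_lines DEVICE [LOGFILE]
# Print lines from LOGFILE (default: /var/log/messages) that mention DEVICE
# and also contain "error" or "fail", ignoring case.
# Assumption: the search words are a guess; real messages vary by driver.
disk_log_lines() {
    device=$1
    logfile=${2:-/var/log/messages}
    grep -F -- "$device" "$logfile" | grep -i -E 'error|fail'
}

# Example call (the device name is only an illustration):
#   disk_log_lines sdb
```

You will generally need to be root to read /var/log/messages.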
You should also notice that your /proc/mdstat looks different. Here's a snip from a good /proc/mdstat.
And here's one from a /proc/mdstat where one of the RAID sets has a missing disk.
I don't know if /proc/mdstat will reflect the status of a HOT SPARE. If you have set one up, you should be watching /var/log/messages for any disk failures. I'd like to get some logs of a disk failure, and /proc/mdstat from a system with a hot spare.
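The /proc/mdstat check can also be scripted. With the RAID 0.90 code, each array's status includes a bracketed string with one character per disk: U for a working disk and _ for a missing one, so a mirror with a missing disk shows something like [U_]. The sketch below prints the names of arrays with a missing disk. The function name degraded_arrays is made up for this example, and it assumes the status strings on your kernel have that form.

```shell
# degraded_arrays [FILE]
# Print the name of every md array in FILE (default: /proc/mdstat) whose
# status string contains an underscore, e.g. [U_], meaning a missing disk.
# Assumption: status strings are made up only of "U" and "_" characters.
degraded_arrays() {
    awk '
        # Remember which array the following status belongs to.
        /^md[0-9]+ :/ { name = $1 }
        # A status such as [U_] or [_UU] marks the current array as degraded.
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Run degraded_arrays with no arguments to check the live system. No output means no array was found with a missing disk.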
RAID generally doesn't mark a disk as bad unless it is, so you probably need a new disk. Most disks have a 3-year warranty, but some good SCSI hard drives may have a 5-year warranty. See if you can get the manufacturer to replace the failed disk for you.
When you get the new disk, power down the system and install it, then partition the drive so that its partitions match the size of your missing RAID partitions. After you've finished partitioning the disk, use the raidhotadd command to put the new disk into the array and begin reconstruction. See Chapter 6 of the Software RAID HOWTO for more information.
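Since raidhotadd takes the array first and the new partition second, it's easy to check the command before running it for real. The wrapper below is a cautious sketch: the function name readd_disk and the DRY_RUN variable are made up for this example, and the device names in the sample call are placeholders for your own.

```shell
# readd_disk MD_DEVICE PARTITION
# Add PARTITION back into MD_DEVICE with raidhotadd.
# By default (DRY_RUN unset or set to 1) the command is only printed;
# set DRY_RUN=0 to actually run it.
# The function name and DRY_RUN variable are hypothetical.
readd_disk() {
    md_dev=$1
    part=$2
    if [ -z "$md_dev" ] || [ -z "$part" ]; then
        echo "usage: readd_disk MD_DEVICE PARTITION" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run, without touching the array.
        echo "raidhotadd $md_dev $part"
    else
        raidhotadd "$md_dev" "$part"
    fi
}

# Example calls (device names are placeholders):
#   readd_disk /dev/md0 /dev/sdb1             # prints the command only
#   DRY_RUN=0 readd_disk /dev/md0 /dev/sdb1   # runs raidhotadd
```

Once the disk has been added, /proc/mdstat is the place to watch the reconstruction.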
In that message "physical units" refers to disks, and not to blocks on the disks. Since there is more than 1 RAID array that needs resyncing on a disk, the RAID code is going to sync md4 first, and md5 second, to avoid excessive seeks (also called thrashing), which would drastically slow the resync process.