RAID/Software
From Gentoo Linux Wiki
About the Install
Software RAID is compatible with a dual-boot environment involving Windows, but Windows will not be able to mount or read any partition that belongs to the software RAID. All pseudo-hardware RAID controllers must be turned off.
This HOWTO assumes you are using SATA drives but it should work equally well with IDE drives. If you are using IDE drives, for maximum performance make sure that each drive is a master on its own separate channel.
To partition drives similarly to how the Gentoo install docs suggest:
device     mount  size
/dev/sda1  /boot  32MB
/dev/sda2  swap   >=2*RAM
/dev/sda3  /      10GB
/dev/sda4  /home  180GB (This partition is optional but recommended)
When you partition your disks, make sure that your partitions use fd (Linux RAID autodetect) as Partition Type instead of the default 83 (Linux native) or 82 (swap).
/boot is best placed on a RAID1. Recall that in RAID1, data is mirrored on every member disk, so if one disk of the RAID has a problem, GRUB/LILO can load the kernel from the copy on any remaining partition of the /boot RAID1 and a normal boot will occur.
In this HOWTO, /, /boot, /home and the swap partition will be RAID 1 (mirror). For better performance you could use RAID10 in the far layout (raid10,f2) instead. It is a direct replacement that uses the enhanced raid10 driver and can give up to double the sequential read speed of raid1. Pass "--level=10 -p f2" as additional parameters when creating arrays with mdadm.
If you do not place your swap partition on RAID and the drive containing it dies, your system will likely crash the next time it tries to access the swap partition.
Load Kernel Modules
Load the appropriate RAID module.
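The command listing appears to be missing from this copy of the page. A minimal sketch, assuming the md personality modules are named raid1, raid0 and raid5 (newer kernels provide RAID-5 through the raid456 module), prints the commands so they can be reviewed before being run as root:

```shell
# Print one modprobe command per RAID personality.
# The module names are assumptions; check /lib/modules/$(uname -r) for your kernel.
raid_modprobe_cmds() {
    for module in raid1 raid0 raid5; do
        echo "modprobe $module"
    done
}

raid_modprobe_cmds
```

To execute instead of print, pipe the output to a root shell, e.g. `raid_modprobe_cmds | sh`.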
For RAID-1, RAID-0 and RAID-5 respectively.
Setup Partitions
You can partition your drives with tools such as fdisk or cfdisk. There is nothing different here except to make sure:
- Your partitions are the same size on each drive. See below for instructions on copying a partition map.
- Your partitions to be included in the RAID are set to partition type fd, Linux RAID auto-detect. If not set to fd, the partitions will fail to be added to the RAID on reboot.
This might be a good time to play with the hdparm tool. It allows you to change hard drive access parameters, which might speed up disk access. Another use is if you are using a whole disk as a hot spare. You may wish to change its spin down time so that it spends most of its time in standby, thus extending its life.
You can also set up the partitions on the first disk and then copy the entire partition table to the second disk with the following command:
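The command itself did not survive in this copy. One common way to do this, assuming sfdisk is available and that /dev/sda is the partitioned disk and /dev/sdb the empty one, is to dump the table from the first disk and replay it onto the second. The helper name below is hypothetical, and it only prints the command:

```shell
# Print the command that dumps SRC's partition table and replays it onto DST.
# DST's existing partition table would be overwritten, so check the order twice.
copy_ptable_cmd() {
    src=$1
    dst=$2
    echo "sfdisk -d $src | sfdisk $dst"
}

copy_ptable_cmd /dev/sda /dev/sdb
```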
Setup RAID
Now before we start creating the RAID arrays, we need to create the metadevice nodes:
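The node-creation commands are not preserved here. As a sketch, assuming the arrays are numbered md1 to md4 as in the rest of this HOWTO: md devices are block devices with major number 9, and the minor number matches the array number.

```shell
# Print mknod commands for /dev/md1 .. /dev/md4.
# Block major 9 is the md driver; the minor equals the array number.
md_node_cmds() {
    for n in 1 2 3 4; do
        echo "mknod /dev/md$n b 9 $n"
    done
}

md_node_cmds
```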
After partitioning, create the /etc/mdadm.conf file (yes, indeed, on the Installation CD environment) using mdadm, an advanced tool for RAID management. For instance, to have the boot, swap and root partition mirrored (RAID-1) covering /dev/sda and /dev/sdb, the following commands can be used:
Or if you are lazy:
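Both command listings are missing from this copy, so the following is a reconstruction under stated assumptions: partition N of /dev/sda and /dev/sdb is mirrored as /dev/mdN, and the finished arrays are recorded with mdadm --detail --scan. The loop plays the role of the "lazy" form, and its printed output is the long form.

```shell
# Print one "mdadm --create" line per array number given as an argument,
# mirroring /dev/sdaN and /dev/sdbN as /dev/mdN, followed by the line
# that records the resulting arrays in /etc/mdadm.conf.
mirror_create_cmds() {
    for n in "$@"; do
        echo "mdadm --create --verbose /dev/md$n --level=1 --raid-devices=2 /dev/sda$n /dev/sdb$n"
    done
    echo "mdadm --detail --scan > /etc/mdadm.conf"
}

# 1 = /boot, 2 = swap, 3 = /, 4 = /home
mirror_create_cmds 1 2 3 4
```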
On the other hand if you want to put 4 partitions (sdc1, sdd1, sde1, sdf1) into a single RAID-5 then try this command:
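The RAID-5 command is also missing. A sketch, assuming the array is created as /dev/md0 (the array name is not given in the text, and its device node must exist first):

```shell
# Print an "mdadm --create" line for a RAID-5 array.
# First argument: md device; remaining arguments: member partitions.
raid5_create_cmd() {
    md=$1
    shift
    echo "mdadm --create --verbose $md --level=5 --raid-devices=$# $*"
}

raid5_create_cmd /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```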
You may check /proc/mdstat to see if the RAID devices are done syncing:
cat /proc/mdstat
You can also use:
watch -n N cat /proc/mdstat
which refreshes the output of /proc/mdstat every N seconds. You can cancel the output with Ctrl+C.
It should look something like this (showing one array syncing and the other one already completed):
Personalities : [raid1]
md2 : active raid1 sdb3[1] sda3[0]
184859840 blocks [2/2] [UU]
[======>..............] resync = 33.1% (61296896/184859840) finish=34.3min speed=59895K/sec
md1 : active raid1 sdb1[1] sda1[0]
10000320 blocks [2/2] [UU]
unused devices: <none>
If an array is still syncing, you may still proceed to creating filesystems, because the sync operation is completely transparent to the file system. (Note: if a drive happens to fail before the RAID sync finishes, then you're in trouble.)
Create the filesystems on the arrays.
mke2fs -j /dev/md3
or, on a striped array, also pass the following extended options (mke2fs -E) so that the filesystem layout matches the RAID geometry. From the mke2fs man page:
stride=stride-size
Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number of blocks read or written to disk before moving to the next disk. This mostly affects placement of filesystem metadata like bitmaps at mke2fs(8) time to avoid placing them on a single disk, which can hurt performance. It may also be used by the block allocator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe. This is typically stride-size * N, where N is the number of data disks in the RAID (for example, a RAID 5 array has N+1 disks in total and a RAID 6 array has N+2). This allows the block allocator to prevent read-modify-write of the parity in a RAID stripe if possible when the data is written.
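To make the arithmetic concrete, here is a small sketch that derives both values from the array geometry. The helper name and the example geometry (a 4-disk RAID 5 with 64 KiB chunks and 4 KiB filesystem blocks) are illustrative assumptions, and older e2fsprogs releases may only accept the stride option.

```shell
# Print a mke2fs command whose stride and stripe_width match the array.
# Arguments: md device, chunk size in KiB, filesystem block size in KiB,
# number of data disks (total disks minus parity disks).
mke2fs_raid_cmd() {
    md=$1
    chunk_kib=$2
    block_kib=$3
    data_disks=$4
    stride=$((chunk_kib / block_kib))          # filesystem blocks per chunk
    stripe_width=$((stride * data_disks))      # filesystem blocks per full stripe
    echo "mke2fs -j -b $((block_kib * 1024)) -E stride=$stride,stripe_width=$stripe_width $md"
}

# 4-disk RAID 5 (3 data disks), 64 KiB chunks, 4 KiB blocks
mke2fs_raid_cmd /dev/md0 64 4 3
```

For this geometry the stride is 64 / 4 = 16 blocks and the stripe width is 16 * 3 = 48 blocks.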
Create the Swap Partition
As described above, swap goes on a mirrored array in this HOWTO. With swap on RAID-0, the system will most likely crash if one of your disks dies, since the swap data is striped over all disks. So create the swap area on the mirrored array instead:
mkswap /dev/md2
Your fstab could look like:
/dev/md2 swap swap defaults 0 0
There is no performance reason to use RAID for swap. The kernel itself can stripe swap across several devices if you give them the same priority in the /etc/fstab file. Using a mirrored RAID type such as raid1 or raid10,f2 makes writes to the swap area about half as fast as striped swap, as all data is written twice.
A striped /etc/fstab looks like:
/dev/sda2 swap swap defaults,pri=1 0 0
/dev/sdb2 swap swap defaults,pri=1 0 0
For reliability reasons, you may still choose to use RAID for swap. With a non-RAID configuration as shown above, the failure of any drive holding swap can crash your system. And while the above configuration may be faster than using a single drive for swap, it is also twice as likely that a drive failure takes your system down with it.
Mount Partitions
Turn the swap on:
Mount the /, /boot and /home RAIDs:
Copy the RAID configuration:
Make the chrooted environment look like a real system ;-)
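The commands for the four steps above are not preserved in this copy. The sketch below is a guess at them, assuming md1 = /boot, md2 = swap, md3 = / and md4 = /home, the handbook's /mnt/gentoo mount point, and that "like real" means giving the chroot its own /proc and /dev.

```shell
# Print the mount steps for the new system rooted at $1 (default /mnt/gentoo).
mount_raid_cmds() {
    root=${1:-/mnt/gentoo}
    cat <<EOF
swapon /dev/md2
mount /dev/md3 $root
mkdir -p $root/boot $root/home $root/etc $root/proc $root/dev
mount /dev/md1 $root/boot
mount /dev/md4 $root/home
cp /etc/mdadm.conf $root/etc/mdadm.conf
mount -t proc none $root/proc
mount -o bind /dev $root/dev
EOF
}

mount_raid_cmds
```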
Continue the Install
Continue with the Gentoo Handbook starting with the section entitled "Installing the Gentoo Installation Files". Use /dev/md1 for the boot partition, /dev/md3 for the root partition and /dev/md4 for the home partition.
When you're configuring your kernel, make sure the appropriate RAID support is built into your kernel and not compiled as modules.
Linux Kernel Configuration: RAID configuration
Device Drivers --->
  Multi-device support (RAID and LVM) --->
    [*] Multiple devices driver support (RAID and LVM)
    <*> RAID support
    <*> RAID-0 (striping) mode
    <*> RAID-1 (mirroring) mode
    <*> Device mapper support
When installing extra tools, emerge mdadm as well and add its init script to a runlevel; otherwise mdadm will not be loaded at boot time.
When configuring your bootloader, make sure it gets installed in the MBR of both disks if you use mirroring (RAID 1).
Installing Grub onto both MBRs
[1] x86_64-pc-linux-gnu-3.4.6 *
[2] x86_64-pc-linux-gnu-3.4.6-hardenednopie
[3] x86_64-pc-linux-gnu-3.4.6-hardenednopiessp
[4] x86_64-pc-linux-gnu-3.4.6-hardenednossp
[5] x86_64-pc-linux-gnu-3.4.6-vanilla
emerge grub -av
...gcc-config 1
Since the /boot partition is a RAID, grub cannot read it to get the bootloader. It can only access physical drives. Thus, you still use (hd0,0) in this step.
Run grub from the shell. You should see the GRUB prompt:
grub>
If you are using a RAID 1 mirror, install grub on all the disks in the system, so that you are still able to boot when one disk fails. The find command lists the partitions that hold the stage1 file, e.g.
grub> find /boot/grub/stage1
(hd0,0)
(hd1,0)
grub>
Now, if your disks are /dev/sda and /dev/sdb, do the following to install GRUB on the /dev/sda MBR:
device (hd0) /dev/sda
root (hd0,0)
setup (hd0)
This will install grub into the /dev/sdb MBR:
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
The device command tells grub to assume the drive is (hd0), i.e. the first disk in the system, when it is not necessarily the case. If your first disk fails, however, your second disk will then be the first disk in the system, and so the MBR will be correct.
The grub.conf does change from the normal install. The difference is in the root device passed to the kernel: it is now a RAID device and no longer a physical drive. For example, it would look like:
default 0
timeout 30
splashimage=(hd0,0)/boot/grub/splash.xpm.gz

title=My example Gentoo Linux
root (hd0,0)
kernel /boot/bzImage root=/dev/md3 md=3,/dev/sda3,/dev/sdb3
Misc RAID stuff
To see if RAID is functioning properly after reboot, check /proc/mdstat:
cat /proc/mdstat
There should be one entry per RAID drive. The RAID 1 drives should have a "[UU]" in the entry, letting you know that the two hard drives are "up, up". If one goes down you will see "[U_]". If this ever happens your system will still run fine, but you should replace that hard drive as soon as possible.
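The "[UU]" versus "[U_]" check described above can be automated. The following is a rough sketch, not a replacement for mdadm's own monitoring; the function name is made up for illustration, and it assumes the /proc/mdstat layout shown earlier in this HOWTO.

```shell
# List arrays in an mdstat-format file whose status field (e.g. [U_])
# contains "_", meaning at least one member is missing or failed.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling `degraded_arrays` with no arguments prints the name of every degraded array, one per line, and prints nothing when all members are up.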
To rebuild a RAID 1:
- Power down the system
- Replace the failed disk
- Power up the system once again
- Create identical partitions on the new disk, i.e. copy the partition scheme from the drive that is still online.
- Remove the old partition from the array and add the new partition back
You can copy a partition map from one disk to another with dd. Additionally, since the target drive is not in use, we can rewrite its partition map with fdisk to force the kernel to re-read it:
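The original commands are missing here. A sketch, assuming /dev/sda is the surviving disk and /dev/sdb the replacement: the first 512-byte sector holds the MBR partition table, so copying it also copies the boot code, but it does not carry over logical partitions inside an extended partition.

```shell
# Print the commands that copy SRC's first sector (MBR and primary
# partition table) onto DST and then open DST in fdisk.
ptable_dd_cmds() {
    src=$1
    dst=$2
    echo "dd if=$src of=$dst bs=512 count=1"
    # In fdisk, entering "w" writes the table back unchanged and makes
    # the kernel re-read it.
    echo "fdisk $dst"
}

ptable_dd_cmds /dev/sda /dev/sdb
```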
To remove the failed partition and add the new partition:
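The commands did not survive in this copy. One possible form, assuming the replacement disk is /dev/sdb, that partition N belongs to /dev/mdN, and an mdadm version that accepts the "failed" keyword for --remove:

```shell
# Print, for each array number, the command that drops members still
# listed as failed and the command that adds the new partition.
# First argument: replacement disk; remaining arguments: array numbers.
readd_member_cmds() {
    disk=$1
    shift
    for n in "$@"; do
        # Only needed if the old member is still listed as failed.
        echo "mdadm /dev/md$n --remove failed"
        echo "mdadm /dev/md$n --add $disk$n"
    done
}

readd_member_cmds /dev/sdb 1 2 3 4
```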
Watch the automatic reconstruction run with:
watch -n 1 cat /proc/mdstat
If one of the partitions is a boot partition, don't forget to re-run grub on that partition so that grub boots from the new disk, not from the disk you copied the partition from using dd. This is important if you ever have to replace that disk!
Notification
If you want to receive e-mail alerts about your RAID system, mdadm must be configured with your e-mail address.
Make sure you can send mail from your machine. If all you need is basic SMTP support, you may wish to consider installing nail. This is a version of mail that can be compiled with SMTP support.
Make sure that the next line is in the /etc/mdadm.conf with the correct To e-mail address:
MAILADDR root@example.com
Fix me: An explanation of how to get mail-client/nail to work is required. I couldn't get it going. mail-mta/ssmtp was easy.
To verify that e-mail notification works, use this test command:
Finally add the mdadm script to your default RC, and start it to begin monitoring:
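Neither command is preserved in this copy. A sketch of both steps, assuming Gentoo's mdadm init script is installed as /etc/init.d/mdadm: the first line asks mdadm's monitor mode to send one test alert per array found in the configuration and then exit, and the other two register and start the monitoring service.

```shell
# Print the test-alert command followed by the runlevel setup commands.
mdadm_monitor_cmds() {
    cat <<'EOF'
mdadm --monitor --scan --oneshot --test
rc-update add mdadm default
/etc/init.d/mdadm start
EOF
}

mdadm_monitor_cmds
```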
Fix me: RC default and boot?
Now if one of your disks fails you will be notified at the address supplied.
Write-intent bitmap
A write-intent bitmap is used to record which areas of a RAID component have been modified since the RAID array was last in sync. Basically, the RAID driver periodically writes out a small table recording which portions of a RAID component have changed. Therefore, if you lose power before all drives are in sync, when the array starts up a full re-sync is not needed. Only the changed portions need to be re-synced.
To turn on write-intent bitmapping
Install a modern mdadm: >=sys-fs/mdadm-2.4.1
Install a modern kernel: >=2.6.16
Your RAID volume must be configured with a persistent superblock and must be fully synchronized. Use the following command to verify whether these conditions have been met:
mdadm --detail /dev/mdX
Make sure it says:
State : active
Persistence : Superblock is persistent
Add a bitmap with the following command:
You can monitor the status of the bitmap as you write to your array with:
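The two commands are missing from this copy. A sketch, assuming an internal bitmap (stored alongside the array's own metadata) and /dev/md3 as the example array:

```shell
# Print the command that adds an internal write-intent bitmap to the
# given array, and a command to watch the bitmap line in /proc/mdstat.
bitmap_on_cmds() {
    md=$1
    echo "mdadm --grow $md --bitmap=internal"
    echo "watch -n 1 cat /proc/mdstat"
}

bitmap_on_cmds /dev/md3
```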
To turn off write-intent bitmapping
Remove the bitmap with the following command:
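The command is not preserved here; with mdadm's grow mode it would presumably look like this, with /dev/md3 as an example array:

```shell
# Print the command that removes the write-intent bitmap from an array.
bitmap_off_cmd() {
    md=$1
    echo "mdadm --grow $md --bitmap=none"
}

bitmap_off_cmd /dev/md3
```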
Data Scrubbing
In short: Especially if you run a RAID5 array, trigger an active bad block check on a regular basis, or there is a high chance of hidden bad blocks making your RAID unusable during reconstruction.
Normally, RAID passively detects bad blocks. If a read error occurs, the data is reconstructed from the rest of the array, and the bad block is rewritten. If the block cannot be rewritten, the defective disk is kicked out of the active array.
Once the defective drive is replaced, reconstruction will cause all blocks of the remaining drives to be read. If this process runs across a previously undetected bad block on the remaining drives, another drive will be marked as failed, making RAID5 unusable. The larger the disks, the higher the odds that passive bad block detection will be inadequate. Therefore, with today's large disks it is important to actively perform data scrubbing on your array.
With a modern (>=2.6.16) kernel, this command will initiate a data consistency and bad block check, reading all blocks, checking them for consistency, and attempting to rewrite inconsistent blocks and bad blocks.
You can monitor the progress of the check with:
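Both commands are missing from this copy. A sketch, assuming the md sysfs interface at /sys/block/mdX/md/sync_action and md3 as the example array. Writing "check" starts a read-and-compare pass, while writing "repair" instead also rewrites inconsistent blocks.

```shell
# Print the command that starts a check on the given array (name without
# /dev/, e.g. md3) and two commands that show its state and progress.
scrub_cmds() {
    md=$1
    echo "echo check > /sys/block/$md/md/sync_action"
    echo "cat /sys/block/$md/md/sync_action"
    echo "cat /proc/mdstat"
}

scrub_cmds md3
```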
You should have your array checked daily or weekly by adding the appropriate command to /etc/crontab.
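As an illustration, a weekly entry might look like the line printed below. The schedule (Sundays at 03:00), the array name and the helper function are assumptions; the line uses the /etc/crontab format, which has a user field before the command.

```shell
# Print an /etc/crontab line that starts a check of the given array
# (name without /dev/) every Sunday at 03:00 as root.
scrub_cron_line() {
    md=$1
    echo "0 3 * * 0 root echo check > /sys/block/$md/md/sync_action"
}

scrub_cron_line md3
```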
If you find yourself needlessly checking your array (like I was) and want to stop it safely, you can either stop the entire array, or:
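The command is missing here. Assuming the same sysfs interface, writing "idle" to sync_action is the usual way to interrupt a running check:

```shell
# Print the command that stops a running check on the given array
# (name without /dev/, e.g. md3).
scrub_stop_cmd() {
    md=$1
    echo "echo idle > /sys/block/$md/md/sync_action"
}

scrub_stop_cmd md3
```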
See Also
- linux-raid mailing list
- The original gentoo forum post
- Linux Software RAID HOWTO
- Linux Software Raid Wiki
- Gentoo/x86 Installation Tips and Tricks for use of mdadm to create RAID arrays
