Comparing RAID 10 and RAID 01

These two RAID levels often bring about a tremendous amount of confusion, partially because they are incorrectly used interchangeably and often simply because they are poorly understood.

First, it should be pointed out that either maybe be written with or without the plus sign: RAID 10 is RAID 1+0 and RAID 01 is RAID 0+1. Strangely, RAID 10 is almost never written with the plus and RAID 01 is almost never written without. Storage engineers generally agree that the plus is never used as it is superfluous.

Both of these RAID levels are “compound” levels made from two different, simple RAID types being combined. Both are mirror-based, non-parity compound or nested RAID. Both have essentially identical performance characteristics – nominal overhead and latency with NX read speed and (NX)/2 write speed where N is the number of drives in the array and X is the performance of an individual drive in the array.

What sets the two RAID levels apart is how they handle disk failure. The quick overview is that RAID 10 is extremely safe under nearly all reasonable scenarios. RAID 01, however, rapidly becomes quite risky as the size of the array increases.

In a RAID 10, the loss of any single drive results in the degradation of a single RAID 1 set inside of the RAID 0 stripe. The stripe level sees no degradation, only the one singular RAID 1 mirror does. All other mirrors are unaffected. This means that our only increased risk is that the one single drive is now running without redundancy and has no protection. All other mirrored sets still retain full protection. So our exposure is a single, unprotected drive – much like you would expect in a desktop machine.

Array repair in a degraded RAID 10 is the fastest possible repair scenario. Upon replacing a failed drive, all that happens is that that single mirror is rebuilt – which is a simple copy operation that happens at the RAID 1 level, beneath the RAID 0 stripe. This means that if the overall array is idle the mirroring process can proceed at full speed and the overall array has no idea that this is even happening. A disk to disk mirror is extremely fast, efficient and reliable. This is an ideal recovery scenario. Even if multiple mirrors have degradation simultaneously and are repairing simultaneously there is no additional impact as the rebuilding of one does not impact others. RAID 10 risk and repair impact both scale extremely well.

RAID 01, on the other hand, when it loses a single drive immediately loses an entire RAID 0 stripe. In a typical RAID 01 mirror there are two RAID 0 stripes. This means that half of the entire array has failed. If we are talking about an eight drive RAID 01 array, the failure of a single drive renders four drives instantly inoperable and effectively failed (hardware does not need to be replaced but the data on the drives is out of date and must be rebuilt to be useful.) So from a risk perspective, we can look at it as being a failure of the entire stripe.

What is left after a single disk has failed is nothing but a single, unprotected RAID 0 stripe. This is far more dangerous than the equivalent RAID 10 failure because instead of there being only a single, isolated hard drive at risk there is now a minimum of two disks and potentially many more at risk and each drive exposed to this risk magnifies the risk considerably.

As an example, in the smallest possible RAID 10 or 01 array we have four drives. In RAID 10 if one drive fails, our risk is that its matching partner also fails before we rebuild the array. We are only worried about that one drive, all other drives in the RAID 10 set are still protected and safe. Only this one is of concern. In a RAID 01, when the first drive fails its partner in its RAID 0 set is instantly useless and effectively failed as it is no longer operable in the array. What remains are two drives with no protection running nothing but RAID 0 and so we have the same risk that RAID 10 did, twice. Each drive has the same risk that the one drive did before. This makes our risk, in the best case scenario, much higher.

But for a more dramatic example let us look at a large twenty-four drive RAID 10 and RAID 01 array. Again with RAID 10, if one drive fails all others, except for its one partner, are still protected. The extra size of the array added almost zero additional risk. We still only fear for the failure of that one solitary drive. Contrast that to RAID 01 which would have had one of its RAID 0 arrays fail taking twelve disks out at once with the failure of one leaving the other twelve disks in a RAID 0 without any form of protection. The chances of one of twelve drives failing is significantly higher than the chances of a single drive failing, obviously.

This is not the entire picture. The recovery of the single RAID 10 disk is fast, it is a straight copy operation from one drive to the other. It uses minimal resources and takes only as long as is required for a single drive to read and to write itself in its entirety. RAID 01 is not as lucky. Unlike RAID 10 which rebuilds only a small subset of the entire array, and a subset that does not grow as the array grows – the time to recover a four drive RAID 10 or a forty drive RAID 10 after failure is identical, RAID 01 must rebuild an entire half of the whole parents array. In the case of the four drive array, this is double the rebuild work of the RAID 10 but in the case of the twenty four drive array it is twelve times the rebuild work to be done. So RAID 01 rebuilds take longer to perform while being under significantly more risk during that time.

There is a rather persistent myth that RAID 01 and RAID 10 have different performance characteristics, but they do not. Both use plain striping and mirroring which are effectively zero overhead operations that requires almost no processing overhead. Both get full read performance from every disk device attached to them and each lose half of their write performance to their mirroring operation (assuming two way mirrors which is the only common use of either array type.) There is simply nothing to make RAID 01 or RAID 10 any faster or slower than the other. Both are extremely fast.

Because of the characteristics of the two array types, it is clear that RAID 10 is the only type, of the two, that should ever exist within a single array controller. RAID 01 is unnecessarily dangerous and carries no advantages. They use the same capacity overhead, they have the same performance, they cost the same to implement, but RAID 10 is significantly more reliable.

So why does RAID 01 even exist? Partially it exists out of ignorance or confusion. Many people, implementing their own compound RAID arrays, choose RAID 01 because they have heard the myth that it is faster and, as is generally the case with RAID, do not investigate why it would be faster and forget to look into its reliability and other factors. RAID 01 is truly only implemented on local arrays by mistake.

However, when we take RAID to the network layer, there are new factors to consider and RAID 01 can become important, as can its rare cousin RAID 61. We denote, via Network RAID Notation, where the local and where the network layers of the RAID exist. So in this case we mean RAID 0(1) OR RAID 6(1). The parentheses denote that the RAID 1 mirror, the “highest” portion of the RAID stack, is over a network connection and not on the local RAID controller.

How would this look in RAID 0(1)? If you have two servers, each with a standard RAID 0 array and you want them to be synchronized together to act as a single, reliable array you could use a technology such as DRBD (on Linux) or HAST (on FreeBSD) to create a network RAID 1 array out of the local storage on each server. Obviously this has a lot of performance overhead as the RAID 1 array must be kept in sync over the high latency, low bandwidth LAN connection. RAID 0(1) is the notation for this setup. If each local RAID 0 array was replaced with a more reliable RAID 6 we would write the whole setup as RAID 6(1).

Why do we accept the risk of RAID 01 when it is over a network and not when it is local? This is because of the nature of the network link. In the case of RAID 10, we rely on the low level RAID 1 portion of the RAID stack for protection and the RAID 0 sits on top. If we replicate this on a network level such as RAID 1(0) what we end up with is each host having a single mirror representing only a portion of the data of the array. If anything were to happen to any node in the array or if the network connection was to fail the array would be instantly destroyed and each node would be left with useless, incomplete data. It is the nature of the high risk of node failure and risk at the network connection level that makes RAID decisions in a network setting extremely different. This becomes a complex subject on its own.

Suffice it to say, when working with normal RAID array controllers or with local storage and software RAID, utilize RAID 10 exclusively and never RAID 01.

6 thoughts on “Comparing RAID 10 and RAID 01”

  1. That was a very interesting read, SAM. Thanks for taking the time to spell this one out. MY head is hurting a little from it, and I’ll probably have to re-read it. However, I see the general ideas that you are communicating.

  2. Great article! I’ve enjoyed all your articles on RAID levels and I hope there will be more to come. Thanks!

  3. Scott,

    Interesting read. So, if I read it correctly, is it almost a primary and secondary RAID, and the first number (1 in 10 v. 0 in 01) is primary and the second is secondary, which is why 01 is so much riskier, as opposed to 10, where RAID1, or redundancy, is primary. I know that might not be the most perfect way to explain it but that’s what I got out of it.

    Thanks for the read!

  4. I’m not sure that I would think of them as primary and secondary. One is simply nearer to the block device and the other nearer to the final controller. Which is primary, the peek of a pyramid or its base? Each is primary depending on perception.

    But how you are thinking is correct, if all other things are equal you want your reliability as low in the stack as possible. At least when dealing with blind reliability as in replicated block devices.

  5. It’s a pity you don’t have a donate button! I’d most certainly donate to this brilliant blog!
    I guess for now i’ll settle for book-marking and adding your RSS feed to my Google
    account. I look forward to brand new updates and will share this website with my Facebook group.

    Chat soon!

Leave a Reply

Your email address will not be published. Required fields are marked *