Risk is a difficult concept, and it takes a lot of training, thought and analysis to assess a given scenario properly. Because risk assessments are so difficult, we often replace risk analysis with simply adding basic redundancy, and then assume that we have appropriately mitigated risk. Very often this is not the case. Adding redundancy frequently introduces complexity or additional failure modes, and these new forms of failure can add more risk than the added redundancy removes. Storage systems are especially prone to this kind of decision making. That is unfortunate, because few, if any, systems are both so susceptible to failure and so important to protect.

RAID is a great example of how a lack of holistic risk thinking can lead to some strange decision making. A not uncommon scenario shows how the goal of protecting against drive failure can actually increase risk, even when additional redundancy is applied. In this scenario we will consider a single twelve-drive array built from three terabyte SATA hard drives. It is not uncommon to hear of people choosing RAID 5 for this scenario to get “maximum capacity and performance” while having “adequate protection against failure.”

The idea here is that RAID 5 protects against the loss of a single drive: the failed drive can be replaced, and the array will rebuild itself before a second drive fails. That is great in theory. But the real risks of an array of this size, thirty-six terabytes of drive capacity, do not come from multiple drive failures, as people generally suspect. They come from an inability to reliably rebuild the array after a single drive failure, or from a failure of the array itself with no individual drive failing. The risk of a second drive failing is low; not non-existent, but quite low. Drives today are highly reliable. It is well documented that once one drive fails, the likelihood of a second drive failing increases. But I don’t want this risk to distract us from the true risk: a failed resilvering operation.

What scares us during a RAID 5 resilver operation is that an unrecoverable read error (URE) can occur. When it does, the resilver operation halts and the array is left in a useless state: all data on the array is lost. Common SATA drives are specified at one URE per 10^14 bits read, or roughly once every twelve terabytes of reads. That means that resilvering a six terabyte array carries somewhere around a forty to fifty percent chance of hitting a URE and failing. A failure rate like that is insanely high. Imagine if your car had a fifty percent chance of the wheels falling off every time you drove it. So take a small (by today’s standards) six terabyte RAID 5 array using 10^14 URE SATA drives. If we were to lose a single drive, we would have little better than even odds that the array will recover, even assuming the drive is replaced immediately. That figure covers only the risk of a URE, not the risk of a second drive failing. It also assumes that the drives are completely idle apart from the resilver operation. If the drives are busy with other tasks at the same time, then the chances of something bad happening, either a URE or a second drive failure, begin to increase dramatically.
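
The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not a definitive calculation. It assumes that UREs are independent and occur at exactly the spec-sheet rate, that terabytes are decimal, and it uses a function name, ure_failure_probability, that is our own illustration. Under this model, simply dividing six terabytes by twelve somewhat overstates the odds; the result for six terabytes comes out nearer forty percent. The conclusion that a resilver is a dangerous gamble is unchanged. Keep in mind that a RAID 5 resilver reads the surviving drives, so the amount of data read is slightly less than the raw capacity.

```python
import math

BITS_PER_BYTE = 8

def ure_failure_probability(tb_read: float, ure_rate_bits: float = 1e14) -> float:
    """Probability of at least one URE while reading `tb_read` terabytes.

    Hypothetical model: errors are independent and occur at exactly the
    spec-sheet rate of one per `ure_rate_bits` bits read, so the number of
    errors is treated as Poisson-distributed.
    """
    bits_read = tb_read * 1e12 * BITS_PER_BYTE  # decimal terabytes, as drive vendors use
    expected_errors = bits_read / ure_rate_bits
    return 1.0 - math.exp(-expected_errors)

for tb in (6, 12, 36):
    print(f"{tb:>2} TB read: {ure_failure_probability(tb):.0%} chance of a URE")
```

The Poisson form is used instead of a plain ratio because a ratio exceeds one hundred percent once more than twelve terabytes are read, while the true probability can only approach it.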

With a twelve terabyte array, the chance of complete data loss during a resilver operation climbs to roughly two in three. That means RAID 5 provides very little real protection in that case. There is always a chance of survival, but it is poor. At six terabytes you can compare a resilver operation to a game of Russian roulette with one bullet and six chambers, in which you have to pull the trigger three times. With twelve terabytes you have to pull it six times! Those are not good odds.
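
The Russian roulette odds can be checked directly. This sketch assumes that the cylinder is spun again before each pull, which the analogy implies, since a fixed cylinder would make six pulls certain death. Exact fractions keep the arithmetic free of rounding until the final print. The function name roulette_loss_probability is hypothetical. Eighteen pulls, the thirty-six terabyte case discussed next, is included for comparison.

```python
from fractions import Fraction

def roulette_loss_probability(pulls: int, chambers: int = 6) -> Fraction:
    """Chance of hitting the single bullet at least once in `pulls` pulls.

    Assumes the cylinder is re-spun before every pull, so each pull is an
    independent one-in-`chambers` event.
    """
    survive_one_pull = Fraction(chambers - 1, chambers)
    return 1 - survive_one_pull ** pulls

for pulls in (3, 6, 18):
    p = roulette_loss_probability(pulls)
    print(f"{pulls:>2} pulls: {float(p):.0%} chance of losing")
```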

But we are not talking about a twelve terabyte array. We are talking about a thirty-six terabyte array. That sounds large, but it is a size that someone could easily have at home today, let alone in a business. Every major server manufacturer, as well as nearly all low cost storage vendors, makes sub-$10,000 storage systems in this capacity range today. Resilvering a thirty-six terabyte RAID 5 array after a single drive failure is like playing Russian roulette with one bullet and six chambers and pulling the trigger eighteen times! Your data doesn’t stand much of a chance. Add to that the enormous amount of time needed to resilver an array of that size, and the risk of a second disk failing during the resilver window starts to become a significant threat. I’ve seen estimates of resilver times climbing into weeks or months on some systems. That is a long time to run without being able to lose another drive. When we are talking about hours or days, the risks are fairly low, but still present. Resilver operations are extremely drive intensive, so when we are talking about weeks or months of continuous abuse, the failure rates climb dramatically.
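
A rough floor for the second-drive risk can be computed from the window length. This sketch is hypothetical in several ways. It borrows the roughly three percent annual failure rate used later in this article, it assumes the eleven surviving drives fail independently at a constant rate, and it deliberately ignores the extra stress of the resilver itself, which is exactly what the text argues drives the real rate up. Treat the output as a lower bound, not an estimate. The function name second_failure_probability is our own.

```python
def second_failure_probability(surviving_drives: int, window_days: float,
                               annual_failure_rate: float = 0.03) -> float:
    """Chance that at least one surviving drive fails during the resilver window.

    Hypothetical baseline model: every drive fails independently at a constant
    rate equivalent to `annual_failure_rate` per year. Resilver stress is not
    modelled, so real-world figures would be higher.
    """
    per_drive_survival = (1 - annual_failure_rate) ** (window_days / 365)
    return 1 - per_drive_survival ** surviving_drives

# A twelve-drive RAID 5 array leaves eleven drives running during the resilver.
for days in (1, 7, 30, 90):
    p = second_failure_probability(surviving_drives=11, window_days=days)
    print(f"{days:>2}-day resilver: {p:.1%} chance of a second drive failure")
```

Even this optimistic baseline grows roughly in proportion to the window length, which is why a resilver measured in weeks or months is a very different proposition from one measured in hours.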

With an array of this size, we can effectively assume that the loss of a single drive means the loss of the complete array, which leaves us with no drive failure protection at all. Now consider an array of the same or better performance and the same or better capacity under RAID 0, which also has no protection against drive loss. It needs only eleven of the same drives, where our RAID 5 array needed twelve. So instead of twelve hard drives, each of which has a roughly three percent chance of failing in a given year, we have only eleven. That alone makes our RAID 0 array more reliable, as there are fewer drives to fail. RAID 0 also does not need to write parity blocks or skip over them when reading back. For the same utilization, that lowers the mechanical wear and tear on the RAID 0 array ever so slightly and gives it a very slight additional reliability edge. The eleven-drive RAID 0 array will be identical in capacity to the twelve-drive RAID 5 array but will have slightly better throughput and latency. A win all around. Plus there are the cost savings of not needing an additional drive.
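
The drive-count argument is easy to quantify. Following the text, this sketch treats losing any single drive as losing the array for both layouts. It uses the roughly three percent annual failure rate from the paragraph above and assumes drives fail independently, which is a simplification. The helper any_drive_fails is a hypothetical name.

```python
def any_drive_fails(drive_count: int, annual_failure_rate: float = 0.03) -> float:
    """Chance that at least one of `drive_count` independent drives fails in a year."""
    return 1 - (1 - annual_failure_rate) ** drive_count

# Same usable capacity from three terabyte drives: RAID 5 needs one extra drive for parity.
raid5_drives, raid0_drives = 12, 11
print(f"RAID 5, {raid5_drives} drives: {any_drive_fails(raid5_drives):.1%} chance of a drive failure per year")
print(f"RAID 0, {raid0_drives} drives: {any_drive_fails(raid0_drives):.1%} chance of a drive failure per year")
```

The gap is small, about two percentage points under these assumptions, but it points in RAID 0’s favour before any of the resilver-specific risks are counted.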

So what we see here is that in large arrays (large in capacity, not in spindle count), RAID 0 can actually overtake RAID 5 in reliability. When using common SATA drives, this happens at capacities reached even by power users at home and by many small businesses. With enterprise SATA or SAS drives, the capacity at which this occurs is very high. It is not a concern today, but it will be in just a few years as drive capacities grow larger still. Either way, this highlights how dangerous RAID 5 is at the sizes that we see today. Everyone understands the incredible risks of RAID 0. It can be harder to put into perspective that RAID 5’s issues are so extreme that it might actually be less reliable than RAID 0.
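
One way to see why the better drives push the problem out so far is to invert the simple independent-error model and ask how much data can be read before the chance of a URE reaches a given level. This sketch assumes the same Poisson model as a plain spec-sheet reading. It also assumes a 10^15 rating as representative of enterprise SATA and SAS drives; check your own drives’ data sheets, since ratings vary. The function name capacity_for_ure_risk is hypothetical.

```python
import math

def capacity_for_ure_risk(target_probability: float, ure_rate_bits: float) -> float:
    """Terabytes that can be read before the chance of a URE reaches the target.

    Inverts the independent-error model P = 1 - exp(-bits_read / ure_rate_bits).
    """
    bits = -math.log(1 - target_probability) * ure_rate_bits
    return bits / 8 / 1e12  # bits -> bytes -> decimal terabytes

# 10^15 is an assumed, typical enterprise rating, not a universal figure.
for label, rate in (("1 per 10^14 bits", 1e14), ("1 per 10^15 bits", 1e15)):
    tb = capacity_for_ure_risk(0.5, rate)
    print(f"{label}: 50% URE risk after about {tb:.0f} TB read")
```

Because the threshold scales linearly with the rating, each factor of ten in URE specification buys a factor of ten in capacity before the same level of risk is reached.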

That RAID 5 might be less reliable than RAID 0 in an array of this size, based on resilver operations alone, is just the beginning. In a massive array like this, the resilver can take so long and exact such a toll on the drives that a second drive failure starts to become a measurable risk as well. Then there are additional risks from array controller errors, which can trigger resilver algorithms that destroy an entire array even when no drive failure has occurred. RAID 0 (or RAID 1 or RAID 10) does not rely on parity-based resilver algorithms, so it does not suffer this additional risk. These risks are hard to quantify. What is important is that they accumulate when we use a more complex system, when a simpler system without the redundancy was more reliable from the outset.

Now that we have established that RAID 5 can be less reliable than RAID 0, I will point out the obvious dangers of RAID 0. RAID in general is used to mitigate the risk of a single, lone hard drive failing. We all fear a single drive simply failing and all data being lost. RAID 0 is a large stripe of drives without any form of redundancy. It takes the risk of data loss from a single drive failing and multiplies it across a number of drives, where any one drive failing causes total loss of the data on all drives. So in our eleven disk example above, if any of the eleven disks fails, all is lost. It is easy to see why this is dramatically more dangerous than just using a single drive on its own.

What I am trying to point out here is that redundancy does not mean reliability. That something is redundant, like RAID 5, provides no guarantee that it will always be more reliable than something that is not redundant.

My favourite analogy here is to look at houses in a tornado. In the first scenario we build a house of brick and mortar. In the second scenario we build two redundant houses, each out of straw (our builders are pigs, apparently). When the tornado (or big bad wolf) comes along, which is more likely to leave us with a standing house? Clearly, one brick and mortar house has some significant reliability advantages over redundant straw houses. Redundancy didn’t matter; reliability mattered in the end.

Redundancy is often misleading because it is easy to quantify but hard to qualify.  Redundancy is a black or white question: Is it redundant?  Yes or no.  Simple.  Reliability is not so simple.  Reliability is about failure rates and likelihoods.  It is about statistics and analysis.  As it is hard to quantify reliability in a meaningful way, especially when selling a project to the business people, redundancy often becomes a simple substitute for this complex concept.

The concept of using redundancy to misdirect questions of reliability also ends up applying to subsystems in very convoluted ways. Instead of making a “system” redundant, it has become common to make a highly reliable, low cost subsystem redundant and to treat that subsystem redundancy as applying to the whole system. The most common example of this is RAID controllers in SAN products. Rather than offering a redundant SAN (meaning two SANs), manufacturers will often take a component that is rarely redundant in normal servers, make it redundant, and then call the SAN redundant. What they mean is a SAN that contains redundancy, which is not at all the same thing.

A good analogy here would be to compare having redundant cars, meaning two complete, working cars, with having a single car with a spare water pump in the trunk in case the main one fails. Clearly, a spare water pump is not a bad thing. But it is also a trivial amount of protection against car failure compared to having a second car ready to go. In one case the entire system is redundant, including the chassis. In the other we are making just one highly reliable component redundant inside the chassis. It’s not even on par with having a spare tire, which at least is a car component with a higher likelihood of failure.

Just like the myth of RAID 5 reliability and the confusion of subsystem redundancy with system redundancy, shared storage technologies like SANs and NAS often get treated in the same way, especially in regard to virtualization. There is a common scenario in which a virtualization project is undertaken and people instinctively panic because a single virtualization host represents a single point of failure: if it fails, many systems will all fail at once.

Using the term “single point of failure” induces a feeling of panic and is a great means of steering a conversation. We like to remove a SPOF, as we call it, when possible, but a SPOF is not necessarily the end of the world. Think about our brick house. It is a SPOF. Our two houses of straw are not. Yet a single breeze takes out our redundant solution faster than our reliable SPOF. Looking for SPOFs is a great way to find points of fragility in a system, but do not feel that every SPOF must be made redundant in every scenario. Most businesses will find their best value with many SPOFs in place. Our real goal is reliability at an appropriate cost. Redundancy, as we have seen, is no substitute for reliability; it is simply a tool that we can use to achieve reliability.

The theory that many people follow when virtualizing is to take their virtualization host and say “This host is a SPOF, so I need to have two of them and use High Availability features to allow for transparent failover!” This thinking is spurred on by the leading virtualization vendor. That vendor profits firstly from selling expensive HA add-on products and secondly from being owned by a large storage vendor. Selling unnecessary or even dangerous additional shared storage is therefore a big monetary win for them, and that could easily be the reason they have championed the virtualization space from the beginning. Redundant virtualization hosts with shared storage sound great, but the approach can be extremely misguided for several reasons.

The first reason is that the initial SPOF, the virtualization host, is simply replaced with a new SPOF, the shared storage. This accomplishes nothing. Assuming that we are using servers and shared storage of comparable quality, all we’ve done is move where the risk is, not change how big it is. The likelihood of the storage system failing is roughly equal to the likelihood of the original server failing. But in addition to shuffling the SPOF around like in a shell game, we’ve also done something far, far worse: we have introduced chained, or cascading, failure dependencies.

In our original scenario we had a single server. If the server kept working we were fine; if it failed we were not. Simple. Now we have two virtualization hosts, a single storage server (SAN, NAS, whatever) and a network connecting them together. We have already determined that the risk of the shared storage failing is approximately equal to our total system risk in the original scenario. But now we also depend on the network and the two front end virtualization nodes. Each of these components is more reliable than the fragile shared storage (anything with mechanical drives is going to be fragile). Their lower risk is not the issue, though. The issue is that the risks are combinatorial.

If any of these three components (storage, network or the front end nodes) fails, then everything fails. The solution to this is to make the shared storage redundant in its own right and to make the network redundant in its own right. With enough work we can overcome the fragility and risk that we introduced by adding shared storage. But shared storage on its own is not a form of risk mitigation; it is a risk in itself, which must be mitigated. The spiral of complexity begins, and the cost of bringing this new system up to par with the reliability of the original, single server system can be astronomical.
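
The combinatorial effect can be illustrated with the standard series and parallel reliability formulas. Components in series must all work, so their reliabilities multiply. Redundant copies in parallel fail only if every copy fails. The annual reliability figures below are hypothetical, chosen only for illustration, with the shared storage set equal to the original server as the text assumes. The model covers independent hardware failures only. It ignores correlated failures, the human error discussed next, and cost.

```python
def series(*reliabilities: float) -> float:
    """Every component must work, so the system works only if all of them do."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(*reliabilities: float) -> float:
    """Redundant copies: the system fails only if every copy fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1 - r
    return 1 - all_fail

# Hypothetical annual reliabilities, for illustration only.
SERVER = 0.95   # the original standalone virtualization host
STORAGE = 0.95  # shared storage, assumed comparable to the server
HOST = 0.97     # each front end node, assumed more reliable than the storage
NETWORK = 0.99  # the storage network

single_server = SERVER
shared_storage = series(parallel(HOST, HOST), NETWORK, STORAGE)
fully_redundant = series(parallel(HOST, HOST),
                         parallel(NETWORK, NETWORK),
                         parallel(STORAGE, STORAGE))

for name, r in (("single server", single_server),
                ("two hosts + one shared storage", shared_storage),
                ("two hosts + redundant storage/net", fully_redundant)):
    print(f"{name:<34} {r:.4f}")
```

With these figures, two hosts in front of a single shared storage unit come out less reliable than the lone server, because the network and storage terms multiply in. Only after the storage and network are also duplicated does the design pull ahead, and only on paper, since every added box raises both cost and the complexity that people have to manage.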

Now that we have all of this redundancy, we have one more risk to worry about. Managing all of this redundancy, all of these moving parts, requires a lot more knowledge, skill and preparation than managing a simple, single server does. We have moved from a simple solution to a very complex one. In my own anecdotal experience, the real dangers of solutions like this come not from the hardware failing but from human error. Little has been done to keep human error from bringing this new system down. Worse, we’ve added countless points where a human might accidentally bring the entire system, redundancy and all, right down. I’ve seen it first hand; I’ve heard the horror stories. The more complex the system, the more likely it is that a human will accidentally break everything.

It is critical that, as IT professionals, we step back, look at complete systems, consider reliability and risk, and think of redundancy simply as a tool to use in the pursuit of reliability. Redundancy itself is not a panacea. Neither is simplicity. Reliability is a complex problem to tackle. Avoiding simplistic substitutes is an important first step in moving from covering up reliability issues to facing and solving them.

 


5 Responses to “When No Redundancy Is More Reliable – The Myth of Redundancy”

  1. David Hay-Currie on 29 Jun 2012 at 4:34 pm

    SAM, I enjoyed this article quite a lot. Good after reading the OpenStorage article.
    I have been gathering and reading a lot of information, and in the end (a couple of weeks ago) I finally decided that the best route was to use the internal datastore in our server, because of the higher I/O available and more reliable drives than our NAS, but to replicate data to the NAS in case there is a problem with the local store. Since I already have two VM hosts (to separate resource consumption), this seemed more logical.
    The weird thing, is that I did not see much about doing this kind of deployment, and I talked with a couple of people that put a lot of emphasis into Vmotion and HA, however, they do not have large budgets either. I was really feeling lonely :)
    However, the challenge is still there to have this replication.
    I found this software
    http://www.stormagic.com/SvSAN.php
    That can present local datastore as a SAN (using iSCSI), but I am still wondering if there are other options out there

  2. Scott Alan Miller on 11 Jul 2012 at 2:09 pm

    Another good reference for RAID issues can be found here: http://queue.acm.org/detail.cfm?id=1670144

  3. Owen on 08 Aug 2012 at 11:53 am

    Fantastic write up. I saw first hand the exact scenario you laid out in the last part of your article, so I have seen the danger of jumping on the “Lets get a SAN” bandwagon.
