What is RAID 100?

RAID 10 is one of the most important and commonly used RAID levels in use today. RAID 10 is, of course, what is known as compound or nested RAID where one RAID level is nested within another. In the case of RAID 10, the “lowest” level of RAID, the one touching the physical drives, is RAID 1. The nomenclature of nested RAID is that the number to the left is the one touching the physical drives and each number to the right is the RAID that touches those arrays.

So RAID 10 is a number of RAID 1 (mirror) sets that are in a RAID 0 (non-parity stripe) set together. There is a certain common terminology sometimes applied, principally championed by HP, to refer to even RAID 1 as simply being a subset of RAID 10 – a RAID 10 array where the RAID 0 length is one. A quirky way to think of RAID 1, to be sure, but it actually makes many discussions and comparative calculations easier and makes sense in a practical way for most storage practitioners. Thinking of RAID 1 as a “special name” for the smallest possible RAID 10 stripe size and allowing, then, all RAID 10 permutations to exist as a calculation continuum makes sense.

Likewise, HP also refers to solitary drives attached to a RAID controller as RAID 0 sets of a stripe of one as well. So the application of that terminology to the RAID 10 world is actually more obvious and sensible when it is looked at in that light. However neither HP nor any other vendor today applies this same naming oddity to other array types such as RAID 5 being a subset of RAID 50 or RAID 6 being a subset of RAID 60 even though they can be thought of that way exactly the same as RAID 1 can be to RAID 10.

If we take that same logic and take it to the next level, figuratively and literally, we can take multiple RAID 10 arrays and stripe them together in another RAID 0. This seems odd but can make sense. The result is a stripe of RAID 10s or, to write it out, a stripe of stripes of mirrors (we generally state RAID from the top down but the nomenclature is from the bottom up.) So as this is RAID 1 on the physical drives, a stripe of those mirrors and then a stripe of those resultant arrays we get RAID 100 (R100.)

RAID 100 is, of course, rare and odd. However one extremely important RAID controller manufacturer utilizes R100 and, subsequently, so does their downstream integration vendor: namely LSI and Dell.
Fortunately because non-parity stripes inject little behavioral oddities and have near zero overhead or latency this approach is really not a problem although it can lead to a great deal of confusion. For all intents and purposes, RAID 100 behaves exactly like RAID 10 when each RAID 10 subset is identical to each other.

In theory, a RAID 100 could be made up of many disparate RAID 10 sets of varying drive types, spindle counts and speeds. In theory a RAID 10 could be made of up disparate RAID 1 sets but this is far more limited in potential or likely variation. RAID 100 could, theoretically, do some pretty bizarre things if left unchecked. In practicality, though, any RAID 100 implementation will likely, as does LSI’s implementation, enforce standardization and require that each RAID 10 subset be as identical as a controller is capable of enforcing. So each will be effectively uniform keeping the overall behavior to be the same as if the same drives were set up as RAID 10.

Because the behavior remains identical to RAID 10 there is an extremely strong tendency to avoid the confusion of calling the array RAID 100 and simply referring to it as RAID 10. This would work fine except for the semi-necessary quirk of needing to be able to specify the geometry of the underlying RAID 10 sets when building a RAID 100. LSI, and therefore Dell, requires that at the time of setting up a RAID 100 set that you must specify the underlying RAID 10 geometry but since the array is labeled as RAID 10, this makes no sense. A bizarre situation indeed.
To further complicate matters, because of the desire to maintain a façade of using RAID 10 rather than RAID 100, proper terminology is eschewed and instead of referring to the underlying RAID 10 members as “RAID 10 arrays” or “RAID 10 subsets” they are simply called “spans.” Span, however, being a term used for something else in storage that doesn’t apply properly here. Span, in no way, is a proper description for a RAID 10 set under any condition.

But if we agree to use the term span to refer to a RAID 10 subset of a RAID 100 array we can move forward pretty easily. Whenever possible, then, we want as many spans as possible to keep the underlying RAID 10 subsets as small as possible. If we make them small enough they actually collapse into RAID 1 sets (HP’s odd RAID 10 with a stripe size of one) and our RAID 100 collapses into a RAID 10 with the middle stripe, rather than the outside stripe, being the one that disappears! Bizarre, yes, but practical.

So how do we apply this in real life? Quite easily. In a RAID 100 array we must specify a count of spans to be used. Since we desire that each span contain two physical drive devices so that each span is a simple RAID 1 we simply need to take the total number of drives in our RAID 100 array, which we will call N, and divide that by two. So the desired span count for a normal RAID 100 array is simply N/2. This means if you have a two drive array, you want one span. Four drives, two spans. Six drives, three spans. Twenty four drives, twelve spans. And so on.

Do not be afraid of RAID 100. For normal users it simply requires some additional knowledge of how to select the proper number of spans. It would be ideal if this was calculated automatically and kept hidden allowing end users to think of the arrays in terms of RAID 10. Or else be labeled consistently as RAID 100 to make it clear what the span must represent. Or, of course, simply use RAID 10 instead of RAID 100. But given the practical state of reality, dealing with RAID 100, once it is understood, is easy.

Comparing RAID 10 and RAID 01

These two RAID levels often bring about a tremendous amount of confusion, partially because they are incorrectly used interchangeably and often simply because they are poorly understood.

First, it should be pointed out that either maybe be written with or without the plus sign: RAID 10 is RAID 1+0 and RAID 01 is RAID 0+1. Strangely, RAID 10 is almost never written with the plus and RAID 01 is almost never written without. Storage engineers generally agree that the plus is never used as it is superfluous.

Both of these RAID levels are “compound” levels made from two different, simple RAID types being combined. Both are mirror-based, non-parity compound or nested RAID. Both have essentially identical performance characteristics – nominal overhead and latency with NX read speed and (NX)/2 write speed where N is the number of drives in the array and X is the performance of an individual drive in the array.

What sets the two RAID levels apart is how they handle disk failure. The quick overview is that RAID 10 is extremely safe under nearly all reasonable scenarios. RAID 01, however, rapidly becomes quite risky as the size of the array increases.

In a RAID 10, the loss of any single drive results in the degradation of a single RAID 1 set inside of the RAID 0 stripe. The stripe level sees no degradation, only the one singular RAID 1 mirror does. All other mirrors are unaffected. This means that our only increased risk is that the one single drive is now running without redundancy and has no protection. All other mirrored sets still retain full protection. So our exposure is a single, unprotected drive – much like you would expect in a desktop machine.

Array repair in a degraded RAID 10 is the fastest possible repair scenario. Upon replacing a failed drive, all that happens is that that single mirror is rebuilt – which is a simple copy operation that happens at the RAID 1 level, beneath the RAID 0 stripe. This means that if the overall array is idle the mirroring process can proceed at full speed and the overall array has no idea that this is even happening. A disk to disk mirror is extremely fast, efficient and reliable. This is an ideal recovery scenario. Even if multiple mirrors have degradation simultaneously and are repairing simultaneously there is no additional impact as the rebuilding of one does not impact others. RAID 10 risk and repair impact both scale extremely well.

RAID 01, on the other hand, when it loses a single drive immediately loses an entire RAID 0 stripe. In a typical RAID 01 mirror there are two RAID 0 stripes. This means that half of the entire array has failed. If we are talking about an eight drive RAID 01 array, the failure of a single drive renders four drives instantly inoperable and effectively failed (hardware does not need to be replaced but the data on the drives is out of date and must be rebuilt to be useful.) So from a risk perspective, we can look at it as being a failure of the entire stripe.

What is left after a single disk has failed is nothing but a single, unprotected RAID 0 stripe. This is far more dangerous than the equivalent RAID 10 failure because instead of there being only a single, isolated hard drive at risk there is now a minimum of two disks and potentially many more at risk and each drive exposed to this risk magnifies the risk considerably.

As an example, in the smallest possible RAID 10 or 01 array we have four drives. In RAID 10 if one drive fails, our risk is that its matching partner also fails before we rebuild the array. We are only worried about that one drive, all other drives in the RAID 10 set are still protected and safe. Only this one is of concern. In a RAID 01, when the first drive fails its partner in its RAID 0 set is instantly useless and effectively failed as it is no longer operable in the array. What remains are two drives with no protection running nothing but RAID 0 and so we have the same risk that RAID 10 did, twice. Each drive has the same risk that the one drive did before. This makes our risk, in the best case scenario, much higher.

But for a more dramatic example let us look at a large twenty-four drive RAID 10 and RAID 01 array. Again with RAID 10, if one drive fails all others, except for its one partner, are still protected. The extra size of the array added almost zero additional risk. We still only fear for the failure of that one solitary drive. Contrast that to RAID 01 which would have had one of its RAID 0 arrays fail taking twelve disks out at once with the failure of one leaving the other twelve disks in a RAID 0 without any form of protection. The chances of one of twelve drives failing is significantly higher than the chances of a single drive failing, obviously.

This is not the entire picture. The recovery of the single RAID 10 disk is fast, it is a straight copy operation from one drive to the other. It uses minimal resources and takes only as long as is required for a single drive to read and to write itself in its entirety. RAID 01 is not as lucky. Unlike RAID 10 which rebuilds only a small subset of the entire array, and a subset that does not grow as the array grows – the time to recover a four drive RAID 10 or a forty drive RAID 10 after failure is identical, RAID 01 must rebuild an entire half of the whole parents array. In the case of the four drive array, this is double the rebuild work of the RAID 10 but in the case of the twenty four drive array it is twelve times the rebuild work to be done. So RAID 01 rebuilds take longer to perform while being under significantly more risk during that time.

There is a rather persistent myth that RAID 01 and RAID 10 have different performance characteristics, but they do not. Both use plain striping and mirroring which are effectively zero overhead operations that requires almost no processing overhead. Both get full read performance from every disk device attached to them and each lose half of their write performance to their mirroring operation (assuming two way mirrors which is the only common use of either array type.) There is simply nothing to make RAID 01 or RAID 10 any faster or slower than the other. Both are extremely fast.

Because of the characteristics of the two array types, it is clear that RAID 10 is the only type, of the two, that should ever exist within a single array controller. RAID 01 is unnecessarily dangerous and carries no advantages. They use the same capacity overhead, they have the same performance, they cost the same to implement, but RAID 10 is significantly more reliable.

So why does RAID 01 even exist? Partially it exists out of ignorance or confusion. Many people, implementing their own compound RAID arrays, choose RAID 01 because they have heard the myth that it is faster and, as is generally the case with RAID, do not investigate why it would be faster and forget to look into its reliability and other factors. RAID 01 is truly only implemented on local arrays by mistake.

However, when we take RAID to the network layer, there are new factors to consider and RAID 01 can become important, as can its rare cousin RAID 61. We denote, via Network RAID Notation, where the local and where the network layers of the RAID exist. So in this case we mean RAID 0(1) OR RAID 6(1). The parentheses denote that the RAID 1 mirror, the “highest” portion of the RAID stack, is over a network connection and not on the local RAID controller.

How would this look in RAID 0(1)? If you have two servers, each with a standard RAID 0 array and you want them to be synchronized together to act as a single, reliable array you could use a technology such as DRBD (on Linux) or HAST (on FreeBSD) to create a network RAID 1 array out of the local storage on each server. Obviously this has a lot of performance overhead as the RAID 1 array must be kept in sync over the high latency, low bandwidth LAN connection. RAID 0(1) is the notation for this setup. If each local RAID 0 array was replaced with a more reliable RAID 6 we would write the whole setup as RAID 6(1).
Why do we accept the risk of RAID 01 when it is over a network and not when it is local? This is because of the nature of the network link. In the case of RAID 10, we rely on the low level RAID 1 portion of the RAID stack for protection and the RAID 0 sits on top. If we replicate this on a network level such as RAID 1(0) what we end up with is each host having a single mirror representing only a portion of the data of the array. If anything were to happen to any node in the array or if the network connection was to fail the array would be instantly destroyed and each node would be left with useless, incomplete data. It is the nature of the high risk of node failure and risk at the network connection level that makes RAID decisions in a network setting extremely different. This becomes a complex subject on its own.

Suffice it to say, when working with normal RAID array controllers or with local storage and software RAID, utilize RAID 10 exclusively and never RAID 01.

It Worked For Me

“Well, it worked for me.”  This has become a phrase that I have heard over and over again in defense of what would logically be otherwise considered a bad idea.  These words are often spoken innocently enough without deep intent, but they often cover deep meaning that should be explored.

But it is important to understand what drives these words both psychologically as well as technically.  At a high level, what we have is the delivery of an anecdote which can be restated as such: “While the approach or selection that I have used goes against your recommendation or best practices or what have you, in my particular case the bad situation of which you have warned or advised against has not arisen and therefore I believe that I am justified in the decision that I have made.”

I will call this the “Anecdotal Dismissal of Risk” or better known as “Outcome Bias.”  Generally this phrase is used to wave off the accusation that one has either taken on unnecessary risk or taken on unnecessary financial expense or, more likely, both.  The use of an anecdote for either of these cases is, of course, completely meaningless but the speaker does so with the hope of throwing off the discussion and routing it around their case by suggesting, without saying it, that perhaps they are a special case that has not been considered or, perhaps, that “getting lucky” is a valid form of decision making.

Of course, when talking risk, we are talking about statistical risk.  If anything was a sure thing, and could be proven or disproved with an anecdote, it would not be risk but would just be a known outcome and making the wrong choice would be amazingly silly.  Anecdotes have a tiny place when using in the negative, for example: They claim that it is a billion to one chance that this would happen, but it happened to me on the third try and I know one other person that it happened to.  That’s not proof, but anecdotally it suggests that the risk figures are unlikely correct.

That case is valid, still incredibly important to realize that even negative anecdotal evidence (anecdotal evidence of something that was extremely unlikely to happen) is still anecdotal and does not suggest that the results will happen again, but at least it suggests that you were an amazing edge case.  If you know of one person that has won the lottery, that’s unlikely but doesn’t prove that the lottery is likely to be won.  If you know that every other person you know who has played the lottery has won, something is wrong with the statistics.

However, the “it worked for me” case is universally used with risk that is less than fifty percent (if it were not the whole thing would become crazy.)  Often it is about taking something four nines reliability and reducing it to three nines when attempting to raise it.  Three nines of something still means that there is only a one in one thousand chance that the bad case will arise.  This isn’t statistically likely to occur, obviously.  At least we would hope that it was obvious.  Even though, in this example, the bad case arises ten times more often than it would have it we had left well enough alone and maybe one hundred times more than how often we intended for it to arise we still expect to never see the bad outcome unless we run thousands or tens of thousands of cases and then the statistics are still based on a rather small pool.

In many cases we talk about an assumption of unnecessary risk but generally this is risk at a financial cost. What prompts this reaction a great deal of the time, in my experience, is a reaction to being demonstrated a dramatic overspending – implementing very costly solutions when a less costly one, often fractionally as expensive, may approach or, in many cases, exceed the chosen solution that is being defended.

To take the reverse, out of any one thousand people, nine hundred and ninety nine of them, doing this same thing, would be expected to have no bad outcome.  For someone to claim, then, that the risk is one part in one thousand and have one of the nine hundred and ninety nine step forward and say “the risk can’t exist because I am not the incredibly unlikely one to have had the bad thing happen to me” obviously makes no sense whatsoever when looking at the pool as a whole.  But when we are the ones who made the decision to join that pool and then came away unscathed it is an apparently natural reaction to discount the assumed outcome of even a risky choice and assume that the risk did not exist.

It is difficult to explain risk in this way but, over the years, I’ve found a really handy example to use that tends to explain business or technical risk in a way that anyone can understand.  I call it the Mother Seatbelt Example.  Try this experiment (don’t actually try it but lie to your mother and tell her that you did to see the outcome.)

Drive a car without wearing a seatbelt for a whole day while continuously speeding.  Chances are extremely good that nothing bad will happen to you (other than paying some fines.)  The chances of having a car accident and getting hurt, even while being reckless in both your driving and disregarding basic safety precautions, is extremely low.  Easily less than one in one thousand.   Now, go tell your mother what you just did and say that you feel that doing this was a smart way to drive and that you made a good decision in having done so because “it worked out for me.”  Your mother will make it very clear to you what risky decisions mean and how anecdotal evidence of expected survival outcome does not indicate good risk / reward decision making.

In many cases, “it worked for me” is an attempt at deflection.  A reaction of our amygdala in a “fight or flight” response to avoid facing what is likely a bad decision of the past.  Everyone has this reaction, it is natural, but unhealthy.  By taking this stance of avoiding critical evaluation of past decisions we make ourselves more likely to continue to repeat the same bad decision or, at the very least, continue the bad decision making process that lead to that decision.  It is only by facing critical examination and accepting that past decisions may not have been ideal that we can examine ourselves and our processes and attempt to improve them to avoid making the same mistakes again.

It is understandable that in any professional venue there is a desire to save face and appear to have made if not a good decision, at least an acceptable one and so the desire to explore logic that might undermine that impression is low.  Even moreso there is a very strong possibility that someone who is a potential recipient of the risk or cost that the bad decision created will learn of the past decision making and there is, quite often, an even stronger desire to cover up any possibility that a decision may have been made without proper exploration or due diligence.  These are understandable reactions but they are not healthy and ultimately make the decision look even poorer than it would have.  Everyone makes mistakes, everyone.  Everyone overlooks things, everyone learns new things over time.  In some cases, new evidence comes to light that was impossible to have known at the time.  There should be no shame in past decisions that are less than ideal, only in failing to examine them and learn from them allowing us as individuals as well as our organizations to grow and improve.

The phrase seems innocuous enough when said.  It sounds like a statement of success.  But we need to reflect deeper.  The risk scenario we showed above.  But what about the financial one.  When a solution is selected that carries little or no benefits, and possibly great caveats as we see in many real world cases, while being much more costly and the term “it worked for me” is used, what is really being said is “wasting money didn’t get me in trouble.”  When used in the context of a business, this is quite a statement to make.  Businesses exist to make money.  Wasting money on solutions that don’t meet the need better is a failure whether the solution functions technically or not.  Many solutions are too expensive but would not fail, choosing the right solution always involves getting the right price for the resultant situation.  That is just the nature of IT in business.

Using this phrase can sound reasonable to the irrational, defense brain.  But to outsiders looking in with rational views it actually sounds like “well, I got away with…” fill in the blank: “wasting money”, “being risky”, “not doing my due diligence”, “not doing my job”, or whatever the case may be.  And likely whatever you think should be filled in there will not be as bad as what others assume.

If you are attempted to justify past actions by saying “it worked for me” or by providing anecdotal evidence that shows nothing, stop and think carefully.  Give yourself time to calm down and evaluate your response.  Is is based on logic or irrational amygdala emotions?  Don’t be ashamed of having the reaction, everyone has it.  It cannot be escaped.  But learning how to deal with it can allow us to approach criticism and critique with an eye towards improvement rather than defense.  If we are defensive, we lose the value in peer review, which is so important to what we do as IT professionals.

Doing IT at Home: Enterprise Networking

In my fifth installment in the continuing series on Doing IT at Home I would like to focus on enterprise networking.  Many ways in which we can bring business class IT into our homes can really be done for free but networking is sadly, not one of those areas but it does not have to be as costly as you may, at first, think and having a good, solid, enterprise-class home network can bring many features that other IT at Home projects do not.

Implementing an real, working business class network at home lays the foundation for a lot of potential learning, experimenting, testing and growth; and compared to other, small and less ambitious projects this one will likely shine very brightly on a curriculum vitae.

Now we have to start by defining what we mean by “enterprise networking.”  Clearly the needs and opportunity for networking at home are not the same as they are in a real business, especially not a large one – at least without resorting to a pure lab setup which is not our goal of bringing IT home.  Having a lab at home is excellent and I highly recommend it, but I would not recommend building to “true lab” in your home until you have truly taken advantage of the far better opportunity to treat your home as a production “living lab” environment.  Needing your “living lab” to be up and running, in use every day changes how you view it, how you treat it and what you will take away from the experience.  A pure lab can be very abstract and it is easy to treat it in such a way that much of the educational opportunity is lost.

There are many aspects of enterprise networking that make sense to apply to our homes.  Every home is different and I will only present some ideas and I would love to hear what others can come up with as interesting ways to take home networking to the next level.

Firewall or Unified Thread Management (UTM):  This is the obvious starting point for upgrading any home network.  Most homes use a free multi-purpose device that is provided by their ISP that lacks features and security.  The firewall is the most featureful networking device that you will use in a home or in a small business and is the most important for providing basic security.  Your firewall provides the foundation of your home or small business network so getting this in place first makes sense.

There are numerous firewall and UTM products on the market.  Even for home or SMB use you will be flush with options.  You can only practically use a single unit and you need one powerful enough to be able to handle the throughput of a consumer WAN connection which may be a challenge with some vendors as consumer Internet access is getting very fast and requires quite a bit of processing power, especially from a UTM solution.

Choosing a firewall will likely mostly come down to your career goals and and price.  If you hope to pursue a career or certification in Cisco, Juniper or Palo Alto, for example, you will want to get devices from those vendors that allow you to do training at home.  These will be very expensive options but if that career path is your chosen one, having that gear at home will be immensely valuable not only for your learning and testing but for interviewing as well.

If you don’t have specific security or networking career goals your options are more open. There are traditional small business firewall suppliers like Netgear ProSafe that are low cost and easy to manage.  UTM devices are starting to enter this market like Netgear ProSecure, but these are almost universally more costly.  There is the software-only approach where you provide your own hardware and build the firewall yourself.  This is very popular and has many good options for software including pfSense, SmoothWall, Untangle and VyOS.  These vary in features and complexity.  For most cases, however, I would recommend a Ubiquiti EdgeMax router which runs Brocade Vyatta firmware.  These are less costly than most UTMs and run enterprise routing and firewall firmware – outside of needing a specific vendor’s product for networking career goals, this is the best learning, security and feature value on the market and will allow learning nearly any firewall or router skills outside of those specific to proprietary vendors.

When starting down the path of enterprise networking at home, remember to consider if you should also begin acquiring rackmount gear rather than tabletop equipment.  Having a racks, possibly just a half or even a quarter rack or cabinet, at home can make doing home enterprise projects much easier and can make the setup much more attractive, in many cases and is, like many of these projects, just that much more impressive.  Consider this when buying gear because firewalls in this category, are the hardest to find in rackmount configurations, much to my chagrin.

Switch: It has become common in home networking to begin forgoing a physical switch in favor of pure wireless solutions and for many homes where networking is not core to function this may make perfect sense.  But for us, it likely does not.  Adding switching makes for more learning opportunities, a better showcase, far more flexibility, faster data transfers inside of the house, a great number of connections and better reliability.  For a normal home user with few devices, most of which are mobile ones, this would be a waste, but for an IT pro at home, a real switch is practically a necessity.

Switching come in three key varieties.  Unmanaged, or dumb switches, which is all that you would find in a home or most small businesses.  Basic connectivity but nothing more.  This might be all that you need if you do not intend to explore deeper learning opportunities in networking.

Smart switching is a step up from an unmanaged switch.  A smart switch is often very low cost but adds additional features, normally through a web interface, that allow you to actively manage the switch, change configurations, troubleshoot, great VLANs and QoS, monitor, etc.  For someone looking to step up their at home network and approach networking from a higher-end small business perspective this is a great option and a very practical one for a home.

Managed switches are the most enterprise and by far the most costly.  These use SNMP and other standard protocols for remote management and monitoring and generally have the most features although often Smart switches have just as many.  Managed switches are not practical in a home for any reason as their benefits are around scalability, not features, but, like with everything, if learning those features is a key goal then this is another place where spending more money not only for those features but also to get “name brand” switches, like Cisco, Juniper, Brocade or HP can be an important investment.  But if the goal is only to learn the tools and standards of managed switches and not to go down the path of learning a specific implementation then lower cost options like Netgear Prosafe might make sense.

Once we decide on unmanaged, smart or managed switches then we have to decide on the “layer” of the switch.  This also has three options: Layer 2, Layer 2+ and Layer 3.  For home and small business use, L2 switches are the most common.  I have never seen more than an L2 in a home and rarely in a small business.  L2 are traditional switches that handle only Ethernet switching.  You can create VLANs on L2 switches but you cannot route traffic between the VLANs, that would require a router.  An L2+ switch adds some inter-VLAN traffic handling to allow VLANs to exist using static routes.  L3 switches have full IP handling and can do dynamic routing protocols.

So if you need to study “large” scale routing, an L3 switch is good.  This is not a common need and would be the most expensive route and would imply that you intend to purchase a lot more networking gear than just one switch.  In a home lab, this might exist, for handling the home itself, it would not.  If you want to implement VLANs in your home, perhaps one LAN, one Voice LAN, a DMZ and one Guest LAN then an L2+ switch is ideal.  If you don’t plan to study VLANing, stick to L2.

Cabling: One aspect of home networking that is far too often overlooking is implementing a quality cabling plant inside the home.  This requires far more effort than other home networking projects and falls more into the electrician space rather than the IT professional space but is also one of the most important pieces from the home owner perspective and end user perspective rather than the IT pro perspective.  A good, well installed cabling plant will make a home more attractive to buyers and make the value a powerful home network even better.

If you live in an apartment, likely you do not have the option to alter the wiring in this way, unfortunately.  But for home owners, cabling the house can be a great project with a lot of long term value.  Setting up a well labeled and organized cabling plant, just like you would in a business, can be attractive, impressive and eminently useful.  With good, well labeled cabling you can provide high speed, low latency connections without the need for wireless to every corner of your home.  I have found that cabling bedrooms, entertainment spaces and even the kitchen are very valuable.  This allows for higher throughput communications to all devices as wireless congestion is relieved and wired throughput is preferred when possible.  Devices such as video game consoles, smart televisions, receivers, media appliances (a la AppleTV, Roku, Google), desktops, docking stations, stationary laptops, VoIP phones and more all can benefit from the addition of complete cabling.

Wireless Access Point: These days home networks are primarily wireless with many homes being exclusively wireless.  Even if you follow my advice and have great wired networking you still need wireless whether for smart phones, tablets, laptops, guest access or whatever.    A typical home network will already have some cheap, probably unreliable wireless from the onset.  But I propose at least a moderate upgrade to this as a good practice in home networking.

Enterprise Access Points have come down in price dramatically today and a few vendors have even gotten then below one hundred dollars for high quality, centrally managed devices.  Good devices have high quality radios and antennae that will improve range and reliability.  Generally they will come with extra features like mapping, monitoring, centralized management console, VLAN support, hotspot login options, multiple SSID support, etc.  Most of these features are not needed in a home network but are commonly used even in a small business and having them at home for such a low price point makes sense.  If you own a large home, using good Access Points with centralized management can be additionally beneficial in providing whole home coverage.

Having secure guest access via the access point can be very nice in a home allowing guests to be isolated from the data and activities on the home network.  No need to share private passwords and provide access to data that is not necessary while still allowing guests to connect their mobile phones and tablets.  An ever more important feature.

If your home includes outdoor space, adding wireless projects to provide outdoor coverage could also make fora  great learning project.  Outdoor access points and specialized antenna can make for a fun and very useful project.  Make yourself able to stay connected even while roaming outdoors.

Good, enterprise access points are often quite attractive as well, being designed to be wall or ceiling mounted, making it easier to put them in good placement locations to better cover your available space.

Power over Ethernet: Now that you are looking at deploying enterprise access points and if you followed by earlier article on doing a PBX at home and you have desktop or wall mount VoIP phones you may want to consider adding additional PoE switching to reduce the need for electrical cables or power injectors.  A small PoE switch is not expensive and, while never really necessary, can make your home network that much more interesting and “polished.”  Many security devices take advantage of PoE as well as do some project board computers that are increasingly popular today.  The value to adding PoE is ever increasing.

Network Software: Once your home is upgraded to this level, it is only natural to then bring in network management and monitoring software to leverage it even further.  This could be as simple as setting up Wireshark to look at your LAN traffic or it could mean SNMP Monitors, Netflow tools and the like. What is available to you is highly dependent on the vendors and products that you choose but the options are there and this is really where much of the benefit comes in regards to the ongoing educational aspects of network.  Building the network and performing the occasional maintenance will, of course, be very good experience but having the tools to now watch the living network at work and learn from it will be key to the continuing value beyond the impressive end user experience that your household will enjoy.

The Cult of ZFS

It’s pretty common for IT circles to develop a certain cult-like or “fanboy” mentality.  What causes this reaction to technologies and products I am not quite sure, but that it happens is undeniable.  One area that I never thought that I would see this occur is in the area of filesystems – one of the most “under the hood” system components and one that, until recently, received literally no attention even in decently technical circles.  Let’s face it, misunderstanding when something comes from Active Directory versus from NTFS is nearly ubiquitous.  Filesystems are, quite simply, ignored.  Ever since Windows NT 4 released and NTFS was the only viable option the idea that a filesystem is not an intrinsic component of an operating system and that there might be other options for file storage has all but faded away.  That is, until recently.

The one community where, to some small degree, this did not happen was the Linux community, but even there Ext2 and its descendants so completely won mindshare that even thought they were widely available, alternative filesystems were sidelines and only XFS received any attention, historically, and even it received very little.

Where some truly strange behavior has occurred, more recently, is around Oracle’s ZFS filesystem, originally developed for the Solaris operating system and the X4500 “Thumper” open storage platform (originally under the auspices of Sun prior to the Oracle acquisition.)  At the time (nine years ago) when ZFS released, competing filesystems were mostly ill prepared to handle large disk arrays that were expected to be made over the coming years.  ZFS was designed to handle them and heralded in the age of large scale filesystems.  Like most filesystems at that time, ZFS was limited only to a single operating system and so, while widely regarded as a great leap forward in filesystem design, it produces few ripples in the storage world and even fewer in the “systems” world where even Solaris administrators generally considered it a point of interest only for quite some time, mostly choosing to stick to the tried and true UFS that they had been using for many years.

ZFS was, truly, a groundbreaking filesystem and I was, and remain, a great proponent of it.  But it is very important to understand why ZFS did what it did, what its goals are, why those goals were important and how it applies to us today.  The complexity of ZFS has lead to much confusion and misunderstanding about how the filesystem works and when it is appropriate to use.

The principle goals of ZFS were to make a filesystem capable of scaling well to very large disk arrays.  At the time of its introduction, the scale to which ZFS was capable was unheard of in other file systems but there was no real world need for a filesystem to be able to grow that large.  By the time that the need arose, many other file systems such as NTFS, XFS, Ext3 and others had scaled to accommodate the need.  ZFS certainly lead the charge to larger filesystem handling but was joined by many others soon thereafter.

Because ZFS originated in the Solaris world where, like all big iron UNIX systems, there is no hardware RAID, software RAID had to be used.  Solaris had always had software RAID available as its own subsystem.  The decision was made to build a new software RAID implementation directly into ZFS.  This would allow for simplified management via a single tool set for both the RAID layer and the filesystem.  It did not introduce any significant change or advantage to ZFS, as is often believed, it simply shifted the interface for the software RAID layer from being its own command set to being part of the ZFS command set.

ZFS’ implementation of RAID introduced variable width stripes in parity RAID levels.  This innovation closed a minor parity RAID risk known as the “write hole”.  This innovation was very nice but came very late as the era of reliable parity RAID was beginning to end and the write hole problem was already considered to be an unmentioned “background noise” risk of parity arrays as it was not generally considered a thread due to its elimination through the use of battery backed array caches and, at about the same time, non-volatile array caches – avoid power loss and you avoid the write hole.  ZFS needed to address this issue because, as software RAID, it was at greater risk to the write hole than hardware RAID is because there is no opportunity for a cache protected against power loss – hardware RAID offers the potential for an additional layer of power protection for arrays.

The real “innovation” that ZFS inadvertently made was that instead of just implementing the usual RAID levels of 1, 5, 6 and 10 they instead “branded” these levels with their own naming conventions.  RAID 5 is known as RAIDZ.  RAID 6 is known as RAIDZ2.  RAID 1 is just known as mirroring.  And so on.  This was widely considered silly at the time and pointlessly confusing but, as it turned out, that confusion because the cornerstone of ZFS’ revival many years later.

It needs to be noted that ZFS later added the industry’s first production implementation of a RAID 7 (aka RAID 7.3) triple parity RAID system and branded it RAIDZ3.  This later addition is an important innovation for large scale arrays that need the utmost in capacity while remaining extremely safe but are willing to sacrifice performance in order to do so.  This remains a unique feature of ZFS but one that is rarely used.

In the spirit of collapsing the storage stack and using a single command set to manage all aspects of storage the logical volume management functions were rolled into ZFS as well.  It is often mistakenly believed that ZFS introduced logical volume management in certain circles but nearly all enterprise platforms, including AIX, Linux, Windows and even Solaris itself, had already had logical volume management for many years.  ZFS was not doing this to introduce a new paradigm but simply to consolidate management and wrap all three key storage layers (RAID, logical volume management and filesystem) into a single entity that would be easier to manage and could provide inherent communications up and down the stack.  There are pros and cons to this method and an industry opinion remains unformed nearly a decade later.

One of the most important aspects of this consolidation of three systems into one is that now we have a very confusing product to discuss.  ZFS is a filesystem, yes, but it is not only a filesystem.  It is a logical volume manager, but not only a logical volume manager.  People refer to ZFS as a filesystem, which is its primary function, but that it is so much more than a filesystem can be very confusing and makes comparisons against other storage systems difficult.  At the time I believe that this confusion was not foreseen.

What has resulted from this confusing merger is that ZFS is often compared to other filesystems, such as XFS or Ext4.  But this is confusing as ZFS is a complete stack and XFS is only one aspect of a stack.  ZFS would be better compared to MD (Linux Software RAID) / LVM / XFS or to SmartArray (HP Hardware RAID) / LVM/ XFS than to XFS alone.  Otherwise it appears that ZFS is full of features that XFS lacks but, in reality, it is only a semantic victory.  Most of the features often touted by ZFS advocates did not originate with ZFS and were commonly available with the alternative filesystems long before ZFS existed. But it is hard to compare “does your filesystem do that” because the answer is “no…. my RAID or my logical volume manager do that.”  And truly, it is not ZFS the filesystem providing RAIDZ, it is ZFS the software RAID subsystem that is doing so.

In order to gracefully handle very large filesystems, data integrity features were built into ZFS which included a checksum or hash check throughout the filesystem that could leverage the inclusive software RAID to repair corrupted files.  This was seen as necessary due to the anticipated size of ZFS filesystems in the future.  Filesystem corruption is a rarely seen phenomenon but as filesystems grow in size the risk increases.  This lesser known feature of ZFS is possibly its greatest.

ZFS also changed how filesystem checks are handled.  Because of the assumption that ZFS will be used on very large filesystems there was a genuine fear that a filesystem check on boot time could take impossibly long to complete and so an alternative strategy was found.  Instead of waiting to do a check at reboot the system would require a scrubbing process to run and perform a similar check while the system was running.  This requires more system overhead while the system is live but the system is able to recover from an unexpected restart more rapidly.  A trade off but one that is widely seen as very positive.

ZFS has powerful snapshotting capabilities in its logical volume layer and in its RAID layer has implemented very robust caching mechanisms making ZFS an excellent choice for many use cases.  These features are not unique in ZFS but are widely available in systems older than ZFS.  They are, however, very good implementations of each and very well integrated due to ZFS’ nature.

At one time, ZFS was open source and during that era its code became a part of Apple’s Mac OSX and FreeBSD operating systesms because they were compatible with the ZFS license.  Linux did not get ZFS at that time due to challenges around licensing.  Had ZFS licensing allowed Linux to use it unencumbered the Linux landscape would likely be very different today.  Mac OSX eventually dropped ZFS as it was not seen as having enough advantages to justify it in that environment.  FreeBSD clung to ZFS and, over time, it became the most popular filesystem on the platform although UFS is still heavily used as well.  Oracle closed the source of ZFS after the Sun acquisition leaving FreeBSD without continuing updates to its version of ZFS while Oracle continued to develop ZFS internally for Solaris.

Today Solaris remains using the original ZFS implementation now with several updates since its split with the open source community.  FreeBSD  and others continued using ZFS in the state it was when the code was closed source, no longer having access to Oracle’s latest updates.  Eventually work to update the abandoned open source ZFS codebase was taken up and is now known as OpenZFS.  OpenZFS is still fledgling and has not yet really made its mark but has some potential to revitalize the ZFS platform in the open source space but at this time, OpenZFS still lags ZFS.

Open source development for the last several years in this space has focused more on ZFS’ new rival BtrFS which is being developed natively on Linux and is well supported from many major operating system vendors.  BtrFS is very nascent but is making major strides to reach feature parity with ZFS in implemented features but has large aspirations and due to ZFS’ closed source nature has the benefit of market moment.  BtrFS was started, like ZFS, by Oracle and has been widely seen as Oracle’s view of the future being a replacement for ZFS even at Oracle.  At this time BtrFS has already, like ZFS, merged the filesystem, logical volume management and software RAID layers, implemented checksumming for filesystem integrity, scales even larger than ZFS (same absolute limit but handles more files), copy on write snapshots, etc.

ZFS, without a doubt, was an amazing filesystem in its heyday and remains a leader today.  I was a proponent of it in 2005 and I still believe heavily in it.  But it has saddened me to see the community around ZFS take on a fervor and zealotry that  does it no service and makes the mention of ZFS almost seem as a negative – ZFS being so universally chosen for the wrong reasons: primarily a belief that its features exist nowhere else, that its RAID is not subject to the risks and limitations that those RAID levels are always subject to or that it was designed for a different purpose (primarily performance) other than what it was designed for.  And when ZFS is a good choice, it is often implemented poorly based on untrue assumptions.

ZFS, of course, is not to blame.  Nor, as far as I can tell, are its corporate supporters or its open source developers.  Where ZFS seems to have gone awry is in a loose, unofficial community that has only recently come to know ZFS, often believing it to be new or “next generation” because they have only recently discovered it.  From what I have seen this is almost never via Solaris or FreeBSD channels but almost exclusively smaller businesses looking to use a packaged “NAS OS” like FreeNAS or NAS4Free who are not familiar with UNIX OSes.  The use of packaged NAS OSes, primarily by IT shops that possess neither deep UNIX nor storage skills and, consequently, little exposure to the broader world of filesystems outside of Windows and often little to no exposure to logical volume management and RAID, especially software RAID at all, appears to lead to a “myth” culture around ZFS with it taking on an almost unquestionable, infallible status.

This cult-like following and general misunderstanding of ZFS leads often to misapplications of ZFS or a chain of decision making based off of bad assumptions that can lead one very much astray.

One of the most amazing changes in this space is the change in following from hardware RAID to software RAID.  Traditionally, software RAID was a pariah in Windows administration circles without good cause – Windows administrators and small businesses, often unfamiliar with larger UNIX servers believed that hardware RAID was ubiquitous when, in fact, larger scale systems always used software RAID.  Hardware RAID was, almost industry wide, considered a necessity and software RAID completely eschewed.  That same audience, now faced with the “Cult of ZFS” movement, now react in exactly the opposite way believing that hardware RAID is bad and that ZFS’ software RAID is the only viable option.  The shift is dramatic and neither approach is valid – both hardware and software RAID and both in many implementations are very valid options and even using ZFS the use of hardware RAID might easily be appropriate.

ZFS is often chosen because it is believed that it is the highest performance option in filesystems but this was never a key design goal of ZFS.  The features allowing it to scale so large and handle so many different aspects of storage actually make being highly performant very difficult.  ZFS, at the time of its creation, was not even expected to be as fast as the venerable UFS which ran on the same systems as it.  However, this is often secondary to the fact that filesystem performance is widely moot as all modern filesystems are extremely fast and filesystem speed is rarely an important factor – especially outside of massive, high end storage systems on a very large scale.

An interesting study of ten filesystems on Linux produced by Phoronix in 2013 showed massive differences in filesystems by workload but no clear winners as far as overall performance.  What the study showed conclusively is that matching workload to filesystem is the most important choice, that ZFS falls to the slower side of all mainstream filesystems even in its more modern implementations and that choosing a filesystem for performance reasons without a very deep understanding of the workload will result in unpredictable performance – no filesystem should be chosen blindly if performance is an important factor.  Sadly, because the test was done on Linux, it lacked UFS which is often ZFS’ key competitor especially on Solaris and FreeBSD and it lacked HFS+ from Mac OSX. 

Moving from hardware RAID to software RAID carries additional, often unforeseen risks to shops not experienced in UNIX as well.  While ZFS allows for hot swapping, it is often forgotten that hot swap is primarily a feature of hardware, not of software, and it is also widely unknown that blind swapping (removal of hard drives without first offlining them in the operating system) is not synonymous with hot swapping and this can lead to disasters for shops moving from a tradition of hardware RAID that handled compatibility, hot swap and blind swapping transparently for them to a software RAID system that requires much more planning, coordination and understanding of the system in order to use safely.

A lesser, but still common misconception of ZFS, is that it is a clustered filesystem suitable for use on shared DAS or SAN scenarios a la OCFS, VxFS and GFS2.  ZFS is not a clustered filesystem and shares the same limitations in that space as all of its common competitors.

ZFS can be an excellent choice but it is far from the only one.  ZFS comes with large caveats, not the least of which is the operating system limitations associated with it, and while it has many benefits few, if any, are unique to ZFS and it is very rare that any shop will benefit from every one of them.  As with any technology, there are trade offs to be made.  One size does not fit all.  The key to knowing when ZFS is right for you is to understand what ZFS is, what is and is not unique about it, what its design goals are, how comparing a storage storage to a pure filesystem produces misleading results and what inherent limitations are tied to it.

ZFS is a key consideration and the common choice when Solaris or FreeBSD is the chosen operating system.  With rare exception, the operating system should never be chosen for ZFS, but instead ZFS should be often chosen, but not always, when the operating system is chosen.  The OS should drive the filesystem choices in all but the rarest of cases.  The choice of operating system is so dramatically more important than the choice of filesystem.

ZFS can be used on Linux but is not considered an enterprise option there but more of a hobby system for experimentation as no enterprise vendor (such as Red Hat, Suse or Canonical) support ZFS on Linux and as Linux has great alternatives already.  Someday ZFS might be promoted to a first class filesystem in Linux but this is not expected as BtrFS has already entered the mainline kernel and been included in production releases by several major vendors.

While ZFS will be seen in the vast majority of Solaris and FreeBSD deployments, this is primarily because it has moved into the position of default filesystem and not because it is clearly the superior choice in those instances or has even been evaluated critically.  ZFS is perfectly well suited to being a general purpose filesystem where it is native and supported.

What is ZFS’ primary use case?

ZFS’ design goal and principal use case is for Solaris and FreeBSD open storage systems providing either shared storage to other servers or as massive data repositories for locally installed applications.  In these cases, ZFS’ focus on scalability and data integrity really shine.  ZFS leans heavily towards large and enterprise scale shops and generally away from applicability in the small and medium business space where Solaris and FreeBSD skills, as well as large scale storage needs, are rare.

Reference: http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=1

 

Understanding the Western Digital SATA Drive Lineup (2014)

I choose to categorize Western Digital’s SATA drive lineup for several reasons. One is that WD is the current market leader in spinning hard drives so this makes the categorization most useful to the greatest number of people, the “color coded” line is, based on anecdotal evidence, far and away the chosen drive family of the small business market where the diagnosis is most important and SATA drives retain the most disparity of features and factors making them far more necessary to understand well. While technically the only difference between a SAS (SCSI) and SATA (ATA) drive or even a Fibre Channel (FC) drive is nothing but the communications protocol used to communicate with them, in practical terms SAS and FC drives are only made in certain, high reliability configurations and do not require the same degree of scrutiny and do not carry the same extreme risks as SATA drives. Understanding SATA drive offerings is the more important for practical, real world storage needs.

WD has made understanding their SATA drive line up especially easy by adding color codes to the majority of their SATA drive offerings – those deemed to be “consumer” drives, and an “E” designation on their enterprise SATA drives and one outlier, the high performance Velociraptor drives which seek to compete with common SAS performance for SATA controllers. Altogether they have seven SATA drive families to consider covering the gamut of drive factors. While this diagnosis will apply to the easy to understand WD lineup, by comparing factors here with the offerings of other drive makers the use cases of their drives can be determined as well.

In considering SATA drives, three really key factors stand out as being the most crucial to consider (outside of price, of course.)

URE Rate: URE, or Unrecoverable Read Error, is an event that happens, with some regularity, to electromechanical disk storage media where a single sector is unable to be retrieved. In a standalone drive this happens from time to time but generally only affects a single file and users typically see this as a lost file (often one they do not notice) or possible a corrupt filesystem which may or may not easily be corrected. In healthy RAID arrays (other than RAID 0), the RAID system provides mirroring and/or parity that can cover for this sector failure and recreate the data protecting us from URE issues. When a RAID array is in a degraded state UREs are a potential risk again. In its worst case, a URE on a degraded parity array can, in some cases, cause total loss of an array (all data is lost.) So considering UREs and their implications in any drive purchases is extremely important and is the primary driver of cost differential in drives of varying types. URE varies from the low end at 10^14 to the high end at 10^16. The numbers are so large that they are always written in scientific notation. I will not go into an in-depth explanation of URE rates, ramifications and mitigation strategies here, but understanding URE is critical to decision making around drive purchases, especially in the large capacity, lower reliability space of SATA drives.

Spindle Speed: This is one of the biggest factors to most users, spindle speed directly correlates to IOPS and throughput. While measurements of drive speed are dynamic, at best, spindle speed is the best overall way to compare two otherwise identical drives under identical load. A 15,000 RPM drive will deliver almost exactly double the IOPS and throughput of a 7,200 RPM drive, for example. SATA drives commonly come in 5,400 RPM and 7,200 RPM varieties with rare high performance drives available at 10,000 RPMs.

Error Recovery Control (ERC): Also known as TLER (Time Limited Error Recovery) in WD parlance, ERC is a feature of a drive’s firmware which allows for configurable time limits for read or write errors which can be important when a hard drive is used in a RAID array as often error recovery needs to be handled at the array, rather than the drive, level. Without ERC, a drive is more likely to be incorrectly marked as failed when it has not. This is most dangerous in hardware based parity RAID arrays and has differing levels of effectiveness based on individual RAID controller parameters. It is an important feature for drives assumed for use in RAID arrays.

In addition to these key factors, WD lists many others for their drives such as cache size, number of processors, mean time between failures, etc. These tend to be far less important, especially MTBF and other reliability numbers as these can be skewed or misinterpreted easily and rarely offer the insight into drive reliability that we expect or hope. Cache size is not very significant for RAID arrays as they need to be disabled for reasons of data integrity. So outside of desktop use scenarios, the size of a hard drive’s cache is generally considered irrelevant. CPU count could also be misleading as single CPUs may be more powerful than dual CPUs if the CPUs are not identical and the efficacy of the second CPU is unknown. But WD lists this as a prominent feature of some drives and it is assumed that there is measurable performance gain, most likely in latency reduction, through the addition of the second CPU. I do, however, continue to treat this as a trivial factor and mostly only useful as a point of interest rather than as a decision factor
The drives.

All color-coded drives (Blue, Green, Red and Black) share one common factor – they have the “consumer” URE rating of 10^14. Consumer is a poor description here but is, more or less, industry standard. A better description is “desktop class” or suitable for non-parity RAID uses. The only truly poor application of 10^14 URE drives is in parity RAID arrays and even there, they can have their place if properly understood.

Blue: WD Blue drives are the effective baseline model for the SATA lineup. They spin at the “default” 7,200 RPMs, lack ERC/RLER and have a single processor. Drive cache varies between 16MB, 32MB and 64MB depending on the specific model. Blue drives are targeted at traditional desktop usage – as single drives with moderate speed characteristics, not well suited to server or RAID usage. Blue drives are what is “expected” to be found in off the shelf desktops. Blue drives have widely lost popularity and are often not available in larger sizes. Black and Green drives have mostly replaced the use of Blue drives, at least in larger capacity scenarios.

Black: WD Black drives are a small upgrade to the Blue drives changing nothing except to upgrade from one to two processors to slightly improve performance while not being quite as cost effective. Like the Blue drives they lack ERC/TLER and spin at 7,200 RPM. All Black drives have the 64MB cache. As with the Blue drives, Black drives are most suitable for traditional desktop applications where drives are stand alone.

Green: WD Green drives, as their name nominally implies, are designed for low power consumption applications. They are most similar to Blue drives but spin at a slower 5,400 RPMs which requires less power and generates less heat. Green drives, like Blue and Black, are designed for standalone use primarily in desktops that need less drive performance than is expected in an average desktop. Green drives have proven to be very popular due to their low cost of acquisition and operation. It is assumed, as well, that Green drives are more reliable than their faster spinning counterparts due to the lower wear and tear of the slower spindles although I am not aware of any study to this effect.

Red: WD Red drives are unique in the “color coded” WD drive line up in that they offer ERC/TLER and are designed for use in small “home use” server RAID arrays and storage devices (such as NAS and SAN.) Under the hood the WD Red drives are WD Green drives, all specifications are the same including the 5,400 RPM spindle speed, but with TLER enabled in the firmware. Physically they are the same drives. WD officially recommends Red drives only for consumer applications but Red drives, due to their lower power consumption and TLER, have proven to be extremely popular in large RAID arrays, especially when used for archiving. Red drives, having URE 10^14, are dangerous to use in parity RAID arrays but are excellent for mirrored RAID arrays and truly shine at archival and similar storage needs where large capacity and low operational costs are key and storage performance is not very important.

Outside of the color coded drives, WD has three SATA drive families which are all considered “enterprise.” What these drives share in common is that their URE rate is much higher than that of the “consumer” color coded drives. Ranging from URE 10^15 to 10^16 depending on model. The most important result of this URE rate is that these drives are far more applicable to use in parity RAID arrays (e.g. RAID 6.)

SE: SE drives are WD’s “entry level” enterprise SATA drives with URE 10^15 rates and 7,200 RPM spindle speeds. They have dual processors and a 64MB cache. Most importantly, SE drives have ERC/TLER enabled. SE drives are ideal for enterprise RAID arrays both mirrored and parity.

RE: RE drives are WD’s high end standard enterprise SATA drives with all specifications being identical to the SE drives but with the even better URE 10^16 rate. RE drives are the star players in WD’s RAID drive strategy being perfect for extremely large capacity arrays even when used in parity arrays. RE drives are available in both SATA and SAS configurations but with the same drive mechanics.

Velociraptor: WD’s Velociraptor is a bit of an odd member of the SATA category. With URE 10^16 and a 10,000 RPM spindle speed the Velociraptor is both highly reliable and very fast for a SATA drive competing with common, mainline SAS drives. Surprisingly, the Velociraptor has only a single processor and even more surprisingly, it lacks ERC/TLER making it questionable for use in RAID arrays. Lacking ERC, use in RAID can be considered on an implementation by implementation basis depending on how the RAID system interacts with the drive’s timing. With the excellent URE rating, Velociraptor would be an excellent choice for large, higher performance parity RAID arrays but only if the array handles the error timing in a graceful way, otherwise the risk of the array marking the drive as having failed is unacceptably high for an array as costly as this would be. It should be noted that Velociraptor drives do not come in capacities comparable to the other SATA drive offerings – they are much smaller.

Of course the final comparison that one needs to make is in price. When considering drive purchases, especially where large RAID arrays are concerned or for other bulk storage needs, the per drive cost is often a major, if not the driving, factor. The use of slower, less reliable drives in a more reliable RAID level (such as Red drives in RAID 10) versus faster, more reliable drives in a less reliable RAID level (such as RE drives in RAID 6) often provides a better blend of reliability, performance, capacity and cost. Real world drive prices play a significant factor in these decisions. These prices, unlike the drive specifications, can fluctuate from day to day and swing planning decisions in different directions but, overall, tend to remain relatively stable in comparison to one another.

At the time of this article, at the end of 2013, a quick survey of prices of 3TB drives from WD give these approximate breakdown:

  • Green $120
  • Red $135
  • Black $155
  • SE $204
  • RE $265

As can be seen, the jump in price primarily comes between the consumer or desktop class drives and the enterprise drives with their better URE rates with Red and RE drives, both with ERC/TLER, being in a price ratio of almost exactly 2:1 making, for equal capacity, it favorable to choose many more Red drives in RAID 10 than fewer RE drives in RAID 6, as an example. So comparing a number of factors, along with current real world prices, is crucial to making many buying decisions.

Newer drives, just being released, are starting to see reductions in onboard drive cache for exactly the reasons we stated above, drives designed around RAID use have little or no purpose to having onboard cache as it needs to be disabled for data integrity purposes.
Drive makers today are offering a wide variety of traditional spindle-based drive options to fit many different needs. Understanding these can lead to better reliability and more cost effective purchasing and will extend the usefulness of traditional drive technologies into the coming years.