One Big Flat Network

Networks have a natural tendency to become unnecessarily complicated, but there is great value in keeping networks clean and simple.  Simple networks are easier to manage, more performant and more reliable while generally being less expensive.  Every network needs a different level of complexity, and large networks will certainly need an extensive amount of it, but small businesses can often keep their networks extremely simple, which is part of what makes smaller businesses more agile and less expensive, giving them an edge over their larger counterparts.  This is an edge that they must leverage because they lack the enterprise advantage of scale.

There are two ways to look at network complexity.  The first is the physical network – the actual setup of the switches and routers that make up the network.  The second is the logical network – how IP address ranges are segmented, where routing barriers exist, etc.  Both are important to consider when looking at the complexity of your network.

It should be the goal of any network to be as simple as possible while still meeting all of its goals and requirements.  

The first aspect we will address is the physically flat network.  Reducing a physical network to a flat design can have a truly astounding effect on its performance and reliability.  In a very small network this could mean working from a single switch for all connections.  Typically this is only possible for the very smallest networks, as switches are rarely available above forty-eight or possibly fifty-two ports, but for many small businesses it is completely attainable.  It may require additional cabling for a building, in order to bring all connections back to a central location, but can often be achieved, at least on a site-by-site basis.  Many businesses today have multiple locations or staff working from home, which makes the network challenges much greater, although each location can still strive for its own simplicity in those cases.

As a network grows, the single switch approach can grow as well through switch stacking.  Stacked switches share a single switching fabric or backplane.  When stacked they behave as a single switch, but with more ports.  (Some switches do true backplane sharing and some mimic this with very high speed uplink ports with shared management via that port.)  A switch stack is managed as a single switch, making network management no more difficult, complex or time consuming for a stack than for a single switch.  It is common for a switch stack to grow to at least three hundred ports, if not more.  This allows for much larger physical site growth before needing to leave the single switch approach.

In some cases, a large modular chassis switch will grow even larger than this, allowing for four hundred or more ports in a single, “blade like” enterprise switching chassis.

By being creative and looking at simple, elegant solutions it is entirely possible to keep even a moderately large network contained to a single switching fabric allowing all network connections to share a single backplane.

The second area that we have to investigate is the logical complexity of the network.  Even in physically simple networks it is common to find small businesses investing a significant amount of time and energy into implementing unnecessary subnets or VLANs, along with all of the overhead that comes with them.

Subnetting is rarely necessary in a small or even a smaller medium-sized business.  Traditionally, going back to the 1990s, it was very common to keep subnets to a maximum of 256 addresses (a /24 subnet) because of packet collisions, broadcasts and other practical issues.  This made a lot of sense in an era when hubs were used instead of switches, broadcasts were common and networks were lucky to have 10Mb/s on a shared bus.  Today’s broadcast-light, collision-free, 1Gb/s dedicated channel networks experience network load in a completely different manner.  Where 256 devices on a subnet was an extremely large network then, having more than 1,000 devices on a single subnet is a non-issue today.
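
To put numbers to this, here is a minimal sketch using Python’s standard ipaddress module, comparing the classic /24 against a /22 that comfortably holds more than 1,000 hosts (the address ranges chosen are arbitrary examples):

    import ipaddress

    classic = ipaddress.ip_network("192.168.1.0/24")
    modern = ipaddress.ip_network("10.0.0.0/22")

    # num_addresses counts every address in the block; subtracting the
    # network and broadcast addresses gives usable hosts.
    print(classic, "usable hosts:", classic.num_addresses - 2)  # 254
    print(modern, "usable hosts:", modern.num_addresses - 2)    # 1022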

These changes in how networks behave mean that small and medium businesses almost never need to subnet for reasons of scale and can comfortably use a single subnet for their entire business, reducing complexity and easing network management.  More than a single subnet may be necessary to support specific network segmentation, such as separating production and guest networks, but scale, the reason traditionally given for subnetting, becomes an issue solely for larger businesses.

It is tempting to implement VLANs in every small business environment as well.  Subnetting and VLANs are often related and often confused, but subnets often exist without VLANs, while VLANs do not exist without subnets.

In large environments VLANs are a foregone conclusion and it is simply assumed that they will exist.  This mentality often filters down to smaller organizations, which are tempted to apply it to businesses that lack the scale at which VLAN management makes sense.  VLANs should be relatively uncommon in a small business network.

The most common place where I see VLANs used when they are not needed is in Voice over IP or VoIP networks.  It is a common assumption that VoIP has special needs that require VLAN support.  This is not true.  VoIP and the QoS that it sometimes needs are available without VLANs and often will work better without them.

VLANs really only become important either when management is needed at large scale (where scale is larger than a single subnet can provision) and cannot be physically segregated, or when specific network-layer security is needed, which is relatively rare in the SMB market.  VLANs are very useful and do have their place.  VLANs are often used when a dedicated guest network is needed, but generally in a small business guest access is provided via a direct guest connection to the Internet rather than a quarantined network for guests.

The most common practical use of a VLAN in an SMB is likely to be a walled garden DMZ designed for quarantined BYOD remote access where BYOD devices connect much like guests but have the ability to access remote access resources like RDP, ICA or PCoIP protocols.  VLANs would also be popular for building traditional DMZs for externally facing public services such as web and email servers – except that these services are not commonly kept on the local network for hosting in today’s SMBs so this classic use of VLANs in the SMB is rapidly fading.

Another use case where VLANs are often used inappropriately is for a Storage Area Network or SAN.  It is best practice that a SAN be a completely independent (air gapped), physically unique network unrelated to the regular switching infrastructure.  It is generally not advised that a SAN be created using VLANs or subnets but instead be on dedicated switches.

It is tempting to add complex switching setups, additional subnets and VLANs because we hear about these things from larger environments, they are fun and exciting, and they appear to add job security by making the network more difficult to maintain.  Complex networks require higher end skills and can seem like a great way to use that networking certificate.  But in the long run, this is a bad career and IT strategy.  Network complexity should be added in a lab for learning purposes, not in production networks.  Production networks should be run as simply, elegantly and cost effectively as possible.

With relatively little effort, a small business network can likely be designed to be both physically and logically very simple.  The goal, of course, is to come as close as possible to creating a single, flat network structure where all devices are physical and logical peers with no unnecessary bottlenecks or protocol escalations.  This improves performance and reliability, reduces costs and frees IT resources to focus on more important tasks.

Originally posted on the StorageCraft Blog.

Starting the IT Clock Ticking

Everyone is always happy to tell you how important experience is over certifications and degrees when working in IT. Few things are so readily agreed upon within the industry. What is shocking, however, is how often that advice does not get translated into a practical reality.

New IT hopefuls, when asking for guidance, will be told the value of experience but then sent everywhere except towards experience by the advice that they receive. This makes no sense. When applying for IT jobs, hiring managers and human resources departments are interested in knowing when you started in IT and how many years you have been in the field. That’s a hard number and one that you can never change once it has been set. Your start date is a factor of your career with which you are stuck for the rest of your life. You can get a degree anytime. You can get certified anytime. But your entry date into the field is permanent; it is the most important thing that an IT professional hopeful needs to focus on.

Many things can qualify as the first “start date” in a career. What is important is getting into a real IT position, or a software development position, to affix that date as early as possible.  (Nearly everyone in the field accepts software engineering as experience directly relevant to IT even though it is technically not IT.)  This counts towards experience which can, in turn, count towards other things including eligibility for positions, pay increases or even vacation accrual and similar benefits. Often IT professional hopefuls do not think about the range of possibilities for establishing that entry date into the field and overlook opportunities, or they downplay the value of the entry date and opt out of opportunities that would have greatly benefited them, choosing to focus instead on more “socially accepted” activities that ultimately play a far smaller role in their overall career.

The most obvious example of an IT entry date is obtaining an entry level position in the field.  Because this is so obvious, many people forget that there are other options and can easily become overly focused on finding their first “normal” job, typically on a helpdesk, and may lose sight of everything else.

Even worse, it is common for assumptions to be made about how a first job is typically acquired and then, because of the assumed steps to get from A to B, the focus often shifts to those steps and the real goal is missed completely.  For example, it is often assumed that a college degree and industry certifications are requirements for getting into an entry level position.  Certainly it is true that an education and certifications can make breaking into the industry much easier.  But these are not themselves the goal; they are tools to achieve the goal.  Getting work to start a career is the goal, but often those extra steps get in the way of career opportunities, and a loss of focus leads would-be IT pros to misstep and skip career opportunities because they have become focused on proximate achievements like certifications rather than looking at their life from a goal level.

I have often heard IT students ask if they should take a job offer in their chosen career or continue with a degree path instead.  Even if the job is very good, it seems that almost ubiquitously the choice will be made to turn down the critical professional position because the student has lost focus and is thinking of the proximate goal, their education, while forgetting the true goal, their career.  This reaction is far more common than one might realize and very damaging to students’ prospects.  Perhaps they feel that because an opportunity came along before they had completed their studies, good entry level positions must be common and easy to acquire; perhaps they simply forgot why they were going to school in the first place; or perhaps they are not concerned with their careers and wish to spend their time relaxing in college before taking that next step.  Many students probably fear being unable to complete their education if they take a position in IT first, but there are very good options that would allow for both the critical needs of their career and completing their education well.  Taking a career position does not need to have a negative impact on the ability to complete an education if the educational process is deemed to still be important.

There are several avenues that allow for starting the “career clock”, as I like to think of it.  The easiest for most people, especially those relatively young, is to find an internship.  Internships can be found even at a very young age, from middle school or early high school generally into the mid or even late twenties.  Internships can be amazingly valuable because they often allow the earliest entry into the field (specifically unpaid internships), generally many years earlier than other options, with the fewest up front expectations.  Students pursuing internships from a young age can often get a career jump of two to ten years on their non-interning counterparts!  The ability to leap forward in your career can be dramatic.  Internships abound and few students take the time and effort to invest in them.  Those students honestly interested in an internship will likely have no problem securing one.

Internships can be much more valuable than regular jobs because they, by definition, should include some amount of mentorship and projects designed to educate.  An entry level job typically focuses on simple, highly repeatable tasks that teach relatively little while a real internship should focus on growing and developing skills and an understanding of the IT discipline.  Because of this, a good internship will generally build a resume and establish experience much faster than most other methods, often allowing a wider range of exposure to different areas of IT.

Another good path for getting into IT as early as possible is volunteer work.  This is a little like interning except that it requires more effort and determination on the part of the hopeful IT professional and lacks the expectation of mentoring and oversight.  A volunteer role is always unpaid but because of this often offers a lot of flexibility and opportunity.  There are many places that need or welcome IT volunteers, such as churches, private schools and other non-profits running on tight budgets.  With volunteer work you will often get greater decision-making opportunities and likely gain exposure to the need to think of IT within financial constraints which, while typically tighter at a not for profit, exist in every instance of IT.  This business exposure is even better for resume building.

Volunteering is generally more difficult to do at a young age, as a level of maturity and knowledge is often needed, but not in all cases.  Volunteering at a larger non-profit which already has paid IT staff or more senior volunteer IT might combine volunteering with a nearly intern-like situation.  A smaller non-profit, such as a church, might mean dealing with IT alone, which can be very educational but potentially daunting and even overwhelming to a younger or nascent IT professional in the making.  A volunteer in a small non-profit may be in a position to run an IT shop, from top to bottom, before even being employed in their first traditional position.

Of course no single approach need be taken alone.  Interning with a for profit firm and volunteering as well can be even better, making for an even stronger and more valuable IT entry point.  Sometimes intern or volunteer work may continue even after traditional, paying employment is found because one pays the bills while the other builds the resume.

Even less traditional options exist, such as starting a business of your own, which is generally extremely difficult and often not possible at a young age, or finding traditional work while very young.  Starting a business will often teach a large volume of business skills and a small amount of IT ones, and can be extremely valuable but at a potentially devastating cost.  Compared to other approaches this is very risky under normal circumstances.  It certainly can be done but would rarely be considered the best choice.

What matters most is finding a position that establishes a starting point into IT.  Once that stake is driven into the proverbial ground it is set and the focus can shift to skill acquisition, broader experience, education, certifications or whatever is needed to take the career to the next level.  All of those subsequent skills are soft, they can be enhanced as needed.  But that starting date can never be moved and is absolutely crucial.

It is often not well communicated to high school and college age IT hopefuls that these opportunities are readily available, or just how important they are.  Too often society, or the established education machine, encourages students and those of collegiate age to discount professional opportunities and focus on education to the detriment of their experience and long term careers.  IT and software development are not careers that are well supported by traditional career planning, and they are especially not well suited to people who wait to jump in until they feel “ready”, because there will always be those with ambition and drive doing so at a far younger age who will have built a career foundation long before most of their peers even consider their futures.  IT is a career path that rewards the bold.

There is no need to follow the straight and narrow traditional path in IT.  That path exists and many will follow it; but it is not the only path and those that stray from it will often find themselves at a great advantage.

No matter what path you choose to take in your pursuit of a career in IT, be sure to be extremely conscious of the need to not just acquire skills but to establish experience and start the clock ticking.

Originally published on the StorageCraft Blog.

IT Generalists and Specialists

IT professionals generally fall into two broad categories based on their career focus: generalists and specialists. These two categories carry far more differences than may at first appear, and moving between them can be extremely difficult once a career path has been embarked upon; often the choice to pursue one path or the other is made very early in a career.

There are many aspects that separate these two types of IT professionals; one of the most significant and misunderstood is the general marketplace for these two skillsets. It is often assumed, I believe, that both types exist commonly throughout the IT market, but this is not true. Each commands its own areas.

In the small and medium business market, the generalist rules. There is little need for specialists as there are not enough technical needs in any one specific area to warrant a full time staff member dedicated to it. Rather, a few generalists are almost always called upon to handle a vast array of technical concerns. This mentality also gives rise to “tech support sprawl” where IT generalists are often called upon to venture outside of IT to manage legacy telephones, electrical concerns, HVAC systems and even sprinklers! The jack of all trades view of the IT generalist is in danger of being taken much too far.

It should be mentioned, though, that in the SMB space the concept of a generalist is often one that remains semi-specialized. SMB IT is nearly a specialization on its own. Rather than an SMB generalist touching nearly every technology area it is more common for them to focus across a more limited subset. Typically an SMB generalist will be focused primarily on Windows desktop and server administration along with application support, hardware management and some light security. SMB generalists may touch nearly any technology but the likelihood of doing so is generally rather low.

In the enterprise space, the opposite is true. Enterprise IT is almost always broken down by departments, each department handling very focused IT tasks. Typically these include networking, systems, storage, desktop, helpdesk, application specific support, security, datacenter support, database administration, etc. Each department focuses on a very specific area, possibly with even more specialization within a department. Storage might be broken up by block and file. Systems by Windows, mainframe and UNIX. Networking by switching and firewalls. In the enterprise there is a need for nearly all IT staff to be extremely deep in their knowledge of and exposure to the products that they support, while needing little understanding of products that they don’t support, as they have access to abundant resources in other departments to guide them where there are cross interactions. This availability of other resources, and a departmental separation of duties, highlights the differences between generalists and specialists.

Generalists live in a world of seeing “IT” as their domain to understand and oversee, potentially segmented by “levels” of difficulty rather than technological focus, and typically with a lack of specialized resources to turn to internally for help. Specialists, by contrast, live in a world of departmental division by technology where there are typically many peers working at different experience levels within a single technology stack.

It is a rare SMB that would have anything but a generalist working there. It is not uncommon to have many generalists, even generalists who lean towards specific roles internally but who remain very general and lack a deep, singular focus. This fact can make SMB roles appear more specialized than they truly are to IT professionals who have only experienced the SMB space. It is not uncommon for SMB IT professionals to not even be aware of what specialized IT roles are like.

A good example of this is that job titles which are common and generally well defined in the enterprise space for specialists are often used accidentally or incorrectly for generalists who do not realize that the job roles are specific. Specialist titles are often used for generalist positions that are not truly differentiated.

Two exceptionally common examples are the network engineer and IT manager titles.  For a specialist, network engineer means a person whose full time, or nearly full time, job focus is the design, planning and possibly implementation of networks, including the switching, routing, security, firewalling, monitoring, load balancing and the like, of the network itself.  They have no role in the design or management of the systems that use the network, only the network itself.  Nor do they operate or maintain the network; that is for the network administrator to do who, again, only touches switches, routers, firewalls, load balancers and so forth, not computers, printers, servers and other systems.  It is a very focused title with no role overlap.  In the SMB it is common to give this title to anyone who operates any device on a network, often with effectively zero design or network responsibilities at all.

Likewise, in the enterprise an IT manager is a management role in an IT department.  What an IT manager manages, like any manager, is people.  In the SMB this title may be used correctly, but it is far more common to find the term applied to the same job role to which network engineer is applied – someone who has no human reports and manages devices on a network like computers and printers.  That is not a manager at all, but a generalist administrator – very different from what the title implies or how it is expected to be used in the large business and enterprise space.

Where specialists sometimes enter the SMB realm is through consultants and service providers who provide temporary, focused technical assistance to smaller firms that cannot justify maintaining those skills internally. The areas where this is most common are storage and virtualization, where consultants will often design and implement core infrastructure components and leave the day to day administration to the in-house generalists.

In the enterprise the situation is very different. Generalists do exist but, in most cases, the generalization is beaten out of them as their careers take them down the path of one specialization or another. Entry level enterprise workers will often come in without a clear expectation of a specialization but over time find themselves going into one quite naturally. Most, if not all, growth paths through enterprise IT require a deep specialization (which may mean focusing on management rather than technical skills.) Some large shops may provide for cross training or exposure to different disciplines, but rarely is this extensively broad and generally it does not last once a core specialization is chosen.

This is not to say that enterprises and other very large shops do not have generalists; they do. It is expected that at the highest echelons of enterprise IT the generalist roles will begin to reemerge as new disciplines that are not seen lower in the ranks. These roles often carry different labels, such as architect, coordinator or, of course, CIO.

The reemergence of generalists at the higher levels of enterprise IT poses a significant challenge for an industry that does little to groom generalists. This forces the enterprise generalist to often “self-groom” – preparing themselves for a potential role through their own devices. In some cases, organic growth through the SMB channels can lead to an enterprise generalist but this is extremely challenging due to the lack of specialization depth available in the majority of the SMB sector and a lack of demonstrable experience in the larger business environment.

These odd differences, which almost exclusively fall along SMB vs. enterprise lines, create a natural barrier, beyond business category exposure, to IT professionals migrating back and forth between larger and smaller businesses. The type of business and work experience is vastly different and the technology differences are dramatic. Enterprise IT pros are often lost moving to an SMB, and SMB pros find that what they felt was deep, focused experience in the SMB is very shallow in the enterprise. The two worlds operate differently at every level; outside of IT the ability to move between them is far easier.

Enterprise IT carries the common titles that most people associate with IT career specialization: system administrator, network engineer, database administrator, application support, helpdesk, desktop support, datacenter technician, automation engineer, network operations center associate, project manager, etc. SMB titles are often confusing both inside and outside of the industry. It is very common for SMB roles to co-opt specialization titles and apply them to roles that barely resemble their enterprise counterparts in any way and don’t match the expectation of the title at all, as I demonstrated earlier. This further complicates fluid movement between realms as both sides become increasingly confused trying to understand how people and roles from the other realm relate to each other. There are titles associated with generalists, such as the rather dated LAN Administrator, IT Generalist and architect titles, but their use, in the real world, is very rare.  The SMB struggles to define meaningful titles and has no means by which to apply or enforce them across the sector.  This lack of clear definition will continue to plague both the SMB and generalists, who have little ability to easily convey the nature of their job role or career path.

Both career paths offer rewarding and broad options but the choice between them does play a rather significant role in deciding the flavor of a career.  Generalists, beyond gravitating towards smaller businesses, will also likely pick up a specialization in an industry over time as they move into higher salary ranges (manufacturing, medical, professional services support, legal, etc.)  Specialists will find their focus is on their technology, with less focus on any particular market.  Generalists will find it easier to find work in any given local market; specialists will find that they often need to move to major markets, and potentially only the core markets will provide great growth opportunities, but within those markets mobility and career flexibility will be very good.  Generalists have to work hard to keep up with a broad array of technologies and changes in the market.  Specialists will often have deep vendor resources available to them and will find the bulk of their educational options come directly from the vendors in their focus area.

It is often personality that pushes young IT professionals into one area or the other.  Specialists are often those that love a particular aspect of IT and not others or want to avoid certain types of IT work as well as those that look at IT more as a predetermined career plan.  Generalists often come from the ranks of those that love IT as a whole and fear being stuck in just one area where there are so many aspects to explore.  Generalists are also far more likely to have “fallen into” IT rather than having entered the field having a strategic plan.

Understanding how each type approaches the market, and how the markets approach IT professionals, helps IT professionals assess what it is that they like about their field, make good career choices to keep themselves happy and motivated, and plan in order to maximize the impact of their career decisions.  Too often, for example, small business generalists will attempt a specialization focus, very often in enterprise Cisco networking just as a common example, which has almost no potential value in the marketplace where their skills and experience are focused.  Professionals doing this will often find their educational efforts wasted and be frustrated that the skills that they have learned go unused and atrophy, while also being frustrated that gaining highly sought skills does not appear to contribute to new job opportunities or salary increases.

There is, of course, opportunity to move between generalist and specialist IT roles.  But the more experience a professional gains in one area or the other, the more difficult it becomes to make a transition, at least without suffering a dramatic salary loss in order to do so.  Early in an IT career, there is relatively high flexibility to move between these areas, at the point where the broadening of generalization is minimal or the deep technical skills of specialization are not yet obtained.  Entry level positions in both areas are effectively identical and there is little differentiation in career starting points.

Greater perspective on IT careers gives everyone in the field more ability and opportunity to pursue and achieve the IT career that will best satisfy their technical and personal work needs.

What is RAID 100?

RAID 10 is one of the most important and commonly used RAID levels in use today. RAID 10 is, of course, what is known as compound or nested RAID where one RAID level is nested within another. In the case of RAID 10, the “lowest” level of RAID, the one touching the physical drives, is RAID 1. The nomenclature of nested RAID is that the number to the left is the one touching the physical drives and each number to the right is the RAID that touches those arrays.

So RAID 10 is a number of RAID 1 (mirror) sets that are in a RAID 0 (non-parity stripe) set together. There is a certain common terminology sometimes applied, principally championed by HP, to refer to even RAID 1 as simply being a subset of RAID 10 – a RAID 10 array where the RAID 0 length is one. A quirky way to think of RAID 1, to be sure, but it actually makes many discussions and comparative calculations easier and makes sense in a practical way for most storage practitioners. Thinking of RAID 1 as a “special name” for the smallest possible RAID 10 stripe size and allowing, then, all RAID 10 permutations to exist as a calculation continuum makes sense.

Likewise, HP also refers to solitary drives attached to a RAID controller as RAID 0 sets with a stripe length of one. So the application of that terminology to the RAID 10 world is actually more obvious and sensible when looked at in that light. However, neither HP nor any other vendor today applies this same naming oddity to other array types, such as RAID 5 being a subset of RAID 50 or RAID 6 being a subset of RAID 60, even though they can be thought of that way exactly as RAID 1 can be to RAID 10.

If we take that same logic to the next level, figuratively and literally, we can take multiple RAID 10 arrays and stripe them together in another RAID 0. This seems odd but can make sense. The result is a stripe of RAID 10s or, to write it out, a stripe of stripes of mirrors (we generally state RAID from the top down but the nomenclature is from the bottom up.) So as this is RAID 1 on the physical drives, a stripe of those mirrors and then a stripe of those resultant arrays, we get RAID 100 (R100.)
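
To make the bottom-up nomenclature concrete, here is a toy sketch in Python, purely illustrative and not any vendor’s implementation, that groups physical drives into RAID 1 mirrors, those mirrors into RAID 0 stripes, and those stripes into the final top-level RAID 0:

    # Toy model of RAID 100 geometry: drives -> mirrors -> stripes -> stripe of stripes.
    def build_raid100(drives, mirrors_per_stripe):
        # Pair physical drives into two-drive RAID 1 mirrors (the "1").
        mirrors = [drives[i:i + 2] for i in range(0, len(drives), 2)]
        # Group mirrors into RAID 0 stripes (the middle "0"), forming RAID 10 subsets.
        stripes = [mirrors[i:i + mirrors_per_stripe]
                   for i in range(0, len(mirrors), mirrors_per_stripe)]
        # The list of stripes is itself the top-level RAID 0 (the final "0").
        return stripes

    # Eight drives, two mirrors per inner stripe: two RAID 10 subsets striped together.
    print(build_raid100(["A", "B", "C", "D", "E", "F", "G", "H"], 2))
    # [[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]]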

RAID 100 is, of course, rare and odd. However, one extremely important RAID controller manufacturer utilizes R100 and, subsequently, so does their downstream integration vendor: namely LSI and Dell.

Fortunately, because non-parity stripes introduce few behavioral oddities and have near zero overhead or latency, this approach is really not a problem, although it can lead to a great deal of confusion. For all intents and purposes, RAID 100 behaves exactly like RAID 10 when each RAID 10 subset is identical to the others.

In theory, a RAID 100 could be made up of many disparate RAID 10 sets of varying drive types, spindle counts and speeds. In theory a RAID 10 could be made up of disparate RAID 1 sets, but this is far more limited in potential or likely variation. RAID 100 could, theoretically, do some pretty bizarre things if left unchecked. In practice, though, any RAID 100 implementation will likely, as LSI’s implementation does, enforce standardization and require that each RAID 10 subset be as identical as the controller is capable of enforcing. So each will be effectively uniform, keeping the overall behavior the same as if the same drives were set up as RAID 10.

Because the behavior remains identical to RAID 10 there is an extremely strong tendency to avoid the confusion of calling the array RAID 100 and to simply refer to it as RAID 10. This would work fine except for the semi-necessary quirk of needing to be able to specify the geometry of the underlying RAID 10 sets when building a RAID 100. LSI, and therefore Dell, requires that at the time of setting up a RAID 100 set you specify the underlying RAID 10 geometry, but since the array is labeled as RAID 10, this makes no sense. A bizarre situation indeed.

To further complicate matters, because of the desire to maintain a façade of using RAID 10 rather than RAID 100, proper terminology is eschewed and instead of referring to the underlying RAID 10 members as “RAID 10 arrays” or “RAID 10 subsets” they are simply called “spans.” Span, however, is a term used for something else in storage and does not apply properly here. Span is in no way a proper description for a RAID 10 set under any condition.

But if we agree to use the term span to refer to a RAID 10 subset of a RAID 100 array we can move forward pretty easily. Whenever possible, then, we want as many spans as possible to keep the underlying RAID 10 subsets as small as possible. If we make them small enough they actually collapse into RAID 1 sets (HP’s odd RAID 10 with a stripe size of one) and our RAID 100 collapses into a RAID 10 with the middle stripe, rather than the outside stripe, being the one that disappears! Bizarre, yes, but practical.

So how do we apply this in real life? Quite easily. In a RAID 100 array we must specify a count of spans to be used. Since we desire that each span contain two physical drive devices so that each span is a simple RAID 1 we simply need to take the total number of drives in our RAID 100 array, which we will call N, and divide that by two. So the desired span count for a normal RAID 100 array is simply N/2. This means if you have a two drive array, you want one span. Four drives, two spans. Six drives, three spans. Twenty four drives, twelve spans. And so on.
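
As a minimal sketch, the N/2 rule is easy to encode (the helper name here is hypothetical, not from any controller’s tooling):

    # The N/2 span rule: each span is a two-drive RAID 1, so a RAID 100
    # of N drives wants N/2 spans.
    def desired_span_count(total_drives):
        if total_drives % 2 != 0:
            raise ValueError("two-drive mirrors require an even drive count")
        return total_drives // 2

    for n in (2, 4, 6, 24):
        print(n, "drives ->", desired_span_count(n), "spans")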

Do not be afraid of RAID 100. For normal users it simply requires some additional knowledge of how to select the proper number of spans. It would be ideal if this were calculated automatically and kept hidden, allowing end users to think of the arrays in terms of RAID 10; or else labeled consistently as RAID 100 to make it clear what the span must represent; or, of course, for RAID 10 simply to be used instead of RAID 100. But given the practical state of reality, dealing with RAID 100, once it is understood, is easy.

Comparing RAID 10 and RAID 01

These two RAID levels often bring about a tremendous amount of confusion, partially because they are incorrectly used interchangeably and often simply because they are poorly understood.

First, it should be pointed out that either may be written with or without the plus sign: RAID 10 is RAID 1+0 and RAID 01 is RAID 0+1. Strangely, RAID 10 is almost never written with the plus and RAID 01 is almost never written without. Storage engineers generally agree that the plus should never be used, as it is superfluous.

Both of these RAID levels are “compound” levels made from two different, simple RAID types being combined. Both are mirror-based, non-parity compound or nested RAID. Both have essentially identical performance characteristics – nominal overhead and latency with NX read speed and (NX)/2 write speed where N is the number of drives in the array and X is the performance of an individual drive in the array.
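
As a quick worked example of those formulas, here is a minimal sketch; the figure of 200 IOPS per drive is just an assumed value for illustration:

    # NX read, (NX)/2 write for mirror-based, non-parity arrays.
    def mirror_stripe_performance(n_drives, per_drive_iops):
        reads = n_drives * per_drive_iops       # all spindles can serve reads
        writes = n_drives * per_drive_iops / 2  # every write lands on both halves of a mirror
        return reads, writes

    reads, writes = mirror_stripe_performance(8, 200)
    print("8 drives:", reads, "read IOPS,", writes, "write IOPS")  # 1600 read, 800 write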

What sets the two RAID levels apart is how they handle disk failure. The quick overview is that RAID 10 is extremely safe under nearly all reasonable scenarios. RAID 01, however, rapidly becomes quite risky as the size of the array increases.

In a RAID 10, the loss of any single drive results in the degradation of a single RAID 1 set inside of the RAID 0 stripe. The stripe level sees no degradation, only the one singular RAID 1 mirror does. All other mirrors are unaffected. This means that our only increased risk is that the one single drive is now running without redundancy and has no protection. All other mirrored sets still retain full protection. So our exposure is a single, unprotected drive – much like you would expect in a desktop machine.

Array repair in a degraded RAID 10 is the fastest possible repair scenario. Upon replacing a failed drive, all that happens is that the single mirror is rebuilt – a simple copy operation that happens at the RAID 1 level, beneath the RAID 0 stripe. This means that if the overall array is idle the mirroring process can proceed at full speed and the overall array has no idea that it is even happening. A disk to disk mirror is extremely fast, efficient and reliable. This is an ideal recovery scenario. Even if multiple mirrors degrade and repair simultaneously there is no additional impact, as the rebuilding of one does not impact the others. RAID 10 risk and repair impact both scale extremely well.

RAID 01, on the other hand, when it loses a single drive immediately loses an entire RAID 0 stripe. In a typical RAID 01 mirror there are two RAID 0 stripes. This means that half of the entire array has failed. If we are talking about an eight drive RAID 01 array, the failure of a single drive renders four drives instantly inoperable and effectively failed (hardware does not need to be replaced but the data on the drives is out of date and must be rebuilt to be useful.) So from a risk perspective, we can look at it as being a failure of the entire stripe.

What is left after a single disk has failed is nothing but a single, unprotected RAID 0 stripe. This is far more dangerous than the equivalent RAID 10 failure because instead of there being only a single, isolated hard drive at risk there is now a minimum of two disks and potentially many more at risk and each drive exposed to this risk magnifies the risk considerably.

As an example, in the smallest possible RAID 10 or 01 array we have four drives. In RAID 10 if one drive fails, our risk is that its matching partner also fails before we rebuild the array. We are only worried about that one drive, all other drives in the RAID 10 set are still protected and safe. Only this one is of concern. In a RAID 01, when the first drive fails its partner in its RAID 0 set is instantly useless and effectively failed as it is no longer operable in the array. What remains are two drives with no protection running nothing but RAID 0 and so we have the same risk that RAID 10 did, twice. Each drive has the same risk that the one drive did before. This makes our risk, in the best case scenario, much higher.

But for a more dramatic example let us look at large twenty-four drive RAID 10 and RAID 01 arrays. Again with RAID 10, if one drive fails all others, except for its one partner, are still protected. The extra size of the array added almost zero additional risk. We still only fear for the failure of that one solitary drive. Contrast that to RAID 01, which would have had one of its RAID 0 arrays fail, taking twelve disks out at once with the failure of one and leaving the other twelve disks in a RAID 0 without any form of protection. The chance of one of twelve drives failing is significantly higher than the chance of a single drive failing, obviously.
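
Here is a sketch of that exposure argument, assuming purely for illustration that each unprotected drive independently has a one percent chance of failing during the rebuild window (an arbitrary figure):

    # The array is lost if ANY exposed, unprotected drive fails before repair.
    def array_loss_risk(exposed_drives, p_drive_failure=0.01):
        return 1 - (1 - p_drive_failure) ** exposed_drives

    # Twenty-four drive arrays after a single drive failure:
    print("RAID 10:", array_loss_risk(1))    # one exposed drive   -> 0.01
    print("RAID 01:", array_loss_risk(12))   # twelve exposed      -> ~0.114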

This is not the entire picture. The recovery of the single RAID 10 disk is fast; it is a straight copy operation from one drive to the other. It uses minimal resources and takes only as long as is required for a single drive to read and to write itself in its entirety. RAID 01 is not as lucky. Unlike RAID 10, which rebuilds only a small subset of the entire array, a subset that does not grow as the array grows – the time to recover a four drive RAID 10 or a forty drive RAID 10 after failure is identical – RAID 01 must rebuild an entire half of the whole parent array. In the case of the four drive array this is double the rebuild work of the RAID 10, but in the case of the twenty four drive array it is twelve times the rebuild work. So RAID 01 rebuilds take longer to perform while being under significantly more risk during that time.

There is a rather persistent myth that RAID 01 and RAID 10 have different performance characteristics, but they do not. Both use plain striping and mirroring, which are effectively zero overhead operations that require almost no processing. Both get full read performance from every disk device attached to them and each loses half of its write performance to the mirroring operation (assuming two way mirrors, which is the only common use of either array type.) There is simply nothing to make RAID 01 or RAID 10 any faster or slower than the other. Both are extremely fast.

Because of the characteristics of the two array types, it is clear that RAID 10 is the only type, of the two, that should ever exist within a single array controller. RAID 01 is unnecessarily dangerous and carries no advantages. They use the same capacity overhead, they have the same performance, they cost the same to implement, but RAID 10 is significantly more reliable.

So why does RAID 01 even exist? Partially it exists out of ignorance or confusion. Many people, implementing their own compound RAID arrays, choose RAID 01 because they have heard the myth that it is faster and, as is generally the case with RAID, do not investigate why it would be faster and forget to look into its reliability and other factors. RAID 01 is truly only implemented on local arrays by mistake.

However, when we take RAID to the network layer, there are new factors to consider and RAID 01 can become important, as can its rare cousin RAID 61. We denote, via network RAID notation, where the local and network layers of the RAID exist. So in this case we mean RAID 0(1) or RAID 6(1). The parentheses denote that the RAID 1 mirror, the “highest” portion of the RAID stack, is over a network connection and not on the local RAID controller.

How would this look in RAID 0(1)? If you have two servers, each with a standard RAID 0 array and you want them to be synchronized together to act as a single, reliable array you could use a technology such as DRBD (on Linux) or HAST (on FreeBSD) to create a network RAID 1 array out of the local storage on each server. Obviously this has a lot of performance overhead as the RAID 1 array must be kept in sync over the high latency, low bandwidth LAN connection. RAID 0(1) is the notation for this setup. If each local RAID 0 array was replaced with a more reliable RAID 6 we would write the whole setup as RAID 6(1).

Why do we accept the risk of RAID 01 when it is over a network and not when it is local? This is because of the nature of the network link. In the case of RAID 10, we rely on the low level RAID 1 portion of the RAID stack for protection and the RAID 0 sits on top. If we replicate this on a network level such as RAID 1(0) what we end up with is each host having a single mirror representing only a portion of the data of the array. If anything were to happen to any node in the array or if the network connection was to fail the array would be instantly destroyed and each node would be left with useless, incomplete data. It is the nature of the high risk of node failure and risk at the network connection level that makes RAID decisions in a network setting extremely different. This becomes a complex subject on its own.

Suffice it to say, when working with normal RAID array controllers or with local storage and software RAID, utilize RAID 10 exclusively and never RAID 01.

It Worked For Me

“Well, it worked for me.”  This has become a phrase that I have heard over and over again in defense of what would logically be otherwise considered a bad idea.  These words are often spoken innocently enough without deep intent, but they often cover deep meaning that should be explored.

It is important to understand what drives these words, both psychologically and technically.  At a high level, what we have is the delivery of an anecdote which can be restated as such: “While the approach or selection that I have used goes against your recommendation or best practices or what have you, in my particular case the bad situation of which you have warned or advised against has not arisen and therefore I believe that I am justified in the decision that I have made.”

I will call this the “Anecdotal Dismissal of Risk”, better known as “Outcome Bias.”  Generally this phrase is used to wave off the accusation that one has taken on unnecessary risk, unnecessary financial expense or, more likely, both.  The use of an anecdote in either case is, of course, completely meaningless, but the speaker does so with the hope of throwing off the discussion and routing it around their case by suggesting, without saying it, that perhaps they are a special case that has not been considered or, perhaps, that “getting lucky” is a valid form of decision making.

Of course, when talking risk, we are talking about statistical risk.  If anything were a sure thing that could be proven or disproved with an anecdote, it would not be risk but simply a known outcome, and making the wrong choice would be amazingly silly.  Anecdotes have a tiny place when used in the negative, for example: they claim that it is a billion to one chance that this would happen, but it happened to me on the third try and I know one other person that it happened to.  That’s not proof, but anecdotally it suggests that the risk figures are unlikely to be correct.

That case is valid, but it is still incredibly important to realize that even negative anecdotal evidence (anecdotal evidence of something that was extremely unlikely to happen) is still anecdotal and does not suggest that the results will happen again; at most it suggests that you were an amazing edge case.  If you know one person that has won the lottery, that is unlikely but does not prove that the lottery is likely to be won.  If every person you know who has played the lottery has won, something is wrong with the statistics.

However, the “it worked for me” case is universally used with risk that is less than fifty percent (if it were not, the whole thing would become crazy.)  Often it is about taking something with four nines of reliability and reducing it to three nines while attempting to raise it.  Three nines still means that there is only a one in one thousand chance that the bad case will arise.  This is not statistically likely to occur, obviously; at least we would hope that it was obvious.  Even though, in this example, the bad case arises ten times more often than it would have if we had left well enough alone, and maybe one hundred times more often than we intended, we still expect to never see the bad outcome unless we run thousands or tens of thousands of cases, and even then the statistics are based on a rather small pool.
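
The arithmetic is easy to check; a minimal sketch of how rarely a one in one thousand risk shows itself:

    # A 1-in-1,000 ("three nines") risk essentially never shows itself in any
    # single case, which is exactly why "it worked for me" proves nothing.
    p = 1 / 1000  # probability of the bad outcome in one case

    for trials in (1, 100, 1000, 10000):
        chance_seen = 1 - (1 - p) ** trials
        print(f"{trials:>6} cases: {chance_seen:.1%} chance of ever seeing the bad outcome")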

In many cases we talk about an assumption of unnecessary risk, but generally this is risk at a financial cost. What prompts this reaction a great deal of the time, in my experience, is being shown dramatic overspending – implementing very costly solutions when a less costly one, often fractionally as expensive, may approach or, in many cases, exceed the chosen solution being defended.

To take the reverse, out of any one thousand people doing this same thing, nine hundred and ninety-nine of them would be expected to have no bad outcome.  For someone to claim, then, that the risk cannot exist because they are one of the nine hundred and ninety-nine rather than the incredibly unlikely one to have had the bad thing happen obviously makes no sense whatsoever when looking at the pool as a whole.  But when we are the ones who made the decision to join that pool and then came away unscathed, it is an apparently natural reaction to discount the assumed outcome of even a risky choice and assume that the risk did not exist.

It is difficult to explain risk in this way but, over the years, I’ve found a really handy example that tends to explain business or technical risk in a way that anyone can understand.  I call it the Mother Seatbelt Example.  Try this experiment (don’t actually try it; just lie to your mother and tell her that you did to see the outcome.)

Drive a car without wearing a seatbelt for a whole day while continuously speeding.  Chances are extremely good that nothing bad will happen to you (other than paying some fines.)  The chances of having a car accident and getting hurt, even while being reckless in both your driving and disregarding basic safety precautions, is extremely low.  Easily less than one in one thousand.   Now, go tell your mother what you just did and say that you feel that doing this was a smart way to drive and that you made a good decision in having done so because “it worked out for me.”  Your mother will make it very clear to you what risky decisions mean and how anecdotal evidence of expected survival outcome does not indicate good risk / reward decision making.

In many cases, “it worked for me” is an attempt at deflection: a reaction of our amygdala in a “fight or flight” response to avoid facing what is likely a bad decision of the past.  Everyone has this reaction; it is natural, but unhealthy.  By avoiding critical evaluation of past decisions we make ourselves more likely to repeat the same bad decision or, at the very least, to continue the bad decision making process that led to it.  It is only by facing critical examination and accepting that past decisions may not have been ideal that we can examine ourselves and our processes and attempt to improve them to avoid making the same mistakes again.

It is understandable that in any professional venue there is a desire to save face and to appear to have made, if not a good decision, at least an acceptable one, and so the desire to explore logic that might undermine that impression is low.  Even more so, there is a very strong possibility that someone who is a potential recipient of the risk or cost that the bad decision created will learn of the past decision making, and there is, quite often, an even stronger desire to cover up any possibility that a decision may have been made without proper exploration or due diligence.  These are understandable reactions but they are not healthy and ultimately make the decision look even poorer than it otherwise would.  Everyone makes mistakes, everyone.  Everyone overlooks things; everyone learns new things over time.  In some cases, new evidence comes to light that was impossible to have known at the time.  There should be no shame in past decisions that are less than ideal, only in failing to examine them and learn from them, allowing us as individuals as well as our organizations to grow and improve.

The phrase seems innocuous enough when said.  It sounds like a statement of success.  But we need to reflect deeper.  We covered the risk scenario above, but what about the financial one?  When a solution is selected that carries little or no benefit, and possibly great caveats as we see in many real world cases, while being much more costly, and the term “it worked for me” is used, what is really being said is “wasting money didn’t get me in trouble.”  When used in the context of a business, this is quite a statement to make.  Businesses exist to make money.  Wasting money on solutions that do not meet the need better is a failure whether the solution functions technically or not.  Many solutions would not fail technically but are simply too expensive; choosing the right solution always involves getting the right price for the resultant situation.  That is just the nature of IT in business.

Using this phrase can sound reasonable to the irrational, defensive brain.  But to outsiders looking in with rational views it actually sounds like “well, I got away with…” fill in the blank: “wasting money”, “being risky”, “not doing my due diligence”, “not doing my job”, or whatever the case may be.  And likely whatever you think should be filled in there will not be as bad as what others assume.

If you are tempted to justify past actions by saying “it worked for me” or by providing anecdotal evidence that shows nothing, stop and think carefully.  Give yourself time to calm down and evaluate your response.  Is it based on logic or on irrational amygdala emotions?  Don’t be ashamed of having the reaction; everyone has it.  It cannot be escaped.  But learning how to deal with it can allow us to approach criticism and critique with an eye towards improvement rather than defense.  If we are defensive, we lose the value of peer review, which is so important to what we do as IT professionals.