Tag Archives: san

New Hyperconvergence, Old Storage

We all dream of the day that we get to build a new infrastructure from the ground up without any existing technical debt to hold us back.  A greenfield deployment where we pick what is best, roll it out fresh and enjoy.  But most of us live in the real world where that is not very realistic and what we actually face is a world where we have to plan for the future but work with what we already have, as well.

Making do with what we have is nearly an inevitable fact of life in IT and when approaching storage when moving from an existing architecture to hyperconvergence things will be no different.  In a great many cases we will be facing a situation where an existing investment in storage will be in place that we do not want to simply discard but does not necessarily fit neatly into our vision of a hyperconverged future.

There are obvious options to consider, of course, such as returning leased gear, retiring older equipment or selling still useful equipment outright.  These are viable options and should be considered.  Eliminating old gear or equipment that does not fit well into the current plans can be beneficial as we can simplify our networks, reduce power consumption and possible even recoup some degree of our investments.

In reality, however, these options are rarely financially viable and we need to make more productive use of our existing technology investments.  What options are available to us, of course, depend on a range of factors.  But we will look at some examples of how common storage devices can be re-purposed in a new hyperconverged-based system in order to maintain their utility either until they are ready to retire or even, in some cases, indefinitely.

The easiest re-purposing of existing storage, and this applies equally to both NAS and SAN in most cases, is to designate them as backup or archival targets.  Traditional NAS and SAN devices are excellent backup hardware and are generally usable by nearly any backup mechanism, regardless of approach or vendor.  And because they are generic backup targets if a mixture of backup mechanisms are used, such as agent based, agentless and custom scripts, these can all work to the same target.  Backups so rarely get the attention and investment that they deserve that this is not just the easiest but often the most valuable use of pre-existing storage infrastructure.

Of course anything that is appropriate for backups can also be used for archival storage.  Archival needs are generally less needed (only a percentage of firms need archival storage while all need backups) and are of lower priority, so this is more of an edge reuse case, but still one to consider, especially for organizations that may be working to re-purpose a large number of possibly disparate storage devices.  However it is worth noting that moving to hyperconvergence does tend to “flatten” the compute and storage space in a way that may easily introduce a value to lower performance, lower priority archival storage that may not have existed or existed so obviously prior to the rearchitecting of the environment.

NAS has the unique advantageous use cases of being usable as general purpose network storage, especially for home directories of end users.  NAS storage can be used in so many places on the network, it is very easy to continue to use, after moving core architectures.  The most popular case is for users’ own storage needs with the NAS connected directly to end user devices which allows for storage capacity, performance and network traffic to be offloaded from the converged infrastructure to the NAS.    It would actually be very rare to remove a NAS from a hyperconverged network as its potential utility is so high and apparent.

Both SAN and NAS have the potential to be attached directly to the virtual machines running on top of a hyperconverged infrastructure as well.  In this way they can continue to be utilized in a traditional manner until such time as they are no longer needed or appropriate.  While not often the recommended approach, attaching network storage to a VM directly, there are use cases for this and it allows systems to behave as they always have in a physical realm into the future.  This is especially useful for mapped drives and user directories via a NAS, much as we mentioned for end user devices, but the cases are certainly not limited to this.

A SAN can provide some much needed functionality in some cases for certain workloads that require shared block storage that otherwise is not available or exposed on a platform.  Workloads on a VM will use the SAN as they always have and not even be aware that they are virtualized or converged.  Of course we can also attach a SAN to a virtualized file server or NAS head running on our hyperconverged infrastructure if the tiering for that kind of workload is deemed appropriate as well.

Working with existing infrastructure when implementing a new one does present a challenge, but one that we can tackle with creativity and logical approach.  Storage is a nearly endless challenge and having existing storage to re-purpose may easily end up being exceptionally advantageous.

SMBs Must Stop Looking to BackBlaze for Guidance

I have to preface this article, because people often take these things out of context and react strongly to things that were never said, with the disclaimer that I think that BackBlaze does a great job, has brilliant people working for them and has done an awesome job of designing and leveraging technology that is absolutely applicable and appropriate for their needs.  Nothing, and I truly mean nothing, in this article is ever to be taken out of context and stated as a negative about BackBlaze.  If anything in this article appears or feels to state otherwise, please reread and confirm that such was said and, if so, inform me so that I may correct it.  There is no intention of this article to imply, in any way whatsoever, that BackBlaze is not doing what is smart for them, their business and their customers.  Now on to the article:

I have found over the past many years that many small and medium business IT professionals have become enamored by what they see as a miracle of low cost, high capacity storage in what is know as the BackBlaze POD design.  Essentially the BackBlaze POD is a low cost, high capacity, low performance nearly whitebox storage server built from a custom chassis and consumer parts to make a disposable storage node used in large storage RAIN arrays leveraging erasure encoding.  BackBlaze custom designed the original POD, and released its design to the public, for exclusive use in their large scale hosted backup datacenters where the PODs functions as individual nodes within a massive array of nodes with replication between them.  Over the years, BackBlaze has updated its POD design as technology has changed and issues have been addressed.  But the fundamental use case has remained the same.

I have to compare this to the SAM-SD approach to storage which follows a similar tact but does so using enterprise grade, supported hardware.  These differences sometimes come off as trivial, but they are anything but trivial, they are key underpinnings to what makes the different solutions appropriate in different situations.  The idea behind the SAM-SD is that storage needs to be reliable and designed from the hardware up to be as reliable as possible and well supported for when things fail.  The POD takes the opposite approach making the individual server unreliable and ephemeral in nature and designed to be disposed of rather than repaired at all.  The SAM-SD design assumes that the individual server is important, even critical – anything but disposable.

The SAM-SD concept, which is literally nothing more than an approach to building open storage, is designed with the SMB storage market in mind.  The BackBlaze POD is designed with an extremely niche, large scale, special case consumer backup market in mind.  The SAM-SD is meant to be run by small businesses, even those without internal IT.  The POD is designed to be deployed and managed by a full time, dedicated storage engineering team.

Because the BackBlaze POD is designed by experts, for experts in the very largest of storage situations it can be confusing and easily misunderstood by non-storage experts in the small business space.  In fact, it is so often misunderstood that objections to it are often met with “I think BackBlaze knows what they are doing” comments, which demonstrates the extreme misunderstanding that exists with the approach.  Of course BackBlaze knows what they are doing, but they are not doing what any SMB is doing.

The release of the POD design to the public causes much confusion because it is only one part of a greater whole.  The design of the full data center and the means of redundancy and mechanisms for such between the PODs is not public, but is proprietary.  So the POD itself represents only a single node of a cluster (or Vault) and does not reflect the clustering itself, which is where the most important work takes place.  In fact the POD design itself is nothing more than the work done by the Sun Thumper and SAM-SD projects of the past decade without the constraints of reliability.  The POD should not be a novel design, but an obvious one.  One that has for decades been avoided in the SMB storage space because it is so dramatically non-applicable.

Because the clustering and replication aspects are ignored when talking about the BackBlaze POD some huge assumptions tend to be made about the capacity of a POD that has much lower overhead than BackBlaze themselves get for the POD infrastructure, even at scale.  For example, in RAID terms, this would be similar to assuming that the POD is RAID 6 (with only 5% overhead) because that is the RAID of an individual component when, in fact, RAID 61 ( 55% overhead) is used!  In fact, many SMB IT Professionals when looking to use a POD design actually consider simply using RAID 6 in addition to only using a single POD.  The degree to which this does not follow BackBlaze’s model is staggering.

BackBlaze: “Backblaze Vaults combine twenty physical Storage Pods into one virtual chassis. The Vault software implements our own Reed-Solomon encoding to spread data shards across all twenty pods in the Vault simultaneously, dramatically improving durability.

To make the POD a consideration for the SMB market it is required that the entire concept of the POD be taken completely out of context.  Both its intended use case and its implementation.  What makes BackBlaze special is totally removed and only the most trivial, cursory aspects are taken and turned into something that in no way resembles the BackBlaze vision or purpose.

Digging into where the BackBlaze POD is differing in design from the standard needs of a normal business we find these problems:

  • The POD is designed to be unreliable, to rely upon a reliability and replication layer at the super-POD level that requires a large number of PODs to be deployed and for data to be redundant between them by means of custom replication or clustering.  Without this layer, the POD is completely out of context.  The super-POD level is known internally as the BackBlaze Vault.
  • The POD is designed to be deployed in an enterprise datacenter with careful vibration dampening, power conditioning and environmental systems.  It is less resilient to these issues as standard enterprise hardware.
  • The POD is designed to typically be replaced as a complete unit rather than repairing a node in situ.  This is the opposite of standard enterprise hardware with hot swap components designed to be repaired without interruption, let alone without full replacement.  We call this a disposable or ephemeral use case.
  • The POD is designed to be incredibly low cost for very slow, cold storage needs.  While this can exist in an SMB, typically it does not.
  • The POD is designed to be a single, high capacity storage node in a pool of insanely large capacity.  Few SMBs can leverage even the storage potential of a single POD let alone a pool large enough to justify the POD design.
  • The BackBlaze POD is designed to use custom erasure encoding, not RAID.  RAID is not effective at this scale even at the single POD level.
  • An individual POD is designed for 180TB of capacity and a Vault scale of 3.6PB.

Current reference of the BackBlaze POD 5: https://www.backblaze.com/blog/cloud-storage-hardware/

In short, the BackBlaze POD is a brilliant underpinning to a brilliant service that meets a very specific need that is as far removed from the needs of the SMB storage market as one can reasonably be.  Respect BackBlaze for their great work, but do not try to emulate it.


Drive Appearance

One of the more common, yet more tricky fundamental concepts in computing today is the concept of drive appearance or, in other words, something that appears to be a hard drive.  This may sound simple, and mostly it is, but it can be tricky.

First, what is a hard drive.  This should be simple.  We normally mean a traditional spinning disk Winchester device such have been made for decades in standard three and a half inch as well as two and a half inch form factors.  They contain platters that spin, a drive head that moves forward and backward and they connect using something like ATA or SCSI connectors.  Most of us can pick up a hard drive with our hands and be certain that we have a hard drive.  This is what we call the physical manifestation of the drive.

To the computer, though, it does not see the casing of the drive nor the connectors.  The computer has to look through its electronics and “see” the drive digitally.  This is very, very different from how humans view the physical drive.  To the computer, a hard drive appears as an ATA, SCSI or Fibre Channel device at the most basic physical level and are generally abstracted at a higher level as a block device.  This is what we would call a logical appearance, rather than a physical one.  For our purposes here, we will think of all of these drive interfaces as being block devices.  They do differ, but only slightly and not consequentially to the discussion.  What is important is that there is a standard interface or set of closely related interfaces that are seen by the computer as being a hard drive.

Another way to think of the logical drive appearance here is that anything that looks like a hard drive to the computer is something on which the computer format with a filesystem.  Filesystems are not drives themselves, but require a drive on which to be placed.

The concept of the interface is the most important one here.  To the computer, it is “anything that implements a hard drive interface” that is truly seen as being a hard drive.  This is both a simple as well as a powerful concept.

It is because of the use of a standard interface that we were able to take flash memory, attach it to a disk controller that would present it over a standard protocol (both SATA and SAS implementations of ATA and SCSI are common for this today) and create SSDs that look and act exactly like traditional Winchester drives to the computer yet have nothing physically in common with them.  They may or may not come in a familiar physical form factor, but they definitely lack platters and a drive head.  Looking at the workings of a traditional hard drive and a modern SSD we would not guess that they share a purpose.

This concept applies to many devices.  Obviously SD cards and USB memory sticks work in the same way.  But importantly, this is how partitions on top of hard drives work.  The partitioning system uses the concept of drive impression interface on one side to be able to be applied to a device, and on the other side it presents a drive impression interface to whatever wants to use it; normally a filesystem.  This idea of something that using the drive impression interface on both sides is very important.  By doing this, we get a uniform and universal building block system for making complex storage systems!

We see this concept of “drive in; drive out” in many cases.  Probably the best know is RAID.  A RAID system takes an array of hard drives, applies one of a number of algorithms to make the drives act as a team, and then present them as a single drive impression to the next system up the “stack.”  This encapsulation is what gives RAID its power: systems further up the stack looking at a RAID array see literally a hard drive.  They do not see the array of drives, they do not know what is below the RAID.  They just see the resulting drive(s) that the RAID system present.

Because a RAID system takes an arbitrary number of drives and presents them as a standard drive we have the theoretical ability to layer RAID as many times as we want.  Of course this would be extremely impractical to do to any great degree.  But it is through this concept that nested RAID arrays are possible.   For example, if we had many physical hard drives split into pairs and each pair in a RAID 1 array.  Each of those resulting arrays gets presented as a single drive.  Each of those resulting logical drives can be combined into another RAID array, such as RAID 0.  Doing this is how RAID 10 is built.  Going further we could take a number of RAID 10 arrays, present them all to another RAID system that puts them in RAID 0 again and get RAID 100 and so forth indefinitely.

Similarly the logical volume layer uses the same kind of encapsulation as RAID to work its magic.  Logical Volume Managers, such as LVM on Linux and Dynamic Disks on Windows, sit on top of logical disks and provide a layer where you can do powerful management such as flexibly expanding devices or enabling snapshots, and then present logical disks (aka drive impression interface) to the next layer of the stack.

Because of the uniform nature of drive impressions the stack can happen in any order.  A logical volume manager can sit on top of RAID, or RAID can sit on top of a logical volume manager and of course you can skip or the other or both!

The concept of drive impressions or logical hard drives is powerful in its simplicity and allows us great potential for customizing storage systems however we need to make them.

Of course there are other uses of the logical drive concept as well.  One of the most popular and least understood is that of a SAN.  A SAN is nothing more than a device that takes one or more physical disks and presents them as logical drives (this presentation of a logical drive from a SAN is called a LUN) over the network.  This is, quite literally, all that a SAN is.  Most SANs will incorporate a RAID layer and likely a logical volume manager layer before presenting the final LUNs, or disk impressions, to the network, but that is not required to be a SAN.

This means, of course, that multiple SAN LUNs can be combined in a single RAID or controlled via a logical volume layer.  And of course it means that a SAN LUN, a physical hard drive, a RAID array, a logical volume, a partition…. can all be formatted with a filesystem as they are all different means of achieving the same result.  They all behave identically.  They all share the drive appearance interface.

To give a real world example of how you would often see all of these parts come together we will examine one of the most common “storage stacks” that you will find in the enterprise space.  Of course there are many ways to build a storage stack so do not be surprised if yours is different.  At the bottom of the stack is nearly always physical hard drives, which could include solid state drives.  This are located physically within a SAN.  Before leaving the SAN the stack will likely include the actual storage layer of the drives, then a RAID layer combining those drives into a single entity.  Then a logical volume layer to allow for features like growth and snapshots.  Then there is the physical demarcation between the SAN and the server which is presented as the LUN.  The LUN then has a logical volume manager applies to it on the server / operating system side of the demarcation point.  Then on top of that LUN is a filesystem which is our final step as the filesystem does not continue to present a drive appearance interface but a file interface, instead.

Understanding drive appearance, or logical drives, and how these allows components to interface with each other to build complex storage subsystems is a critical building block to IT understanding and is widely applicable to a large number of IT activities.

The Emperor’s New Storage

We all know the story of the Emperor’s New Clothes.  In Hans Christian Anderson’s telling of the classic tale we have some unscrupulous cloth vendors who convince the emperor that they have clothes made from a fabric with the magical property of only being visible to people who are fit for their positions.  The emperor, not being able to see the clothes, decides to buy them because he fears people finding out that he cannot see them.  Everyone in the kingdom pretends to see them as well – all sharing the same fear.  It is a brilliant sales tactic because it puts everyone on the same team: the cloth sellers, the emperor, the people in the street all share a common goal that requires them to all maintain the same lie.  Only when a little boy who cares naught about his status in society but only about the truth points out that the emperor is naked is everyone free to admit that they don’t see the clothes either.

And this brings us to the storage market today.  Today we have storage vendors desperate to sell solutions of dubious value and buyers who often lack the confidence in their own storage knowledge to dare to question the vendors in front of management or who simply have turned to vendors to make their IT decisions on their behalf.  This has created a scenario where the vendor confidence and industry uncertainty has engendered market momentum causing the entire situation to snowball.  The effect is that using big, monolithic and expensive storage systems is so accepted today that often systems are purchased without any thought at all.  They are essentially a foregone conclusion!

It is time for someone to point at the storage buying process and declare that the emperor is, in fact, naked.

Don’t get me wrong.  I certainly do not mean to imply that modern storage solutions do not have value.  Most certainly they do.  Large SAN and NAS shared storage systems have driven much technological development and have excellent use cases.  They were not designed without value, but they do not apply to every scenario.

The idea of the inverted pyramid design, the overuse of SANs where they do not apply, came about because they are high profit margin approaches.  Manufacturers have a huge incentive to push these products and designs because they do much to generate profits.  SANs are one of the most profit-bearing products on the market.  This, in turn, incentivizes resellers to push SANs as well, both to generate profits directly through their sales but also to keep their vendors happy.  This creates a large amount of market pressure by which everyone on the “sales” side of the buyer / seller equation has massive pressure to convince you, the buyer, that a SAN is absolutely necessary.  This is so strong of a pressure, the incentives so large, that even losing the majority of potential customers in the process is worth it because the margins on the one customer that goes with the approach is generally worth losing many others.

Resellers are not the only “in between” players with incentive to see large, complex storage architectures get deployed.  Even non-reseller consultants have an incentive to promote this approach because it is big, complex and requires, on average, far more consulting and support than do simpler system designs.  This is unlikely to be a trivial number.  Instead of a ten hour engagement, they may win a hundred hours, for example, and for consultants those hours are bread and butter.

Of course, the media has incentive to promote this, too.  The vendors provide the financial support for most media in the industry and much of the content.  Media outlets want to promote the design because it promotes their sponsors and they also want to talk about the things that people are interested in and simple designs do not generate a lot of readership.  The same problems that exist with sensationalist news: the most important or relevant news is often skipped so that news that will gather viewership is shown instead.

This combination of factors is very forceful.  Companies that look to consultants, resellers and VARs, and vendors for guidance will get a unanimous push for expensive, complex and high margin storage systems.  Everyone, even the consultants who are supposed to be representing the client have a pretty big incentive to let these complex designs get approved because there is just so much money potentially sitting on the table.  You might get paid one hour of consulting time to recommend against overspending, but might be paid hundreds of hours for implementing and supporting the final system.  That’s likely tens of thousands of dollars difference, a lot of incentive, even for the smallest deployments.

This unification of the sales channel and even the front line of “protection” has an extreme effect.  Our only real hope, the only significant one, for someone who is not incentivized to participate in this system is the internal IT staff themselves.  And yet we find very rarely that internal staff will stand up to the vendors on these recommendations or even produce them themselves.

There are many reasons why well intentioned internal IT staff (and even external ones) may fail to properly assess needs such as these.  There are a great many factors involved and I will highlight some of them.

  • Little information in the market.  Because no company makes money by selling you less, there is almost no market literature, discussions or material to assist in evaluating decisions.  Without direct access to another business that has made the same decision or to any consultants or vendors promoting an alternative approach, IT professionals are often left all alone.  This lack of supporting experience is enough to cause adequate doubt to squash dissenting voices.
  • Management often prefers flashy advertising and the word of sales people over the opinions of internal staff.  This is a hard fact, but one that is often true.  IT professionals often face the fact that management may make buying decisions without any technical input whatsoever.
  • Any bid process immediately short circuits good design.  A bid would have to include “storage” and SAN vendors can easily bid on supplying storage while there is no meaningful way for “nothing” to bid on it.  Because there is no vendor for good design, good design has no voice in a bidding or quote based approach.
  • Lack of knowledge.  Often dealing with system architecture and storage concerns are one off activities only handled a few times over an entire career.  Making these decisions is not just uncommon, it is often the very first time that it has ever been done.  Even if the knowledge is there, the confidence to buck the trend easily is not.
  • Inexperience in assessing risk and cost profiles.  While these things may seem like bread and butter to IT management, often the person tasked with dealing with system design in these cases will have no training and no experience in determining comparative cost and risk in complex systems such as these.  It is common that risk goes unidentified.
  • Internal staff often see this big and costly purchase as a badge of honour or a means to bragging rights.  Excited to show off how much they were able to spend and how big their new systems are.  Everyone loves gadgets and these are often the biggest, most expensive toys that we ever touch in our industry.
  • Internal staff often have no access to work with equipment of this type, especially SANs.  Getting a large storage solution in house may allow them to improve their resume and even leverage the experience into a raise or, more likely, a new job.
  • Turning to other IT professionals who have tackled similar situations often results in the same advice as from sales people.  This is for several reasons.  All of the reasons above, of course, would have applied to them plus one very strong one – self preservation.  Any IT professional that has implemented a very costly system unnecessarily will have a lot of incentive to state that they believe that the purchase was a good one.  Whether this is irrational “reverse rationalization” – the trait where humans tend to apply ration to a decision that lacked ration when originally made, because they fear that their job may be in jeopardy if it was found out what they had done or because they have not assessed the value of the system after implementation; or even possibly because their factors were not the same as yours and the design was applicable to their needs.

The bottom line is that basically everyone, no matter what role they play, from vendors to sales people to those that do implementation and support to even your friends in similar job roles to strangers on Internet forums, all have big incentives to promote costly and risky storage architectures in the small and medium business space.  There is, for all intents and purposes, no one with a clear benefit for providing a counter point to this marketing and sales momentum.  And, of course, as momentum has grown the situation becomes more and more entrenched with people even citing the questioning of the status quo and asking critical questions as irrational or reckless.

As with any decision in IT, however, we have to ask “does this provide the appropriate value to meet the needs of the organization?”  Storage and system architectural design is one of the most critical and expensive decisions that we will make in a typical IT shop.  Of all of the things that we do, treating this decision as a knee-jerk, foregone conclusion without doing due diligence and not looking to address our company’s specific goals could be one of the most damaging that we make.

Bad decisions in this area are not readily apparent.  The same factors that lead to the initial bad decisions will also hide the fact that a bad decision was made much of the time.  If the issue is that the solution carries too much risk, there is no means to determine that better after implementation than before – thus is the nature of risk.  If the system never fails we don’t know if that is normal or if we got lucky.  If it fails we don’t know if this is common or if we were one in a million.  So observation of risk from within a single implementation, or even hundreds of implementations, gives us no statistically meaningful insight.  Likewise when evaluating wasteful expenditures we would have caught a financial waste before the purchase just as easily as after it.  So we are left without any ability for a business to do a post mortem on their decision, nor is there an incentive as no one involved in the process would want to risk exposing a bad decision making process.  Even companies that want to know if they have done well will almost never have a good way of determining this.

What makes this determination even harder is that the same architectures that are foolish and reckless for one company may be completely sensible for another.  The use of a SAN based storage system and a large number of attached hosts is a common and sensible approach to controlling costs of storage in extremely large environments.  Nearly every enterprise will utilize this design and it normally makes sense, but is used for very different reasons and goals than apply to nearly any small or medium business.  It is also, generally, implemented somewhat differently.  It is not that SANs or similar storage are bad.  What is bad is allowing market pressure, sales people and those with strong incentives to “sell” a costly solution to drive technical decision making instead of evaluating business needs, risk and cost analysis and implementing the right solution for the organization’s specific goals.

It is time that we, as an industry, recognize that the emperor is not wearing any clothes.  We need to be the innocent children who point, laugh and question why no one else has been saying anything when it is so obvious that he is naked.  The storage and architectural solutions so broadly accepted benefit far too many people and the only ones who are truly hurt by them (business owners and investors) are not in a position to understand if they do or do not meet their needs.  We need to break past the comfort provided by socially accepted plausible deniability or understanding, or culpability for not evaluating.  We must take responsibility for protecting our organizations and provide solutions that address their needs rather than the needs of the sales people.


For more information see: When to Consider a SAN and The Inverted Pyramid of Doom