You Can’t Virtualize That!

We get this all of the time in IT, a vendor tells us that a system cannot be virtualized.  The reasons are numerous.  On the IT side, we are always shocked that a vendor would make such an outrageous claim; and often we are just as shocked that a customer (or manager) believes them.  Vendors have worked hard to perfect this sales pitch over the years and I think that it is important to dissect it.

The root cause of problems is that vendors are almost always seeking ways to lower costs to themselves while increasing profits from customers.  This drives a lot of what would otherwise be seen as odd behaviour.

One thing that many, many vendors attempt to do is limit the scenarios under which their product will be supported.  By doing this, they set themselves up to be prepared to simply not provide support – support is expensive and unreliable.  This is a common strategy.  It some cases, this is so aggressive that any acceptable, production deployment scenario fails to even exist.

A very common means of doing this is to fail to support any supported operating system, de facto deprecating the vendor’s own software (for example, today this would mean only supporting Windows XP and earlier.)  Another example is only supporting products that are not licensed for the use case (an example would be requiring the use of a product like Windows 10 be used as a server.)  And one of the most common cases is forbidding virtualization.

These scenarios put customers into difficult positions because on one hand they have industry best practices, standard deployment guidelines, in house tooling and policies to adhere to; and on the other hand they have vendors often forbidding proper system design, planning and management.  These needs are at odds with one another.

Of course, no one expects every vendor to support every potential scenario.   Limits must be applied.  But there is a giant chasm between supporting reasonable, well deployed systems and actively requiring unacceptably bad deployments.  We hope that our vendors will behave as business partners and share a common interest in our success or, at the very least, the success of their product and not directly seek to undermine both of these causes.  We would hope that, at a very minimum, best effort support would be provided for any reasonable deployment scenario and that guaranteed support would be likely offered for properly engineered, best practice scenarios.

Imagine a world where driving the speed limit and wearing a seatbelt would violate your car warranty and that you would only get support if you drove recklessly and unprotected!

Some important things need to be understood about virtualization.  The first is that virtualization is a long standing industry best practice and is expected to be used in any production deployment scenario for services.  Virtualization is in no way new, even in the small business market it has been in the best practice category for well over a decade now and for many decades in the enterprise space.  We are long past the point where running systems non-virtualized is considered acceptable, and that includes legacy deployments that have been in place for a long time.

There are, of course, always rare exceptions to nearly any rule.  Some systems need access to very special case hardware and virtualization may not be possible, although with modern hardware passthrough this is almost unheard of today.  And some super low latency systems cannot be virtualized but these are normally limited to only the biggest international investment banks and most aggressive hedgefunds and even the majority of those traditional use cases have been eliminated by improvements in virtualization making even those situations rare.  But the bottom line is, if you can’t virtualize you should be sad that you cannot, and you will know clearly why it is impossible in your situation.  In all other cases, your server needs to be virtual.

Is it not important?

If a vendor does not allow you to follow standard best practices for healthy deployments, what does this say about the vendor’s opinion of their own product?  If we were talking about any other deployment, we would immediately question why we were deploying a system so poorly if we plan to depend on it.  If our vendor forces us to behave this way, we should react in the same manner – if the vendor doesn’t take the product to the same degree that we take the least of our IT services, why should we?

This is an “impedance mismatch”, as we say in engineering circles, between our needs (production systems) and how the vendor making that system appears to treat them (hobby or entertainment systems.)  If we need to depend on this product for our businesses, we need a vendor that is on board and understands business needs – has a production mind set.  If the product is not business targeted or business ready, we need to be aware of that.  We need to question why we feel we should be using a service in production, on which we depend and require support, that is not intended to be used in that manner.

Is it supported?  Is it being tested?

Something that is often overlooked from the perspective of customers is whether or not the necessary support resources for a product are in place.  It’s not uncommon for the team that supports a product to become lean, or even disappear, but the company to keep selling the product in the hopes of milking it for as much as they can and bank on either muddling through a problem or just returning customer funds should the vendor be caught in a situation where they are simply unable to support it.

Most software contracts state that the maximum damage that can be extracted from the vendor is the cost of the product, or the amount spent to purchase it.  In a case such as this, the vendor has no risk from offering a product that they cannot support – even if charging a premium for support.  If the customer manages to use the product, great they get paid. If the customer cannot and the vendor cannot support it, they only lose money that they would never have gotten otherwise.  The customer takes on all the risk, not the vendor.

This suggests, of course, that there is little or no continuing testing of the product as well, and this should be of additional concern.  Just because the product runs does not mean that it will continue to run.  Getting up and running with an unsupported, or worse unsupportable, product means that you are depending more and more over time on a product with a likely decreasing level of potential support, slowly getting worse over time even as the need for support and the dependency on the software would be expected to increase.

If a proprietary product is deployed in production, and the decision is made to forgo best practice deployments in order to accommodate support demands, how can this fit in a decision matrix? Should this imply that proper support does not exist? Again, as before, this implies a mismatch in our needs.


Is It Still Being Developed?

If the deployment needs of the software follow old, out of date practices, or require out of date (or not reasonably current software or design) then we have to question the likelihood that the product is currently being developed.  In some cases we can determine this by watching the software release cycle for some time, but not in all cases.  There is a reasonable fear that the product may be dead, with no remaining development team working on it.  The code may simply be old, technical debt that is being sold in the hopes of making a last, few dollars off of an old code base that has been abandoned.  This process is actually far more common than is often believed.

Smaller software shops often manage to develop an initial software package, get it on the market and available for sale, but fail to be able to afford to retain or restaff their development team after initial release(s).  This is, in fact, a very common scenario.  This leaves customers with a product that is expected to become less and less viable over time with deployment scenarios becoming increasingly risky and data increasing hard to extricate.


How Can It Be Supported If the Platform Is Not Supported?

A common paradox of some more extreme situations is software that, in order to qualify as “supported”, requires other software that is either out of support or was never supported for the intended use case.  Common examples of this are requiring that a server system be run on top of a desktop operating system or requiring versions of operating systems, databases or other components, that are no longer supported at all.  This last scenario is scarily common.  In a situation like this, one has to ask if there can ever be a deployment, then, where the software can be considered to be “supported”?  If part of the stack is always out of support, then the whole stack is unsupported.  There would always be a reason that support could be denied no matter what.   The very reason that we would therefore demand that we avoid best practices would equally rule out choosing the software itself in the first place.

Are Industry Skills and Knowledge Lacking?

Perhaps the issue that we face with software support problems of this nature are that the team(s) creating the software simply do not know how good software is made and/or how good systems are deployed.  This is among the most reasonable and valid reasons for what would drive us to this situation.  But, like the other hypothesis reasons, it leaves us concerned about the quality of the software and the possibility that support is truly available.  If we can’t trust the vendor to properly handle the most visible parts of the system, why would we turn to them as our experts for the parts that we cannot verify?

The Big Problem

The big, overarching problem with software that has questionable deployment and maintenance practice demands in exchange for unlocking otherwise withheld support is not, as we typically assume a question of overall software quality, but one of viable support and development practices.  That these issues suggest a significant concern for long term support should make us strongly question why we are choosing these packages in the first place while expecting strong support from them when, from the onset, we have very visible and very serious concerns.

There are, of course, cases where no other software products exist to fill a need or none of any more reasonable viability.  This situation should be extremely rare and if such a situation exists should be seen as a major market opportunity for a vendor looking to enter that particular space.

From a business perspective, it is imperative that the technical infrastructure best practices not be completely ignored in exchange for blind or nearly blind following of vendor requirements that, in any other instance, would be considered reckless or unprofessional. Why do we so often neglect to require excellence from core products on which our businesses depend in this way?  It puts our businesses at risk, not just from the action itself, but vastly moreso from the risks that are implied by the existence of such a requirement.

The Commoditization of Architecture

I often talk about the moving “commodity line”, this line affects essentially all technology, including designs.  Essentially, when any new technology comes out it will start highly proprietary, complex and expensive.  Over time the technology moves towards openness, simplicity and becomes inexpensive.  At some point any given technology becomes goes so far in that direction that it falls over the “commodity” line where it moves from being unique and a differentiator to becoming a commodity and accessible to essentially everyone.

Systems architecture is no different from other technologies in this manner, it is simply a larger, less easily defined topic.  But if we look at systems architecture, especially over the last few decades, we can easily system servers, storage and complete systems moving from the highly proprietary towards the commodity.  Systems were complex and are becoming simple, they were expensive and are becoming inexpensive, they were proprietary and they are becoming open.

Traditionally we dealt with systems that were physical operating systems on bare metal hardware.  But virtualization came along and abstracted this.  Virtualization gave us many of the building blocks for systems commonidization.  Virtualization itself commoditized very quickly and today we have a market flush with free, open and highly enterprise hypervisors and toolsets making virtualization totally commoditized even several years ago.

Storage moved in a similar manner.  First there was independent local storage.  Then the SAN revolution of the 1990s brought us power through storage abstraction and consolidation.  Then the replicated local storage movement moved that complex and expensive abstraction to a more reliable, more open and more simple state.

Now we are witnessing this same movement in the orchestration and management layers of virtualization and storage.  Hyperconvergence is currently taking the majority of systems architectural components and merging them into a cohesive, intelligent singularity that allows for a reduction in human understanding and labour while improving system reliability, durability and performance.  The entirety of the systems architecture space is moving, quite rapidly, toward commoditization.  It is not fully commoditized yet, but the shift is very much in motion.

As in any space, it takes a long time for commoditization to permeate the market.  Just because systems have become commoditized does not mean that non-commodity remnants will not remain in use for a long time to come or that niche proprietary (non-commodity) aspects will not linger on.  Today, for example, systems architecture commoditization is highly limited to the SMB market space as there are effective upper bound limits to hyperconvergence growth that have yet to be tackled, but over time they will be tackled.

What we are witnessing today is a movement from complex to simple within the overall architecture space and we will continue to witness this for several years as the commodity technologies mature, expand, prove themselves, become well known, etc.  The emergence of what we can tell will be commodity technologies has happened but the space has not yet commoditized.  It is an interesting moment where we have what appears to be a very clear vision of the future, some scope in which we can realize its benefits today, a majority of systems and thinking that reside in the legacy proprietary realm and a mostly clear path forward as an industry both in technology focus as well as in education, that will allow us to commoditize more quickly.

Many feel that systems are becoming overly complex, but the opposite is true. Virtualization, modern storage systems, cloud and hyperconverged orchestration layers are all coming together to commoditize first individual architectural components and then architectural design as a whole.  The move towards simplicity, openness and effectiveness is happening, is visible and is moving at a very healthy pace.  The future of systems architecture is one that clearly is going to free IT professionals from spending so much time thinking about systems design and more time thinking about how to drive competitive advantage to their individual organizations.

No One Ever Got Fired For Buying…

It was the 1980s when I first heard this phrase in IT and it was “no one ever got fired for buying IBM.”  The idea was that IBM was so well known, trusted and reliable that it was the safe choice as a vendor for a technology decision maker to select.  As long as you chose IBM, you were not going to get in trouble, no matter how costly or effective the resulting solution turned out to be.

The statement on its own feels like a simple one.  It makes for an excellent marketing message and IBM, understandably, loved it.  But it is what is implied by the message that causes so much concern.

First, we need to understand what the role of the IT decision maker in question is.  This might sound simple, but it is surprising how easily it can be overlooked.  Once we delve into the ramifications of the statement itself, it is far too easy to lose track of the real goals. In the role of a decision maker, the IT professional is tasked with selecting the best solution for their organization based on its ability to meet organizational goals (normally profits).  This means evaluating options, shielding non-technical management from sales people and marketing, understanding the marketplace, research and careful evaluation.  These things seem obvious, until we begin to put things into practice.

What we have to then analyze is not that “no one ever got fired for choosing product X”, but what the ramifications of such a statement actually are.

First, the statement implies an organization that is going to judge IT decision making not on its merits or applicability but on the brand name recognition of the product maker.  In order for a statement like this to have any truth behind it, it requires the entire organization to either lack the ability or desire to evaluate decisions but also an organizational desire to see large, expensive brand names (the statement is always made in conjunction with extremely high cost items compared to the alternatives) over other alternatives.  An organizational preference towards expensive, harder to justify spends is a dangerous one at best.  We assume that not only does buying the most expensive, most famous products will be judged well compared to less expensive or less well known ones, but that buying products is seen as beneficial to not buying products; even though often the best IT decisions are to not buy things when no need exists.  Prioritizing spending over savings for their own reasons without consideration for the business need is very bad, indeed.

Second, now that we realize the organizational reality that this implies, that the IT decision maker is willing to seize this opportunity to leverage corporate politics as a means of avoiding taking the time and effort to make a true assessment of needs for the business but rather skip this process, possibly completely, we have a strong question of ethics.  Essentially, whether out of fear of the organization not properly evaluating the results  or by blaming the decision maker for unforeseeable events after the fact or of looking to take advantage of the situation to be paid for a job that was not done, we have a significant problem either individually, organizationally, or both.

For any IT decision maker to use this mindset, one that there is safety in a given decision regardless of suitability, there has to be a fundamental distrust of the organization.  Whether this is true of the organization or not is not known, but that the IT decision maker believes it is required for such a thought to even exist.  In many organizations it is understandable that politics trump good decision making and it is far more important to make decisions for which you cannot be blamed rather than trying honestly to do a good job.  That is sad enough on its own, but so often it is simply an opportunity to skip the very job for which the IT decision maker is hired and paid and instead of doing a difficult job that requires deep business and technical knowledge, market research, cost analysis and more – simply allowing a vendor to sell whatever they want to the business.

At best, it would seem, we have an IT decision maker with little to no faith in the ethics or capabilities of those above them in the organization.  At worst we have someone actively attempting to take advantage of a business by being paid to be a key decision maker while, instead of doing the job for which they are hired or even doing nothing at all, actively putting their weight behind a vendor that was not properly evaluated based possibly solely on not needing to do any of the work themselves.

What should worry an organization is not that vendors that could often be considered “safe” get recommended or selected, but rather why they were selected.  Vendors that fall into this category often offer many great products and solutions or they would not earn this reputation in the first place.  But likewise, after gaining such a reputation those same vendors have a strong financial incentive to take advantage of this culture and charge more while delivering less as they are not being selected, in many cases, on their merits but instead on their name, reputation or marketing prowess.

How does an organization address this effect?  There are two ways.  One is to evaluate all decisions carefully in a post mortem structure to understand what good decisions look like and not limit post mortems to obviously failed projects.  The second is to look more critical, rather than less critically, at popular product and solution decisions as these are red flags that decision making may be being skipped or undertaken with less than the appropriate rigor.  Popular companies, assumed standard approaches, solutions found commonly in advertising or commonly recommended by sales people, resellers, and vendors should be looked at with a discerning eye, moreso than less common, more politically “risky” choices.


SMBs Must Stop Looking to BackBlaze for Guidance

I have to preface this article, because people often take these things out of context and react strongly to things that were never said, with the disclaimer that I think that BackBlaze does a great job, has brilliant people working for them and has done an awesome job of designing and leveraging technology that is absolutely applicable and appropriate for their needs.  Nothing, and I truly mean nothing, in this article is ever to be taken out of context and stated as a negative about BackBlaze.  If anything in this article appears or feels to state otherwise, please reread and confirm that such was said and, if so, inform me so that I may correct it.  There is no intention of this article to imply, in any way whatsoever, that BackBlaze is not doing what is smart for them, their business and their customers.  Now on to the article:

I have found over the past many years that many small and medium business IT professionals have become enamored by what they see as a miracle of low cost, high capacity storage in what is know as the BackBlaze POD design.  Essentially the BackBlaze POD is a low cost, high capacity, low performance nearly whitebox storage server built from a custom chassis and consumer parts to make a disposable storage node used in large storage RAIN arrays leveraging erasure encoding.  BackBlaze custom designed the original POD, and released its design to the public, for exclusive use in their large scale hosted backup datacenters where the PODs functions as individual nodes within a massive array of nodes with replication between them.  Over the years, BackBlaze has updated its POD design as technology has changed and issues have been addressed.  But the fundamental use case has remained the same.

I have to compare this to the SAM-SD approach to storage which follows a similar tact but does so using enterprise grade, supported hardware.  These differences sometimes come off as trivial, but they are anything but trivial, they are key underpinnings to what makes the different solutions appropriate in different situations.  The idea behind the SAM-SD is that storage needs to be reliable and designed from the hardware up to be as reliable as possible and well supported for when things fail.  The POD takes the opposite approach making the individual server unreliable and ephemeral in nature and designed to be disposed of rather than repaired at all.  The SAM-SD design assumes that the individual server is important, even critical – anything but disposable.

The SAM-SD concept, which is literally nothing more than an approach to building open storage, is designed with the SMB storage market in mind.  The BackBlaze POD is designed with an extremely niche, large scale, special case consumer backup market in mind.  The SAM-SD is meant to be run by small businesses, even those without internal IT.  The POD is designed to be deployed and managed by a full time, dedicated storage engineering team.

Because the BackBlaze POD is designed by experts, for experts in the very largest of storage situations it can be confusing and easily misunderstood by non-storage experts in the small business space.  In fact, it is so often misunderstood that objections to it are often met with “I think BackBlaze knows what they are doing” comments, which demonstrates the extreme misunderstanding that exists with the approach.  Of course BackBlaze knows what they are doing, but they are not doing what any SMB is doing.

The release of the POD design to the public causes much confusion because it is only one part of a greater whole.  The design of the full data center and the means of redundancy and mechanisms for such between the PODs is not public, but is proprietary.  So the POD itself represents only a single node of a cluster (or Vault) and does not reflect the clustering itself, which is where the most important work takes place.  In fact the POD design itself is nothing more than the work done by the Sun Thumper and SAM-SD projects of the past decade without the constraints of reliability.  The POD should not be a novel design, but an obvious one.  One that has for decades been avoided in the SMB storage space because it is so dramatically non-applicable.

Because the clustering and replication aspects are ignored when talking about the BackBlaze POD some huge assumptions tend to be made about the capacity of a POD that has much lower overhead than BackBlaze themselves get for the POD infrastructure, even at scale.  For example, in RAID terms, this would be similar to assuming that the POD is RAID 6 (with only 5% overhead) because that is the RAID of an individual component when, in fact, RAID 61 ( 55% overhead) is used!  In fact, many SMB IT Professionals when looking to use a POD design actually consider simply using RAID 6 in addition to only using a single POD.  The degree to which this does not follow BackBlaze’s model is staggering.

BackBlaze: “Backblaze Vaults combine twenty physical Storage Pods into one virtual chassis. The Vault software implements our own Reed-Solomon encoding to spread data shards across all twenty pods in the Vault simultaneously, dramatically improving durability.

To make the POD a consideration for the SMB market it is required that the entire concept of the POD be taken completely out of context.  Both its intended use case and its implementation.  What makes BackBlaze special is totally removed and only the most trivial, cursory aspects are taken and turned into something that in no way resembles the BackBlaze vision or purpose.

Digging into where the BackBlaze POD is differing in design from the standard needs of a normal business we find these problems:

  • The POD is designed to be unreliable, to rely upon a reliability and replication layer at the super-POD level that requires a large number of PODs to be deployed and for data to be redundant between them by means of custom replication or clustering.  Without this layer, the POD is completely out of context.  The super-POD level is known internally as the BackBlaze Vault.
  • The POD is designed to be deployed in an enterprise datacenter with careful vibration dampening, power conditioning and environmental systems.  It is less resilient to these issues as standard enterprise hardware.
  • The POD is designed to typically be replaced as a complete unit rather than repairing a node in situ.  This is the opposite of standard enterprise hardware with hot swap components designed to be repaired without interruption, let alone without full replacement.  We call this a disposable or ephemeral use case.
  • The POD is designed to be incredibly low cost for very slow, cold storage needs.  While this can exist in an SMB, typically it does not.
  • The POD is designed to be a single, high capacity storage node in a pool of insanely large capacity.  Few SMBs can leverage even the storage potential of a single POD let alone a pool large enough to justify the POD design.
  • The BackBlaze POD is designed to use custom erasure encoding, not RAID.  RAID is not effective at this scale even at the single POD level.
  • An individual POD is designed for 180TB of capacity and a Vault scale of 3.6PB.

Current reference of the BackBlaze POD 5:

In short, the BackBlaze POD is a brilliant underpinning to a brilliant service that meets a very specific need that is as far removed from the needs of the SMB storage market as one can reasonably be.  Respect BackBlaze for their great work, but do not try to emulate it.