Category Archives: Business of IT

Rethinking Long Term Support Releases

Traditionally, Long Term Support (LTS) operating system releases have been the bulwark of enterprise deployments. This is the model used by IBM, Oracle, Microsoft, SUSE and Red Hat, and it has been the conventional thinking around operating systems since support offerings began many decades ago.

It has been common in the past for both server and desktop operating system releases to follow this model, but in the Linux space specifically this began to be shaken up as less formal products were free to experiment with more rapid, unsupported or simply unstructured releases. In the primary product space, openSUSE, Fedora and Ubuntu all provided short term support or rapid release offerings. Instead of release cycles measured in years and support cycles closing in on a decade, they shortened release cycles to months and support to just months or a few years at most.

In the desktop space, getting new features and applications sooner, instead of focusing primarily on stability as was common on servers, often made sense and brought the added benefit that new technologies or approaches could be tested on faster release cycle products before being integrated into long term support server products. Fedora, for example, is a proving ground for technologies that will, after proving themselves, make their way into Red Hat Enterprise Linux releases. By using Fedora, end users get features sooner and get to learn about RHEL technologies earlier, and Red Hat gets to test those products at large scale before deploying them to critical servers.

Over time the stability of short term releases has improved dramatically, and these systems are increasingly seen as viable options for servers. They get newer enhancements, features and upgrades sooner, which is often seen as beneficial.

A major benefit of any operating system is its support ecosystem, including the packages and libraries that are supported and provided as part of the base operating system. With long term releases, we often see critical packages aging dramatically throughout the life of the release, which can cause problems with performance, compatibility and even, in extreme cases, security. This forces users of long term release operating systems to choose between continuing to live with the limitations of the older components or integrating new components themselves, which often breaks the fundamental value of the long term release product.

Because the goal of a long term release is stability and integration testing, replacing components within the product to “work around” the limitations of an LTS means that those components are not being treated in an LTS manner and that vendor integration testing is most likely no longer happening, or if it is, not to the same degree. In effect, this becomes a self-built short term release product, but with legacy core components and less oversight.

In reality, in most respects, doing this is worse than going directly to a short term release product. Using a short term or rapid release product allows the vendor to maintain the assumed testing and integration, just with a faster release and support cycle, so that the general value of the long term release concept is retained while all components of the operating system, rather than just a few, are updated. This allows for more standardization, industry testing, shared knowledge and integration than a partial LTS model does.

Maybe the time has come to rethink the value of long term support for operating systems. For too long, it seems, the value of this approach was simply assumed and followed, and certainly it had and has merits; but the operating system world has changed since the approach was first introduced. The need for updates has increased while the change rates of things like kernels and libraries have slowed dramatically. More powerful servers have moved compatibility higher up the stack, and instead of software being written to an OS, it is often written for a specific version of a language, runtime or other abstraction layer.

Shorter release cycles mean that systems get features, top to bottom, more often. Updates between “major” releases are smaller and less impactful. Changes from updates are more incremental, providing a more organic learning and adaptation curve. And, most importantly, the need to replace carefully tested and integrated system components with third party versions becomes, effectively, unheard of.

Stability for software vendors remains a value of long term releases and will keep them needed for a long time to come. But for the system administrator, the value of this approach seems to be decreasing and, I personally feel, has reached an inflection point in recent years. It used to seem expected and normal to wait two or three years for packages to be updated, but today this feels unnecessarily cumbersome. It seems increasingly common that higher level components are built with a requirement for newer underlying components; there is an expectation that operating systems will either be more current or that portions of the OS will be updated separately from the rest.

A heavy reliance on containerization technologies may reverse this trend in some ways, but always in ways that reduce the value of long term releases at the same time. Containerization reduces the need for extensive capabilities in the base operating system, making it easier and more effective to update frequently for improved kernel, filesystem, driver and container support, while libraries and other dependencies live inside the containers. Applications that need long term support dependencies can have them met there, and applications that benefit from newer components can be addressed there as well.

Of course, virtualization has played a role in reducing the value of long term support models by making rapid recovery and duplication of systems trivial. Stability that we have needed long term support releases to address is partially addressed by the virtualization layer; hardware abstraction improves driver stability in very important ways. In the same vein, DevOps-style support models also reduce the need for long term support and make server ecosystems more agile and flexible. Trends in system administration paradigms are tending to favour more modern operating systems.

Time will tell if trends continue in the direction they are headed. For myself, this past year has been an eye-opening one that has seen me move my own workloads, after a decade of staunch support for very long term support products, to rapid release ones, and I must say, I am very happy with the change.

All IT is External

In IT we often talk about internal and external IT, but this perspective is always that of the IT department itself rather than that of the business, and I feel that this is very misleading. Different departments within a company are generally seen, and feel, as if they are external to one another; often every bit as much as an external company does. For example, an IT department will often see management, operations or human resources as “foreign” departments at best and adversaries at worst. It is common to feel, and possibly rightfully so, that different departments fail to even share common overarching goals. IT tends to be acutely aware of that and expresses it often.

What we need to appreciate is that, to business management or owners, the IT department generally appears to be an external agency regardless of whether the people working in it are staff or actually from a service provider. There are exceptions to this, of course, but they are rare. IT is generally held behind a barrier of sorts and is its own entity, and IT commonly displays this in how it talks to or about management. IT often thinks of system resources or the network as “belonging to IT,” clearly not thinking in terms of IT being just part of the company. Both sides are commonly guilty of thinking of IT as a separate entity from the company itself.

This happens, of course, for any number of reasons. Many IT workers choose IT because they are passionate about IT specifically, not about the company or market they work in; their loyalty is to their IT career, not the business in question, and they would generally switch companies to advance their IT career rather than stay to advance an internal non-IT career. IT professionals often struggle with interpersonal skills and so have a higher than average tendency to hide away, avoiding unnecessary contact with other departments. IT tends to be busy and overworked, making socializing problematic. IT work demands focus and availability, again making it difficult to socialize and interface with other departments. IT is often kept isolated for security reasons, and IT is often seen as the naysayer of the organization, commonly delivering bad news or hindering projects. IT typically has extremely high turnover rates, and almost no IT staff, especially in smaller businesses, are expected to be around for the long haul. IT is often a conduit to outside vendors and is seen as connected to or associated with them in many ways. IT is often behind a “blame barrier” where the rest of the organization seeks to blame IT for business decisions, creating a stronger “us and them” mentality, and IT exacerbates this with attitudes towards users and decision makers that are often distancing. It is also extremely common for IT workers to be staffed via an agency in such a way that there are contract obligations, restrictions or payroll differences between IT and regular staff.

This creates a rather difficult situation for discussions of the advantages of internal IT versus external IT. For internal IT staff it is common to believe that having IT internally brings many benefits to the organization due to loyalty, closeness or the ties of payroll. But is this really the case?

To the business, internal IT is already, in most cases, external to the organization. The fears often stated about external IT service providers – that they may not work in the business’ interests, may suddenly close up shop and disappear, might be overworked and not have enough available resources, may charge for work when idle, may not have the needed expertise, may see the network and resources as their own and not act in the interests of the business, may fail to document the systems or might even hold critical access hostage – are all fears that businesses have about their own IT departments, exactly as they have them about external IT service providers.

In fact, external service providers often provide a business with more legal recourse than employees do. For example, internal IT employees can quit with zero notice and suffer only the stigma of acting “unprofessionally” in their lack of notice, or can give just two weeks’ notice and not even have to worry about being unprofessional. Yet replacing internal IT staff of any caliber will easily take months, and that is just to hire someone, let alone train, indoctrinate and bring them up to useful speed. It is not uncommon, even in the enterprise, for the job search, hiring process and internal processes for access and so forth to take up to a year from the time the decision to begin interviewing is made until someone is a useful staff member. But an external IT service provider may be obligated to provide resources for coverage regardless of whether its staff come and go. There are far more possibilities for mitigating the turnover risks that employed IT staff present to a business.

Due to these factors, it is very common for a business to perceive internal and external IT resources as roughly equal, and primarily to see both as very much outsiders to the core organization. Of course, in an ideal world, both would be treated very much as insiders and worked with as critical partners for planning, decision making, triage and so forth. IT is critical to business thinking and the business is critical to IT thinking; neither is really functional without the other.

This context, the organizational management view of IT, can be important for understanding how the business will react to IT as well as how IT should behave with management. It also offers an opportunity for both to work on coming together, whether IT is ultimately internal or external, and to behave more like a single organization with a unified goal.

Understanding Technical Debt

From Wikipedia: “Technical debt (also known as design debt or code debt) is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.

“Technical debt can be compared to monetary debt. If technical debt is not repaid, it can accumulate ‘interest’, making it harder to implement changes later on. Unaddressed technical debt increases software entropy. Technical debt is not necessarily a bad thing, and sometimes (e.g., as a proof-of-concept) technical debt is required to move projects forward. On the other hand, some experts claim that the ‘technical debt’ metaphor tends to minimize the impact, which results in insufficient prioritization of the necessary work to correct it.”

The concept of technical debt comes from the software engineering world, but it applies to the world of IT and business infrastructure just as much. As in software engineering, we design our systems and our networks, and taking shortcuts in those designs, which includes working with less than ideal designs, incorporating existing hardware and other poor design practices, produces technical debt. One of the more significant forms of this comes from investing in the “past” rather than in the “future” and is quite often triggered by the sunk cost fallacy (a.k.a. throwing good money after bad).

It is easy to see this happening in businesses every day. New plans are made for the future, but before they are implemented, investments are made in making an old system design continue working, work better, expand or whatever else. This investment then either turns into a nearly immediate financial loss or, more often, becomes an incentive not to invest in the future designs as quickly, as thoroughly or, possibly, at all. The investment in the past can become crippling in the worst cases.

This happens in numerous ways and is generally unintentional. Often investments are needed to keep an existing system running properly and, under normal conditions, would simply be made. But in a situation where a future change is needed or potentially planned, this investment can be problematic. Better cost analysis and triage planning can, in many cases, remedy this.

In a non-technical example, imagine owning an older car that has served well but is due for retirement in three months. In three months you plan to invest in a new car because the old one is no longer cost effective due to continuous maintenance needs, lower efficiency and so forth. But before your three month plan to buy a new car comes around, the old car suffers a minor failure and now requires a significant investment to keep it running. Putting money into the old car would be a new investment in the technical debt. Rather than spending a large amount of money to make an old car run for a few months, moving up the timetable to buy the new one is obviously drastically more financially sound. With cars, we see this easily (in most cases). We save money, potentially a lot of it, by quickly buying the new car. If we were to invest heavily in the old one, we either lose that investment in a few months or we risk changing the solid financial plan for the purchase of a new car that was already made. Both cases are bad financially.

IT works the same way. Spending a large sum of money to maintain an old email system six months before a planned migration to a hosted email system would likely be very foolish. The investment is either lost nearly immediately when the old system is decommissioned, or it undermines our good planning processes and leads us not to migrate as planned, doing a sub-par job for our business because we allowed technical debt, rather than proper planning, to drive our decision making.

Often a poor triage operation, or improper authority given to those doing the triage, is the factor that causes emergency technical debt investments rather than rapid, future looking investments. This is only one area where major improvements may address issues, but it is a major one. It can also be mitigated, in some cases, through “what if” planning: having investment plans in place contingent on common or expected emergencies, which may be as simple as capacity expansion needs driven by growth that arise before systems planning comes into play.

Another great example of common technical debt is server storage capacity expansion. This is a scenario that I see with some frequency, and it demonstrates technical debt well. It is common for a company to purchase servers that lack large internal storage capacity. Either immediately or sometime down the road, more capacity is needed. If this happens immediately, we can see that the server purchase itself was a form of technical debt through improper design and obviously represents a flaw in the planning and purchasing process.

But a more common example is needing to expand storage two or three years after a server has been purchased. Common expansion choices include adding an external storage array to attach to the server or modifying the server to accept more local storage. Both of these approaches tend to be large investments in an already old server, a server that is easily forty percent or more of the way through its useful lifespan. In many cases the same or only slightly higher investment in a completely new server can bring new hardware, faster CPUs, more RAM, the storage needed, a purpose-designed and built system, an aligned and refreshed support lifespan, a smaller datacenter footprint, lower power consumption, newer technologies and features, better vendor relationships and more, all while retaining the original server to reuse, retire or resell. One way spends money supporting the past; the other often spends comparable money on the future.
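
To make the comparison concrete, here is a minimal sketch of the arithmetic behind that kind of decision, written in Python. Every figure in it (the cost of the expansion, the cost of a replacement server, the useful lifespans) is a hypothetical assumption for illustration only, not a number taken from the scenario above.

# Hypothetical cost comparison: expanding an aging server vs. buying a new one.
# All figures are illustrative assumptions, not data from any real purchase.

def cost_per_useful_year(upfront_cost: float, useful_years: float) -> float:
    """Spread an upfront spend over the years of useful service it buys."""
    return upfront_cost / useful_years

# Option A: add an external storage array to a three-year-old server
# (assumed five year useful life, so roughly two years of service remain).
expansion_cost = 6000.0         # assumed cost of array, shelf, cabling, licensing
remaining_life_years = 2.0      # assumed remaining life of the existing server

# Option B: buy a new server with the needed storage built in
# (assumed fresh five year support lifespan and warranty).
new_server_cost = 9000.0        # assumed cost of the replacement server
new_server_life_years = 5.0

option_a = cost_per_useful_year(expansion_cost, remaining_life_years)
option_b = cost_per_useful_year(new_server_cost, new_server_life_years)

print(f"Expand old server: ${option_a:,.0f} per useful year")   # roughly $3,000 per year
print(f"Buy new server:    ${option_b:,.0f} per useful year")   # roughly $1,800 per year

Under these assumed numbers, the larger absolute spend is actually the cheaper one per year of service, and that is before counting the residual value of the retained server, lower power consumption or newer technology. The point is simply that the comparison should be made deliberately rather than defaulting to propping up the existing hardware.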

Technical debt is a crippling factor for many businesses. It increases the cost of IT, sometimes significantly, and can lead to high levels of risk through a lack of planning and spending that is mostly emergency based.


No One Ever Got Fired For Buying…

It was the 1980s when I first heard this phrase in IT and it was “no one ever got fired for buying IBM.”  The idea was that IBM was so well known, trusted and reliable that it was the safe choice as a vendor for a technology decision maker to select.  As long as you chose IBM, you were not going to get in trouble, no matter how costly or effective the resulting solution turned out to be.

The statement on its own feels like a simple one.  It makes for an excellent marketing message and IBM, understandably, loved it.  But it is what is implied by the message that causes so much concern.

First, we need to understand what the role of the IT decision maker in question is. This might sound simple, but it is surprising how easily it can be overlooked. Once we delve into the ramifications of the statement itself, it is far too easy to lose track of the real goals. In the role of decision maker, the IT professional is tasked with selecting the best solution for their organization based on its ability to meet organizational goals (normally profits). This means evaluating options, shielding non-technical management from sales people and marketing, understanding the marketplace, doing research and making careful evaluations. These things seem obvious until we begin to put them into practice.

What we have to then analyze is not that “no one ever got fired for choosing product X”, but what the ramifications of such a statement actually are.

First, the statement implies an organization that is going to judge IT decision making not on its merits or applicability but on the brand name recognition of the product maker. For a statement like this to have any truth behind it, the entire organization must not only lack the ability or desire to evaluate decisions, but must also prefer large, expensive brand names (the statement is always made in conjunction with items of extremely high cost compared to the alternatives) over other alternatives. An organizational preference towards expensive, harder to justify spends is a dangerous one at best. We must assume not only that buying the most expensive, most famous products will be judged well compared to less expensive or less well known ones, but that buying products is seen as preferable to not buying products; even though often the best IT decision is to buy nothing when no need exists. Prioritizing spending over savings for its own sake, without consideration for the business need, is very bad indeed.

Second, once we recognize the organizational reality that this implies – that the IT decision maker is willing to seize the opportunity to leverage corporate politics as a means of avoiding the time and effort of making a true assessment of the business’ needs, possibly skipping that process completely – we have a strong question of ethics. Essentially, whether out of fear that the organization will not properly evaluate the results, or will blame the decision maker after the fact for unforeseeable events, or out of a desire to take advantage of the situation and be paid for a job that was not done, we have a significant problem individually, organizationally, or both.

For any IT decision maker to use this mindset, one in which there is safety in a given decision regardless of suitability, there has to be a fundamental distrust of the organization. Whether this distrust is warranted or not is unknown, but the IT decision maker must believe it for such a thought to even exist. In many organizations it is understandable that politics trump good decision making and that it is far more important to make decisions for which you cannot be blamed than to honestly try to do a good job. That is sad enough on its own, but so often it is simply an opportunity to skip the very job for which the IT decision maker is hired and paid and, instead of doing a difficult job that requires deep business and technical knowledge, market research, cost analysis and more, to simply allow a vendor to sell whatever they want to the business.

At best, it would seem, we have an IT decision maker with little to no faith in the ethics or capabilities of those above them in the organization. At worst we have someone actively attempting to take advantage of a business by being paid to be a key decision maker while, instead of doing the job for which they are hired or even doing nothing at all, actively putting their weight behind a vendor that was not properly evaluated, possibly based solely on not needing to do any of the work themselves.

What should worry an organization is not that vendors that could often be considered “safe” get recommended or selected, but rather why they were selected. Vendors that fall into this category often offer many great products and solutions, or they would not have earned this reputation in the first place. But likewise, after gaining such a reputation, those same vendors have a strong financial incentive to take advantage of this culture and charge more while delivering less, as they are not being selected, in many cases, on their merits but on their name, reputation or marketing prowess.

How does an organization address this effect? There are two ways. One is to evaluate all decisions carefully in a post mortem structure to understand what good decisions look like, and not to limit post mortems to obviously failed projects. The second is to look more critically, rather than less critically, at popular product and solution decisions, as these are red flags that decision making may be being skipped or undertaken with less than the appropriate rigor. Popular companies, assumed standard approaches, and solutions commonly found in advertising or commonly recommended by sales people, resellers and vendors should be looked at with a discerning eye, more so than less common, more politically “risky” choices.