Why We Reboot Servers

A question that comes up on a regular basis is whether servers should be routinely rebooted, such as once per week, or allowed to run as long as possible in pursuit of maximum “uptime.”  To me the answer is simple: with rare exception, regular reboots are the most appropriate choice for servers.

As with any rule, there are cases where it does not apply.  For example, some businesses run critical systems that have no allotment for downtime and must be available 24/7.  Obviously such systems cannot simply be rebooted on a routine schedule.  However, if a system is so critical that it can never go down, that alone should raise a red flag: the system is a single point of failure, and planning for how to handle downtime, whether planned or unplanned, should begin.

Another exception: some AIX systems need significant uptime, greater than a few weeks, to reach maximum efficiency, as the system is self-tuning and needs time to gather usage information and adjust itself accordingly.  This tends to be limited to large, seldom-changing database servers and similar, relatively uncommon scenarios.

In IT we often worship the concept of “uptime” – how long a system can run without needing to restart.  But “uptime” is not a concept that brings value to the business and IT needs to keep the business’ needs in mind at all times rather than focusing on artificial metrics.  The business is not concerned with how long a server has managed to stay online without rebooting – they only care that the server is available and ready when needed for business processing.  These are very different concepts.

For most any normal business server, there is a window when the server needs to be available for business purposes and a window when it is not needed.  These windows may be daily, weekly or monthly but it is a rare server that is actually in use around the clock without exception.

I often hear people state that because they run operating system X rather than Y that they no longer need to reboot, but this is simply not true.  There are two main reasons to reboot on a regular basis: to verify the ability of the server to reboot successfully and to apply patches that cannot be applied without rebooting.

Applying patches is why most businesses reboot.  Almost all operating systems receive regular updates that require rebooting in order to take effect.  As most patches are released for security and stability purposes, especially those requiring a reboot, the importance of applying them is rather high.  Making a server unnecessarily vulnerable just to maintain uptime is not wise.
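To make the patching point concrete, many Linux distributions expose whether a pending update requires a restart.  The sketch below assumes the Debian/Ubuntu convention, where package tooling drops a flag file at /var/run/reboot-required when an installed update (such as a new kernel) needs a restart to take effect; other platforms use different mechanisms.

```shell
#!/bin/sh
# reboot_pending FLAGFILE
# Prints whether the given flag file indicates a pending reboot.
# On Debian and Ubuntu, package tools create /var/run/reboot-required
# when an installed update (such as a new kernel) needs a restart.
reboot_pending() {
    if [ -f "$1" ]; then
        echo "reboot required"
    else
        echo "no reboot pending"
    fi
}

# Typical invocation on a Debian-family system:
reboot_pending /var/run/reboot-required
```

A check like this can feed a monitoring system so that servers carrying unapplied, reboot-dependent security patches are visible rather than silently vulnerable.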

Testing a server’s capacity to reboot successfully is what is often overlooked.  Most servers have changes applied to them on a regular basis.  Changes might be patches, new applications, configuration changes, updates or similar.  Any change introduces risk.  Just because a server is healthy immediately after a change is applied does not mean that the server or the applications running on it will start as expected on reboot.

If the server is never rebooted then we never know if it can reboot successfully.  Over time the number of changes applied since the last reboot will increase.  This is very dangerous.  What we fear is a large number of changes having been made, possibly many of them undocumented, and a reboot then failing.  At that point identifying which change is causing the system to fail could be an insurmountable task.  No single change to roll back, no known path to recovery.  This is when panic sets in.  Of course, a box that is never rebooted intentionally is more likely to reboot unintentionally – meaning a failed reboot is both more likely to occur and more likely to occur while the server is in active use.

Regular reboots are not intended to reduce the frequency of failed reboots; in fact they increase the occurrence of failures.  The purpose is to make those failures easily manageable from a “known change” standpoint and, more importantly, to control when they occur – ensuring they happen at a time when the server is designated as available for maintenance and is meant to be stressed, so that problems surface when they can be mitigated without business impact.

I have heard many a system administrator state that they avoid weekend reboots because they do not want to be stuck working on Sundays due to servers failing to come back up after rebooting.  I have been paged many a Sunday morning from a failed reboot myself, but every time I receive that call I feel a sense of relief.  I know that we just caught an issue at a time when the business is not impacted financially.  Had that server not been restarted during off hours, it might not have been discovered to be “unbootable” until it failed during active business hours and caused a loss of revenue.

Thanks to regular weekend reboots, we can catch pending disasters safely and, knowing that we have only one week’s worth of changes to investigate, we are routinely able to fix the problems with little effort and great confidence that we understand what changes were made prior to the failure.
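As a concrete sketch of the scheduled, off-hours reboot described above, a system crontab entry can drive the cycle.  This assumes a Linux server and a Sunday-morning maintenance window; the path and timing here are illustrative, not prescriptive.

```shell
# /etc/cron.d/weekly-reboot -- hypothetical system crontab entry
# Reboots the server every Sunday at 04:00, inside an agreed maintenance
# window, so any failure to come back up is found before business hours.
# Fields: minute hour day-of-month month day-of-week user command
0 4 * * 0  root  /sbin/shutdown -r now "scheduled weekly maintenance reboot"
```

Pairing a schedule like this with monitoring that alerts when the server fails to return converts a surprise outage into a controlled, well-bounded maintenance event.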

Regular reboots are about protecting the business from outages and downtime that can be mitigated through very simple and reliable processes.

IT in a Bubble

It is an old story in SMB IT: IT managers who get their start young, stay with a single company, work their way up through the ranks and become venerable IT managers who have never worked outside their current environment.  Just like the “good old days” when people stuck with a single company for their entire careers, this sounds like a wonderful thing.  But IT has long rewarded “job hoppers,” those technically minded folk who move from shop to shop every few years.  The lack of direct upward mobility within single shops has encouraged this process – incremental promotions could only be found between companies, seldom within a single one.

Some people support and some dispute the idea that there is significant value to be had by changing companies.  The idea is that by moving between environments you glean techniques, procedures, processes and general experience that you then bring to your next position – that you are a cumulative product of all of your past environments.  This concept, I believe, has some merit, more so in technology than in other fields.

In technology fields, I believe that moving between jobs, after a reasonable amount of time, generally delivers much more value than staying put.  The reason is relatively simple: most small businesses lack an ecosystem of support and training for IT professionals.  It is well known that IT professionals working in small shops lack the interaction with peers and vendors that is generally accepted as necessary for healthy professional development and that is common in enterprise shops.

An IT professional, after spending many years in a small shop, effectively all alone, tends to feel isolated, lacking the professional interaction that most specialists enjoy.  Most small professional or artisan shops have a number of specialists who work together, share research and experience, and are encouraged to work with competitors or vendors and to attend trade events, training and the like.  Few fields share the odd dispersion of IT professionals, with only one or two people working at any given company and little to no interaction with the outside world or with peers at other companies.

This isolation can lead to “IT insanity” if left unchecked.  An IT professional working in a vacuum, with little to no technical or professional feedback, will lose the ability to assess themselves against other professionals.  As the sole provider of technology guidance and policy for potentially years or even decades, a lone IT professional can easily “drift off course,” losing contact with and course correction from the larger IT field, with the only guidance coming through the filtered world of vendors attempting to sell expensive products and services.

IT professionals suffering from “IT insanity” will often be found implementing bizarre, nonsensical policies that would never be tolerated in a shop with a strong peer-review mechanism, purchasing incredibly overpriced solutions for simple problems and working either completely with or completely without mainstream technologies, depending largely on individual personality.  This is caused partly by an increasing dependence on a single, established skill set, as the lack of environmental change encourages continuing reliance on existing skills and procedures.

IT insanity commonly arises in shops that have only a single IT professional, or in shops with a strict hierarchy and no movement in the management ranks, where fresh ideas and experience from younger professionals do not feed up to the managers; instead, established practices and “because I said so” policies are forced down the chain to the technologists actually implementing solutions.

This is not to say that all is lost; there are steps that can be taken to avoid this scenario.  The first is to consider outsourcing IT – any shop small enough to face this dilemma should seriously consider whether full-time, dedicated internal staff makes sense in its environment.  Bringing in fresh blood is another option – hiring IT professionals from other shops, or even other industries, can work wonders.  In extreme cases some shops will even trade staff back and forth, keeping their existing employees while still “mixing things up.”

Short of drastic measures such as changing employees entirely, non-IT organizations need to think seriously about the professional health of their staff and look for opportunities for peer interaction.  IT professionals need continuous professional interaction for many reasons, and organizations need to actively support and promote this behavior.  Sending staff to training, seminars, peer groups, conventions and shows, or even out as volunteers to non-profit and community activities where they can provide IT support in an alternative environment, can do wonders for getting them out of the office, face to face with alternative viewpoints, and hands-on with different technologies than they see in their day-to-day work.

IT managers need opportunities to explore different solution sets and to learn what others are doing in order to best be able to offer objective, broad-based decision making value to their own organizations.

IT Managers and the Value of Decision Making

When I was new to IT, I remember people using the phrase “No one ever got fired for buying IBM.”  At the time I was young and didn’t think much about what the phrase implies.  Recently I heard it again – except this time it was “No one ever gets fired for buying Cisco,” and soon thereafter I heard it applied to virtualization and VMware.  This time I stopped to think about what exactly I was being told.

At face value, the statement comes as little more than an observation, but the intent runs much deeper.  The statement is used as a justification for a decision that has been made and implies that the decision was made not because the product or vendor in question was the best choice but because it was the choice that was believed to have the least risk involved for the decision maker.  Not the least risk or most value for the organization – least risk to the decision maker.

This implies one of two possibilities.  The first is that the decision maker in question, presumably an IT manager, feels that due diligence and careful analysis are not recognized or rewarded by the organization – that vendor marketing aimed at non-IT management has convinced management that those products and services are superior, without consideration of functionality, cost, reliability or service.

The second possibility is that the IT decision maker believes they can skip the cost, risk and functionality analysis that would be proper for deciding between competing options – that by picking a popular option, well known in the marketplace, they will be shielded from serious inquiry into their process and can deliver what sounds like a plausible solution with minimal effort on their part.

As IT managers, one of the most crucial functions we perform is identifying, evaluating and recommending products and solutions to our organizations.  That phrases like these are used so commonly suggests that a large percentage of IT managers and advisers forgo the difficult, laborious process of researching products and solutions, banking instead on an easy decision that is likely to seem reasonable to management, regardless of whether it is a viable solution, let alone the best one for the organization.  The likely result is that a very expensive product is chosen when a less expensive or less well known option might have worked as well or better; in extreme cases, a product recommended this way may not meet the needs of the organization at all.

IT lives and dies by the decision making value that it brings to the organization.  We hate to admit it, but finding people who can fix desktops is not that hard and the economic value of someone who can fix anything wrong on the desktop versus simply rebuilding one is small.  If we eliminate quality decision analysis from the IT manager’s skill set, what value does he or she bring to the company?

State of Thin Clients

The IT world loves to swing back and forth between moving processing out to the user via fat clients and pulling processing back to the server, leaving users with thin clients.  The battle is a long-running one that began with the first multiuser computer systems several decades ago, continues to this day and will likely continue for a long time to come.

When I began working in IT, thin clients were simple text terminals attached to a single, central server via serial connections.  Limited to very basic text input, these served their purpose at the time, providing relatively low-cost computing to a large number of users.  The system wasn’t pretty or glamorous, but it was quite functional.

These ancient terminals gave way to the personal computer and computing power shifted from the datacenter to the desktop allowing users to run powerful apps like Lotus 1-2-3 and WordPerfect.  Responsive graphical applications were a powerful draw for decentralized processing.  Users were enthralled with the new usability.  The text terminal went into very rapid decline.

Eventually centralized power became available in such quantity and at such a low price point that graphical applications could be run almost as responsively from the server, while clients could be “thin,” needing just a shim of an operating system – enough to provide remote access back to the server.  Thin computing became the darling of the industry again (the term itself arose in this era) and centralized processing came back into vogue.

Administrators love the central computing model because data and configuration remain in one place.  Backups and management are a breeze.  The idea, at least in theory, is that desktop support becomes a non-issue, with all desktop clients being nothing more than commodity components that can be replaced at any time with completely interchangeable parts.  Since nothing is stored or configured on the desktop, there is nothing to support there.

In the initial swings of the “thin computing pendulum” the market movement was dramatic.  When text terminal computing first became available it was practically the only model used in the real world; the value was so dramatic that no one could really justify doing anything else.  When the PC was introduced the movement to the fat client was so sweeping that many younger IT professionals today have never actually seen text terminals in use – even though the move to fat “PC” clients was not as all-encompassing as the move to text terminals had been one pendulum swing earlier.

The PC model was generally better for end users because it mimicked how they used computers at home – those that had computers at home.  It also gave them more options for customization and, for better or for worse, the opportunity to begin installing software of their own rather than only the software preconfigured for them on the central server.

Over time there have been many developments in both camps, each gaining more and more of the other’s advantages.  Central domain services such as Microsoft’s Active Directory have come along, allowing central management to extend out to fat clients and bringing control and management more in line with traditional thin computing models.  Likewise, companies like Citrix have worked very hard developing technologies that allow thin clients to perform much more like robust fat clients, making their use as seamless as possible for end users and even making offline use possible for laptop users.

Most shops today have adopted hybrid models: fat clients where they make sense, and thin clients for certain categories of users, for remote workers and for business continuity scenarios.

Over the past decade we have seen a shift in the way business applications are created and deployed.  Today almost all new business applications are web-based and have no client platform dependency.  This affords IT departments a new opportunity: to shift from a traditional thin client platform, which requires remote graphical access, to the browser as the new thin client platform.

The move to web apps has happened slowly.  Most businesses depend on a rather large legacy codebase that cannot easily be transferred to the web app architecture, and some apps simply are not good candidates for it.  But by and large the majority of new business applications are web-based, written most often in Java or .NET, and these apps are prime candidates for a new thin computing model.

If our custom business apps are available via the browser, then the only commonly used apps holding us back are the traditional productivity apps, such as the office suites widely used by nearly all staff today (if they have a computer at all).  Very few desktop apps are truly pervasive except for these.  Increasingly we are seeing browser-based alternatives to the traditional office suites.  Everyone is aware of Google Apps as a pioneer in this area, with Microsoft now offering MS Office online as well.  But the popular offerings making consumer news headlines require businesses to totally rethink long-term strategies for keeping critical business data within their walls, and they are not likely to be highly disruptive to the enterprise for quite some time.

What does pose a threat to the status quo is alternative software such as ThinkFree Office, which is installed within the organization and used and secured internally just like any other normal business application.  This category of “traditionally installed internal web applications” will allow enterprise IT departments to begin reconsidering their end users’ platforms without having to reevaluate their entire concept of IT.  The biggest barriers today are lingering legacy business applications and power users whose specific desktop apps cannot be encapsulated within a browser.

One of the great advantages, however, of the browser as the new thin client is how simple it is to mix browser-based apps with traditional apps.  The move is transparent and most large businesses are moving in this direction today even if there is no overarching strategy to do so.  The market momentum to develop all new apps for the web is causing this to happen naturally.

Another key advantage of a completely “web based” architectural model is the great ease with which it can be exposed to users outside of the corporate network.  Instead of using cumbersome VPN clients and company laptops, employees can find any web browser, sign in to the company network and have secure business applications delivered to any browser, anywhere.

Bringing this almost unnoticed shift into sharp relief today are a handful of, of all things, consumer devices: Apple’s iPhone and iPad and Google’s Android and ChromeOS platforms.  What all of these devices have in common is a focus on being primarily thin web appliances – thin clients for consumers.  With the majority of consumer computing focused on web connectivity, the need for anything else from a platform is nearly non-existent in the consumer market.  This means that within a very short period, users who once brought home PC experience to the office as their expectation of a computing environment will instead bring web-based thin computing as their new expectation.

When this shift happens, IT departments will need to rethink their internal application delivery strategy.  The change doesn’t have to be dramatic if current development trends are followed and legacy systems are routinely updated.  In fact, one of the great benefits of this new model is that traditional fat clients function very well as browser platforms and will likely do so for a very long time to come.  Companies adopting this model will likely be able to slow desktop purchasing cycles and prepare to purchase some form of traditional thin client with an embedded browser, or move to a business version of the new nettop trend beginning to emerge in the consumer space.  Some businesses may even attempt the rather dangerous path of using consumer devices, but the lack of management and security features will likely keep this from becoming popular in all but rare instances.

I believe, though, that this swing of the pendulum will not be as dramatic as the last one just as it was not as dramatic as the swing before that.  It will be an important trend but IT departments understand more and more that no new technological shift is a silver bullet and that with each new opportunity comes new challenges.  Most IT departments will need to implement some degree of browser-based thin computing over the next few years but most will retain a majority user base of fat clients.  Hybrid environments, like we’ve seen for many years with more traditional models, will continue as before with each technology being used in target areas where they make the most sense.

The one area where thin clients remain most challenged is mobile computing, where disconnected users end up digitally marooned, cut off from their company networks and unable to continue working until connectivity is reestablished.  This is a significant issue for power users who must travel extensively and need to keep working regardless of their current connectivity.  Today this is being solved in the traditional thin client arena thanks to companies like Citrix, who continue to advance the state of the art in thin application delivery.

In the browser-based arena we have had to turn to technologies like Google Gears and Adobe AIR in the past to make this possible, but these had poor market penetration.  Coming down the pike, however, is the new HTML 5 offline API, which is set to redefine how the web works for users who need to go “off the grid” from time to time.  With HTML 5 incorporating offline capabilities and a richer feature set into the specification of the web itself, we expect to see broad and rapid adoption from all of the leading vendors – most likely even before the draft standard is finalized.  While still quite some way off, this new standard will certainly lay the groundwork for a significant shift toward the browser as a ubiquitous, standard and robust platform.
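For reference, the offline mechanism in the HTML 5 drafts of this era is the application cache, driven by a plain-text manifest that tells the browser which resources to store locally for offline use.  A minimal sketch follows; the file names are purely illustrative.

```
CACHE MANIFEST
# v1 -- illustrative manifest for an internal web app

CACHE:
/index.html
/app.js
/style.css

NETWORK:
*
```

The CACHE section lists resources the browser should keep available offline, while the NETWORK section lists resources that always require connectivity; a page opts in by referencing the manifest from its html element.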

The future of thin computing looks incredibly promising, both in the enterprise and, for the first time, in the consumer arena.  Adoption of thin computing models will be spurred on by the current movement toward Software as a Service, and SaaS adoption will in turn be encouraged by the widespread presence of thin computing devices.  In many ways browser-based thin computing represents the technology aspect now maturing in the SaaS arena, where SaaS itself is maturing in social acceptance rather than technical feasibility.

The Information Technology Resource for Small Business
