Five Things about Server Failure Every MSP Should Consider

June 12

The best type of hardware failure is the type that never happens. But when is it really time to send your servers out to pasture? Three years? Five years? Ten? As an IT provider, you want servers to last as long as possible so you and your clients get more bang for the buck. The problem is that you can’t risk downtime, which can turn the bang into a pathetic “whiff” in no time.

We’ve gone through a few of the top answers from trusted technology folks on the Spiceworks forum to decide what we really need to think about when it comes to server lifespan. But first, we need to define a failure.

Failure

For our purpose, let’s say that a server failure is simply a malfunction, but a catastrophic server failure means the server is irreparable and out of service completely. Often, a server starts to malfunction before a catastrophic failure, so smaller failures can warn an IT admin of impending doom.

Servers can go down unexpectedly or have some small issues that simply require a reboot. Others start to have hard drive issues and start losing data. Some companies feel that when a server starts to malfunction, it’s probably approaching the end of its life and should be replaced immediately before it becomes completely unstable. Others might not have the resources to replace the server or might rather let the server limp by until it’s unusable. Sometimes this issue comes down to what sort of warranty a server has.

Warranty

The first thing to understand is that although a server has a warranty, there’s no saying how long it will actually last. Hard drives and other working parts can fail without warning and bring a server down well before its warranty expires. But you already know that—that’s why taking regular incremental backup images is important. When something goes wrong, you at least have your data intact.

As I mentioned, the best type of failure, catastrophic or otherwise, is the one that doesn’t happen, so when hardware is approaching the end of its life, it’s wise to replace it before a catastrophic failure forces you to. Warranties can protect you from needing to buy new hardware, but they can’t help you mitigate downtime or recover lost data. The truth is, if your server is past its warranty, it might also be past the point to which the manufacturer expects it to last, so any production servers that are past warranty might not have much life left. You can buy extended warranties that last up to five years, or longer in some cases, but again, you’re only covered on the hardware end. You can’t be sure how quickly the manufacturer will provide you or a client with another server, and that means you can’t be sure whether your client will experience a little or a lot of downtime.

Upgrades

According to professionals in another Spiceworks forum thread, the most common types of failure (after user-related issues, of course) are hard disk and power-supply failures. In many cases, both hard drives and power supplies can be replaced, and since these are the two most common types of failure, regularly replacing or upgrading them can extend the life of the server and save money compared with replacing the whole unit. Many of the commenters on Spiceworks recommended replacing these parts when a server is halfway to the end of the manufacturer warranty. If a malfunction can be attributed to a power supply or a specific drive in the array, replacing just those pieces will extend the server’s life and can be very cost-effective compared to replacing an entire server.

Company Profile

Of course, the decision about when to replace a server depends a lot on how a company values its servers and what its cost of downtime is. If your client can handle a little downtime, replacing a server that has a minor hiccup might not be worth it. If, however, they run a multi-million dollar ecommerce website or a medical practice and can’t handle downtime at all, you may want to replace a server at the first sign of trouble. Determining a client’s needs really comes first, so you’ve got to understand how much value they put into IT and their tolerance for downtime.

Server Type

Similarly, different servers will have different needs. Obviously you want your mission-critical servers to be newer. Systems such as SQL servers, domain controllers, and application servers that run most of your operation should have the newest hardware because they don’t have as much downtime tolerance. Hardware like this usually lasts between three and five years but can always be repurposed for lower-level services like print servers or file servers. Once these servers begin to malfunction, it’s time to start thinking about whether you’ll replace them or use them for something different.

Hardware Strain

Knowing the type of server you’re working with is important because it tells you what type of strain you’re putting on the hardware. Some companies and processes inherently put more strain on a server, so look at exactly how much work the server is actually doing. Is the server used for virtualization? If it is, there’s a lot more consistent strain on hardware resources, and the unit will therefore wear out more quickly. Placing constant stress on a server can bring its lifespan from around five or six years down to around three or four, so keep in mind how busy the server actually is so you have an idea of when it will need to be replaced.
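If you track server ages in an inventory, the rule of thumb above can be turned into a rough "years remaining" estimate. The following is only a minimal sketch: the `Server` class, its field names, and the example host names are hypothetical, and the lifespan ranges are simply the ballpark figures from the paragraph above, not measured values.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str              # hypothetical label for the machine
    age_years: float       # time in service so far
    constant_strain: bool  # e.g. a busy virtualization host


# Rough ranges from the article's rule of thumb (years); not measurements.
NORMAL_LIFESPAN = (5, 6)
STRAINED_LIFESPAN = (3, 4)


def expected_lifespan(server: Server) -> tuple[int, int]:
    """Return the (low, high) lifespan estimate in years."""
    return STRAINED_LIFESPAN if server.constant_strain else NORMAL_LIFESPAN


def years_remaining(server: Server) -> tuple[float, float]:
    """Return the (low, high) estimate of years left, never below zero."""
    low, high = expected_lifespan(server)
    return (max(low - server.age_years, 0.0),
            max(high - server.age_years, 0.0))


if __name__ == "__main__":
    for srv in (Server("hv01", 2.5, True), Server("print01", 2.5, False)):
        print(srv.name, years_remaining(srv))
```

Two servers of the same age can land in very different places: under these assumptions, the strained host is already close to the low end of its range while the lightly used one has a few years left.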

Consensus

Most of the experts agree that critical production servers should last about five years and should then be replaced, or at least moved away from critical applications or off production altogether. As mentioned, many opt to upgrade drives or power supplies halfway through the five years to make the servers last longer. Once the warranty is up, there’s no saying how long a server will last, and at that point you’ve got to pay to replace anything that fails anyway, but there’s no need to let downtime be another cost.
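To make that consensus actionable, you could put two dates on the calendar for each production server: a midlife parts refresh and a date to move it off critical duty. The sketch below assumes the five-year figure from the paragraph above and a known purchase date; the function names and the dictionary keys are hypothetical, and the result is a planning aid, not a prediction of when hardware will fail.

```python
import calendar
from datetime import date

PRODUCTION_YEARS = 5  # consensus figure from the paragraph above


def add_months(start: date, months: int) -> date:
    """Add whole months to a date, clamping the day to the month's length."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def lifecycle_plan(purchased: date) -> dict[str, date]:
    """Return the midlife upgrade date and the date to leave production."""
    total_months = PRODUCTION_YEARS * 12
    return {
        # Halfway point: consider replacing drives and power supplies.
        "midlife_upgrade": add_months(purchased, total_months // 2),
        # End of the five years: replace or repurpose for non-critical work.
        "leave_production": add_months(purchased, total_months),
    }


if __name__ == "__main__":
    plan = lifecycle_plan(date(2020, 6, 12))
    for milestone, when in plan.items():
        print(milestone, when.isoformat())
```

For a server with heavy, constant load, you may want to shorten `PRODUCTION_YEARS` to match the lower lifespan range discussed under Hardware Strain.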

Also, remember that just because the server you used in production had a small failure or malfunction doesn’t mean its life is completely over. Some servers can still be used in development and lab environments or even outfitted to handle router or firewall tasks, though hard disk drives and power supplies may need to be replaced to keep them functioning. Many IT professionals will use servers until smoke starts coming out the top. If you go that route, make sure the server isn’t running anything critical at that point.

Photo Credit: jurvetson via Compfight cc