Friday, June 26, 2009

Server "uptime" bragging

I recently saw a blog post in which someone showed off a server with an uptime of over 400 days and asked readers to reply with some of their own larger uptimes. Quite a few people obliged, and the numbers were in the hundreds of days. This made me think, "Is this really a 'good thing' anymore?"

Some questions that come to my mind when I see servers with long uptimes are:

1.) Are patches being applied? A lot of security and performance updates are released within a year. Some may not be critical, but are you being responsible and diligent in keeping your server up to date and secure?

2.) Does the server need to be up for so long because it is a single point of failure for a critical service? Hardware gets cheaper and cheaper, and many services can be load-balanced or clustered. With the popularity of virtual machines, even more so. If this service experiences a failure, will your customers or users notice? How long will it take to restore its functionality?

3.) Do you know if the server will restart correctly in the event something causes a reboot? This could be unexpected, like a hardware or power failure, or expected, like applying kernel updates. Over a long period of time, a lot of small changes can accumulate that break startup scripts, and the breakage goes undetected until you have to restart. Or, your hardware just might not want to go through a restart for whatever wacky reason.

I guess what I'm saying is, having regular maintenance reboots isn't a "bad thing." Yeah, it used to look cool to have a server up for 600 days, but I don't think it's really worth it now.
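If you wanted to turn "regular maintenance reboots" into something a cron job could nag you about, a minimal sketch might look like the following. It assumes a Linux box, where the first number in /proc/uptime is the number of seconds since boot. The 90-day window and the function names are made up for illustration, not a recommendation from any particular tool.

```python
from pathlib import Path

SECONDS_PER_DAY = 86_400


def read_uptime_text(path: str = "/proc/uptime") -> str:
    """Read the raw uptime file (Linux only; the path is an assumption)."""
    return Path(path).read_text()


def parse_uptime_seconds(proc_uptime_text: str) -> float:
    """Return seconds since boot: the first field of /proc/uptime."""
    return float(proc_uptime_text.split()[0])


def is_reboot_overdue(uptime_seconds: float, max_days: int = 90) -> bool:
    """True once uptime exceeds the (hypothetical) maintenance window."""
    return uptime_seconds > max_days * SECONDS_PER_DAY


def report(uptime_seconds: float, max_days: int = 90) -> str:
    """Build a one-line status message suitable for a cron email."""
    days = uptime_seconds / SECONDS_PER_DAY
    if is_reboot_overdue(uptime_seconds, max_days):
        return f"Up {days:.0f} days: schedule a maintenance reboot"
    return f"Up {days:.0f} days: within the maintenance window"
```

On a Linux server, print(report(parse_uptime_seconds(read_uptime_text()))) would give the current status. This only measures how long the machine has been up. It says nothing about whether patches are pending or whether the startup scripts still work, so it complements the questions above and does not answer them.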

1 comment:

Matt said...

I think the same sort of thing. It's conceivable that the server is running an old, stable kernel version (2.4 series?) that hasn't been affected by any of the vulnerabilities released, and is running on old hardware (almost a necessity, given some of the uptimes), but still, like you said, the (much) better solution is to make servers redundant and be able to take them down occasionally.

Only in very limited situations can I see ultra-high uptime as a benefit.