Had a server go down during a school board meeting last Tuesday

I work IT for a school district, and during a board meeting we had a power flicker that took out a file server. I rushed to the server room and found the UPS beeping like crazy. Turns out its battery had been failing for a while, but nobody had put it on the replacement list. I had to reboot the server twice before it came back clean, all while the board president stood behind me asking why the projector couldn't connect. The whole thing took maybe 15 minutes, but it felt like an hour. Ever since then I've been checking every UPS in our district on a rotating schedule. Has anyone else had a simple battery backup cause a headache like that?
2 comments

kai_stone99
Wait, are we really blaming the battery backup here? Sounds more like the real issue was the whole chain of people who ignored the failing UPS until it turned into a problem. I get that it's annoying to have a board president breathing down your neck, but a battery replacement list that nobody follows is basically asking for this to happen. I've seen places where the UPS itself is fine, but nobody checks the age of the batteries or runs a self-test. That's not a hardware failure, that's a process failure. The fix you landed on (checking them on a rotating schedule) should have been standard from day one.
fiona_lewis37
Jumping off what you said: the rotating schedule is a great fix, but it shouldn't have taken a crisis to get there. A battery self-test takes like 30 seconds. Run one every month and you catch the bad batteries early. That's basic stuff. The real problem is that nobody wanted to be the one to say "hey, these batteries are old," because that meant work. Now you've got an angry board president and a whole system down. Process failure is exactly right. There's no excuse for skipping the maintenance that keeps everything running.