Unfortunately, the explanation wasn't very, uh, satisfying.
The server ran out of memory, so it slowed to nearly a stop but didn't crash. As a result, the two automated methods the admins use to monitor for crashes (scheduled pings and HTTP requests) didn't raise any alarms. They weren't able to determine what caused it to suddenly eat up RAM.
If it had crashed, they would have known about it right away. Unfortunately, the admin didn't see the email I sent him that night.
The good: they're going to revise the monitoring approach. Also, this was the first blip in service we've had in over 600 days.
Still, that was a long outage.
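
For what it's worth, here's a rough sketch of the kind of check that would flag a "slow but not dead" server. This is only my illustration, not what the admins run or plan to run. The function names, the 5-second "slow" threshold, and the 30-second timeout are all made up for the example. The idea is just that a check should time the response and treat a very slow answer as a problem, instead of only asking whether the server answered at all.

```python
import http.client
import time
import urllib.request


def classify(responded, elapsed_seconds, slow_after=5.0):
    """Turn one check result into 'down', 'degraded', or 'ok'.

    slow_after is a hypothetical threshold in seconds; pick a value
    that fits the site's normal response times.
    """
    if not responded:
        return "down"
    if elapsed_seconds > slow_after:
        # The server answered, but far too slowly: this is the case
        # a plain "did it respond?" check misses.
        return "degraded"
    return "ok"


def check(url, timeout=30.0, slow_after=5.0):
    """Fetch url once, time it, and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        responded = True
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        responded = False
    elapsed = time.monotonic() - start
    return classify(responded, elapsed, slow_after)
```

Calling something like `check("https://example.com/")` on a schedule and alerting on anything other than `"ok"` would cover both a crash and a crawl, assuming the threshold is set sensibly.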