Downtime explanation

Tim
Chaotic Dreams Team
Posts: 444
Joined: Sun Aug 29, 2004 4:50 pm
Location: Proxy King Lair

Post by Tim »

Anyone who regularly visits these forums may have noticed a few days of downtime...

So, here's what happened. This is the short version: I've just spent 4 days working on this, it's 1:50 AM, and I'm tired, so I really don't feel like giving all the details.

On May 5th, 12:55:24 UTC, one of the hard drives failed. Fortunately, it was in RAID 1, and thus no data was lost ...
... until literally 4 minutes and 4 seconds later, when the other HDD started failing.

The initial point of failure was one of the MySQL database files, which made the daemon do an emergency exit. That's when I noticed something was wrong -- the websites couldn't connect to the MySQL server, and simply restarting the MySQL server didn't work (it failed with a read error).

Fortunately, I've been doing daily database backups (off-site), so that wasn't a disaster.
The rest of the data on the failing drive was still okay, so I could back that up before both drives were replaced (merely as a convenience -- it was not critical).
After that, I reinstalled the system, restored the relevant configuration, and started up the servers again. It took a lot of time.

I hope the database backup I restored from was early enough that it doesn't contain any incorrect data from some sort of unnoticed corruption. I basically just took the one from May 4th (I do have backups for every day, but I just took the latest one that was "probably fine").
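
For anyone curious, the "latest probably fine" choice can be sketched roughly like this. This is only an illustration of the selection rule, not the actual procedure used on the server: the function name `pick_restore_point` is made up, and the year in the dates is a placeholder, since only the month and day are given above.

```python
from datetime import date


def pick_restore_point(backup_dates, failure_date):
    """Return the most recent backup taken strictly before the failure date.

    Returns None if no backup predates the failure.
    """
    candidates = [d for d in backup_dates if d < failure_date]
    return max(candidates) if candidates else None


# Hypothetical daily backups for May 1st through May 6th.
# The year 2012 is a placeholder, not taken from the post.
backups = [date(2012, 5, day) for day in range(1, 7)]

# The first drive failed on May 5th, so anything from that day on is suspect.
restore_point = pick_restore_point(backups, date(2012, 5, 5))
print(restore_point)
```

With daily backups this picks the May 4th one. If corruption had started earlier without being noticed, the same rule with an earlier cutoff date would step back further.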

tl;dr: Both HDDs (mirrored) on the server failed a few minutes apart. I had backups.
R.Flagg
Chaotic Dreams Team
Posts: 8460
Joined: Thu May 09, 2002 2:55 pm

Re: Downtime explanation

Post by R.Flagg »

Hi Tim.

I did notice, and I also noticed your continued dedication to fixing 'er up.

Thank you Mr. Sir.