I mentioned in a post earlier this morning that I was having problems accessing wordpress.com blogs – wordpress.com is a hosted multi-user version of the blog software I use, WordPress. The site is now available again but suffered a “major disk failure” according to a message on the wordpress.com Dashboard.
The data loss is presumably because the drive which failed was not in a RAID array and the last backup of the site was a couple of days ago!
This is unforgivable. No matter how small a hosting organisation you are (and WordPress.com couldn’t be considered small), your users’ data is sacrosanct. Users will tolerate occasional downtime but not loss of data.
Matt and the rest of the WordPress.com team, you need to try to resurrect as much of your users’ data as possible (if you haven’t already done this), put the site on a RAID array, put a disaster recovery plan in place which ensures no data can ever be lost again and then try very hard to rebuild your now shattered reputation.
MacManX alerted me, in the comments of this post, to the fact that Matt has put up a post about this issue. In the post, Matt explains what happened, how the WordPress.com team responded and the fact that no data was lost:
Donncha was on the ball and switched all the traffic to a recent backup so most things would work while we investigated the hardware failure. This means that an old version of your site was shown for a few hours.
A few minutes ago we restored the up-to-date database and we’re currently syncing it to the backup to get back any posts you might have made during the semi-downtime. Even though we were able to recover everything, we’re looking at ways to make things even more redundant, so if this ever happens again the problems will be measure in seconds or minutes
It is lucky for the WordPress.com team that no data was lost; this will help people’s confidence in the platform. However, they need to get a RAID solution in place for the database (preferably with multiple RAID containers – 1 for OS, 1 for db and 1 for transaction logs) and a live backup db server in case of a logic board failure on the db server. Only at this level of redundancy will they be able to sleep at night and, hand on heart, promise data integrity to WordPress.com users.
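To make the "live backup db server" idea concrete, here is a rough sketch of how an application could fall back to a standby when the primary stops responding. This is purely illustrative and says nothing about how WordPress.com is actually built: the `DbServer` class, the `pick_server` function and the health checks are all hypothetical names made up for the example, and a real setup would also need replication to keep the standby's data current.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class DbServer:
    """Hypothetical description of one database server."""
    name: str
    is_healthy: Callable[[], bool]  # assumed health probe, e.g. a test query


def pick_server(servers: Sequence[DbServer]) -> Optional[DbServer]:
    """Return the first server (in priority order) whose health check passes."""
    for server in servers:
        try:
            if server.is_healthy():
                return server
        except Exception:
            # A probe that raises is treated the same as an unhealthy server.
            continue
    return None


def broken_check() -> bool:
    # Simulates the primary being unreachable after a hardware failure.
    raise ConnectionError("simulated hardware failure")


primary = DbServer("primary", broken_check)
standby = DbServer("standby", lambda: True)

chosen = pick_server([primary, standby])
print(chosen.name if chosen else "no database available")  # prints "standby"
```

The point of listing servers in priority order is that traffic only moves to the standby when the primary's check fails, so the downtime is limited to however long one failed probe takes.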