Thursday, March 22, 2007

The Alaska Fiasco

http://www.techworld.com/storage/news/index.cfm?newsID=8341&pagtype=samechan

Quite interesting when someone who has come to save your data ends up destroying it :-) albeit inadvertently.

Apparently those responsible for backing up data for a bunch of files from Jan 2006 never "checked a box" requesting that it be backed up to tape. When a glitch arose in an EMC (www.emc.com) array, the specialist figured out that the fix was to clean the area that was corrupted. In the process some key SQL files were deleted as well, making it impossible to restore the lost data. Why? Because the data had never been backed up to tape, that's why. A single, simple unchecked check-box. See what happens when you don't have formal processes and procedures to accomplish even seemingly mundane IT tasks and duties?
Now they're coming up with a formal backup plan.

Now imagine if this had been a public/private company and not the govt. Hmmm.

In any case, what did they do to fix this? Go back to good ol' paper is what. Four part-timers spent 2 months scanning in the paper copies and finished the task at a cost of $200K. Not something I'd want to pay for someone's dereliction of duty, but since when did we taxpayers get a say in the affairs of our govt!

Overall, this is what I'd recommend:
1. Institution of a Backup Policy:
a. Rate data (not information - that is to come later) in tiers of importance - say 1-4 (1=critical)
b. Critical data to have incremental backup every 30 minutes or every hour depending on activity, with full backup every 6 hours
c. Level 2 data incremental backup every 4 hours or so, with full backup every 6 hours
d. Level 3 data incremental backup every 6-8 hours, with full backup every 10-12 hours
e. Level 4 data full backup every day
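
To make the tiers above a bit more concrete, here is a rough Python sketch of how such a schedule could be written down and checked. Everything in it is my own illustration: the names TierPolicy, POLICIES and backup_due are made up, and where the list gives a range (30 minutes or an hour, 6-8 hours, 10-12 hours) I have simply picked one value. Treat it as a starting point, not a finished scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    """Backup intervals for one importance tier (hypothetical structure)."""
    incremental_every: Optional[timedelta]  # None means no incremental backups
    full_every: timedelta


# Assumed values: one point picked from each range in the list above.
POLICIES = {
    1: TierPolicy(timedelta(minutes=30), timedelta(hours=6)),   # critical
    2: TierPolicy(timedelta(hours=4), timedelta(hours=6)),
    3: TierPolicy(timedelta(hours=6), timedelta(hours=12)),
    4: TierPolicy(None, timedelta(days=1)),                     # full only
}


def backup_due(tier: int,
               now: datetime,
               last_full: Optional[datetime],
               last_incremental: Optional[datetime] = None) -> str:
    """Return "full", "incremental" or "none" for the given tier."""
    policy = POLICIES[tier]
    # No full backup on record, or the last one is too old: take a full one.
    if last_full is None or now - last_full >= policy.full_every:
        return "full"
    if policy.incremental_every is None:
        return "none"
    # Incrementals are measured from the most recent backup of either kind.
    last_any = max(last_full, last_incremental or last_full)
    if now - last_any >= policy.incremental_every:
        return "incremental"
    return "none"
```

Note that the function treats "never backed up" as "full backup due right now", which is precisely the case that went unnoticed in Alaska.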

2. Follow Processes:
a. Implement ITIL/COBIT - they not only guide you on implementing specific processes, but also help you isolate responsibilities and increase productivity. ITIL is the future of IT management. Without it you're not going to be able to converse intelligently with other entities that are ITIL compliant
b. Hire only those companies that follow ITIL methodologies themselves
c. Hire an independent consultant to take a look at the mess that's the IT dept and follow any and all reasonable suggestions. Break existing philosophy and destroy any comfort levels that you may have absorbed; this isn't your data - it belongs to US!
d. Run test runs every month - WITHOUT FAIL. If you're caught napping you're out
e. Do a FULL backup and recovery test every 3 months. I can't emphasize enough the importance of being prepared. What might cost you very little now will save you hundreds of thousands or even millions of dollars later on. Don't risk it - just test it
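
For the test runs and recovery tests above, the check that matters is whether what comes back from the backup is identical to what went in. Here is a minimal Python sketch of that comparison, assuming you have already restored the backup into a scratch directory with whatever tool you use. The function names sha256_of and restore_test are my own, and the sketch only compares file contents, not permissions or database consistency.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_test(source_dir: Path, restored_dir: Path) -> List[str]:
    """Return relative paths that are missing or differ after a restore.

    An empty list means every file under source_dir was found in
    restored_dir with identical contents.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

If the list that comes back is not empty, the backup is not one you can rely on, and it is far better to learn that during a monthly drill than during an outage.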

3. Follow-up:
a. Every quarter, have a meeting with the IT guys - what is missing, what can be improved, what needs to be changed, what should be chucked. Listen to them - they are your ears and eyes, and without them you're severely restricted in what you know. And while you don't have to do everything they say, at least think about it
b. Implement benchmarks 1 year from the time you started the project. No point having benchmarks too early in the game. Nothing to compare. Mark the improvement (hope it's improvement!) - on a chart and use it to inspire non-compliant members
c. Institute performance bonuses and rewards for education (ITIL certification etc)
d. Train every employee on the importance of data and its criticality

What is somewhat disturbing is the way the data just got wiped out. I mean, come on, a "specialist" can come by and simply destroy anything he wants (of course, by accident)? Shouldn't there be safeguards against precisely these kinds of incidents? How about getting permission from a resident IT expert before purging data, or just backing it up to another disk before attempting to delete something? I know it's very hard to imagine how erasing just a few files can cause havoc, but that's the nature of databases. Indexes, journaling, logging - you've got to be aware of these concepts before you touch anything related to databases.
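
Both safeguards (sign-off first, copy to another disk before deleting) fit in a few lines. The Python sketch below is purely illustrative: purge_with_safety_copy, holding_dir and approved_by are names I made up, the "approval" is just a non-empty name rather than a real authorization system, and it says nothing about how the EMC tooling actually works.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    # Reads the whole file at once; fine for a sketch, but stream it in
    # chunks for large database files.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def purge_with_safety_copy(target: Path, holding_dir: Path, approved_by: str) -> Path:
    """Delete target only after sign-off and a verified copy elsewhere.

    Returns the path of the safety copy so the purge can be undone.
    """
    if not approved_by.strip():
        raise PermissionError("purge needs sign-off from a resident IT expert")
    holding_dir.mkdir(parents=True, exist_ok=True)
    safety_copy = holding_dir / target.name
    if safety_copy.exists():
        raise FileExistsError(f"refusing to overwrite {safety_copy}")
    shutil.copy2(target, safety_copy)  # copy2 also preserves timestamps
    if _sha256(safety_copy) != _sha256(target):
        safety_copy.unlink()
        raise IOError("safety copy does not match original; nothing deleted")
    target.unlink()  # the original is removed only after the copy is verified
    return safety_copy
```

The holding directory should of course live on a different disk or array than the one being "cleaned", otherwise the copy goes down with the original.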
