Horse Sense #86

Changing Your View, Part 2--Backup Isn't Important, Restore Is


For the first part of this article see:

Let's look at managing information from a garbage in, garbage out point of view.  Don't take garbage in!  It seems simple, doesn't it?  But it might be harder than you think.  I used to program payroll and personnel databases for companies.  They wanted to know absolutely everything about their employees.  They felt that having a computer gave them the power to find out anything they needed to know instantly.  They were very wrong.  It takes a lot of time and effort to collect information, to validate it, to store it, and to run reports on it.  Frankly, if it didn't matter to their paycheck, people didn't care what the database said about them, so the data (not information) was wildly inaccurate.  Even compliance oriented information like sex, age, and race in the database of the watchdog agency where I worked couldn't be trusted because the individual employees didn't care about it.  It had nothing to do with their day to day lives or their paychecks.  So, be very careful about what you collect and store.  The less you collect, the less you have to manage, the more likely it is to be right, and the less it will cost to maintain over time.

Even databases that you would think would have a lot of attention paid to them are often wildly off base.  Don't think so?  There are three major consumer credit reporting agencies in the US.  You would think all of their information would be the same and that none of the information would be erroneous.  From examining my own reports, I can tell you that this is a false assumption.  You can get free copies of your credit report to check your own.  Many times the only way bad database information like this is corrected is if someone is denied credit and then complains.  By then it might be too little, too late.

So, when you are collecting information, consider how much you really need and how you are going to manage it.  One good tip: if it is sensitive information, like a Social Security number, credit card number, or personal health information, see if you can do without recording it at all to avoid possible liability.
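
For those who build their own forms or databases, here is a minimal Python sketch of the "don't take garbage in" idea: strip a record down to the fields you have decided you need, and refuse to store the sensitive ones at all.  The field names and the sample record are made up for illustration; substitute whatever your own system collects.

```python
# Hypothetical field names; replace with the sensitive fields in your own forms.
SENSITIVE_FIELDS = {"ssn", "credit_card_number", "health_notes"}


def minimize(record, needed_fields):
    """Keep only the fields we actually need, and never the sensitive ones."""
    return {
        key: value
        for key, value in record.items()
        if key in needed_fields and key not in SENSITIVE_FIELDS
    }


# A made-up submission that contains more than we want to keep.
submitted = {
    "name": "Pat Example",
    "email": "pat@example.com",
    "ssn": "000-00-0000",
    "shoe_size": 9,
}

# Even if someone asks for the SSN by mistake, it is still dropped.
stored = minimize(submitted, needed_fields={"name", "email", "ssn"})
print(stored)
```

Filtering at the point of collection means the unwanted data never reaches your backups or archives, so there is nothing to protect, correct, or purge later.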

OK, so we can see conceptually how we can decrease the amount of data we need to back up and yet be able to restore what we need.  Still, we have to deal with the fact that people are storing more and more information on their computers each year.  What do we do?  Since more of our lives are dependent on what we do on our computers, the answer isn't to back up less.  In fact, we want to make sure we can restore as quickly as we can while losing as little as possible of the information we would otherwise have to regenerate.

Simple tape or optical media backup no longer allows for the speed, capacity, availability, flexibility, or cost efficiencies we need for most backup jobs.  That leaves us with moving our critical information from one hard disk to another, either internally or somewhere out on the Internet.  You should employ redundant hardware and software if you can.  But redundant systems don't protect you against data loss or corruption.  For that, you will need to use the four methods mentioned in Horse Sense 65: imaging, file by file backups, off site backups, and archiving.

You will want an image of your system that you can restore quickly with all of your patches, customizations, and applications.  Images can be turned into virtual machines.  Virtual machines allow you to move functions quickly from one machine to another, allowing for failover or load balancing.  You will need to store a copy of this image both on site and off site.  Using your Internet connection to store images may not be practical, as it could literally take days to upload an image to a web site or download it again, depending on the speed of your connection.  If you want to store these images on an Internet site and have a slow link, you may have to ship a hard drive containing the image to the storage site.  Once you have your base images, you can use incremental imaging to cover new operating system and program patches.  Incremental images will be much smaller than the base images.

You could also use imaging to back up your data, but it is quicker and more space efficient to use the file by file or brick by brick (for databases) technologies in modern backup programs to back up the information both locally and over the Internet.

Finally, you may have to archive information.  This might be information you have removed from your backups because it no longer serves an immediate purpose, like old projects.  Or, it could be e mail that you need to keep for a very long period of time so you can search it for customer information for insurance, legal, compliance, or customer service purposes.  Archiving is an integral part of your information life cycle management process.  You determine when it is appropriate to take information out of normal circulation for placement in an archive.  You then erase the original source information.  Archiving is the equivalent of putting something out back in the storage shed.  You can still get to it fairly easily, but it isn't taking up unnecessary space and forcing you to do extra work to move about your house and find what you want.
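
The archiving step described above (decide what is old enough, copy it to the archive, then erase the original) can be sketched in a few lines of Python.  This is only an illustration, not a replacement for a real backup or archiving product.  The function name, the folder arguments, and the use of "last modified" time as the test for age are all assumptions made for the example, and the archive folder is assumed to sit outside the live folder.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(live_dir, archive_dir, max_age_days, now=None):
    """Move files not modified within max_age_days from live_dir to archive_dir.

    Returns the archived paths, relative to live_dir.
    Assumes archive_dir is NOT located inside live_dir.
    """
    live_dir, archive_dir = Path(live_dir), Path(archive_dir)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    moved = []
    # sorted() builds the full list first, so deleting files mid-loop is safe.
    for path in sorted(live_dir.rglob("*")):
        if not path.is_file() or path.stat().st_mtime >= cutoff:
            continue  # skip folders and anything still recent
        relative = path.relative_to(live_dir)
        target = archive_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy contents and timestamps
        # Erase the original only after the copy is the same size.
        if target.stat().st_size == path.stat().st_size:
            path.unlink()
            moved.append(relative)
    return moved
```

A call such as archive_old_files("projects", "archive/projects", max_age_days=365) would move anything untouched for a year out to the "storage shed" while keeping the same folder layout, so it can still be found fairly easily.  The archive itself still needs an on site and an off site copy.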

It is also appropriate to ignore and/or destroy data.  Once information becomes just useless data, like an invoice after seven years, you can simply delete it from your archives or live data files.  To get an idea of how long you should keep (financial) information, ask your CPA or search for articles on data retention policies (you can further qualify the search with terms like tax, financial, or invoice).  You should regularly clean out old data.  In addition, programs and operating systems tend to "crud up" over time and hold on to old patches, logs, dump files, and other information that is completely useless after a relatively short period of time.  Be sure to remove any programs you are no longer using and all their associated configuration files.  Have a professional clean and optimize your system for you.  You would be amazed at how much space and speed you can gain.  I've seen cleaned and optimized machines perform 4 to 10 times faster.
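
A retention policy like this can be written down as a small table and checked automatically.  The Python sketch below is only an illustration.  The seven year figure for invoices comes from the example above, but the one year figure for logs, the record kinds, and the sample dates are all made up; get your real retention periods from your CPA or attorney.

```python
from datetime import date

# Hypothetical retention periods in years.  Only the invoice figure comes
# from the article's example; confirm real values with your CPA.
RETENTION_YEARS = {"invoice": 7, "log": 1}


def is_expired(kind, created, today):
    """Return True if a record of this kind has outlived its retention period."""
    years = RETENTION_YEARS.get(kind)
    if years is None:
        return False  # no policy on file: keep it rather than guess
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # created was Feb 29 and the target year is not a leap year
        expires = created.replace(year=created.year + years, day=28)
    return today >= expires


# Made-up records: (kind, date created)
records = [
    ("invoice", date(2002, 3, 15)),
    ("invoice", date(2008, 6, 1)),
    ("log", date(2009, 1, 10)),
    ("contract", date(1999, 1, 1)),
]

today = date(2010, 6, 30)
expired = [record for record in records if is_expired(record[0], record[1], today)]
for kind, created in expired:
    print("safe to delete:", kind, created)
```

Note that anything without a stated policy (the contract here) is kept.  Deleting by default is how useful information gets destroyed by accident.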

There is a concept in engineering related to our discussion here.  It is called the signal to noise ratio.  Signal is useful information; noise isn't.  In any transmission, there is always some noise.  You will never get your house completely clean.  However, if you clean and optimize your digital house, you will get a much higher signal to noise ratio, which will, in turn, give you a high Return on Grief (tm) and keep what you have working more effectively.

©2010 Tony Stirk, Iron Horse