Horse Sense #86
Let's look at managing information from a garbage in, garbage out point of
view. Don't take garbage in! It seems simple, doesn't it? But it might be
harder than you think. I used to program payroll and personnel databases
for companies. They wanted to know absolutely everything about their
employees. They felt that having a computer gave them the power to find out
anything they needed to know instantly. They were very wrong. It takes a
lot of time and effort to collect information, to validate it, to store it,
and to run reports on it. Frankly, if it didn't matter to their paycheck,
people didn't care what the database said about them, so the data (not
information) was wildly inaccurate. Even compliance-oriented information
like sex, age, and race in the watchdog agency's database where I worked
couldn't be trusted because the individual employees didn't care about it.
It had nothing to do with their day-to-day lives or their paychecks. So, be
very careful about what you collect and store. The less you collect, the
less you have to manage, the more likely it is to be right, and the less it
will cost to maintain over time. Even databases that you would think would
have a lot of attention paid to them are often wildly off base. Don't think
so? There are three major consumer credit reporting agencies in the US.
You would think all of their information would be the same and that none of
the information would be erroneous. From examining my own reports, I can
tell you that this is a false assumption. You can get free copies of your
own credit reports at http://www.ftc.gov/freereports. Many times the only
way bad database information like this
is corrected is if someone is denied credit and then complains. Then it
might be too little, too late. So, when you are collecting information,
consider how much you really need and how you are going to manage it. One
good tip: if the information is sensitive, like a Social Security number,
credit card number, or personal health information, see whether you can do
without recording it at all to avoid possible liability.
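
One way to act on this tip is to decide up front which fields you really
need and discard everything else before it is ever stored. The Python
sketch below is only an illustration of that idea; the field names and the
minimize_record function are hypothetical and not taken from any real
system.

```python
# Hypothetical whitelist: the only fields this imaginary application needs.
ALLOWED_FIELDS = {"name", "email", "order_total"}


def minimize_record(record, allowed=ALLOWED_FIELDS):
    """Return a copy of record that keeps only the allowed fields."""
    return {key: value for key, value in record.items() if key in allowed}


# Made-up sample data as it might arrive from a form.
submitted = {
    "name": "Pat Example",
    "email": "pat@example.com",
    "order_total": 42.50,
    "ssn": "000-00-0000",               # sensitive and not needed
    "card_number": "0000000000000000",  # sensitive and not needed
}

stored = minimize_record(submitted)
print(sorted(stored))  # the sensitive fields never reach storage
```

Using a list of fields to keep, rather than a list of fields to drop, means
a new sensitive field added to a form later is left out by default.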
OK, so we can see conceptually how we can decrease the amount of data we
need to back up and yet be able to restore what we need. Still, we have to
deal with the fact that people are storing more and more information on
their computers each year. What do we do? Since more of our lives are
dependent on what we do on our computers, the answer isn't to back up less.
In fact, we want to make sure we can restore as quickly as we can while
losing as little as possible of the information we might have to regenerate.
Simple tape or optical media backup no longer allows for the speed,
capacity, availability, flexibility, or cost efficiencies we need for most
backup jobs. That leaves us with moving our critical information from one
hard disk to another, either internally or somewhere out on the Internet.
You should employ redundant hardware and software if you can. But redundant
systems don't protect you against data loss or corruption. For that, you
will need to use the four methods mentioned in Horse Sense 65
(http://www.ih-online.com/hs65.html): imaging, file-by-file backups,
off-site backups, and archiving. You will
want an image of your system that you can restore quickly with all of your
patches, customizations, and applications. Images can be turned into
virtual machines. Virtual machines allow you to move functions quickly from
one machine to another, allowing for failover or load balancing. You
will need to store a copy of this image both on site and off site. Using
your Internet connection to store images may not be practical as it could
literally take days to load an image up to a web site or download it again,
depending on the speed of your connection. If you want to store these
images on an Internet site and have a slow link, you may have to ship a hard
drive containing the image to the storage site. Once you have your base
images, you can use incremental imaging to cover new operating system
and program patches. Incremental images will be much smaller than the base
images. You could also use imaging to back up your data, but it is quicker
and more space efficient to use file-by-file or brick-by-brick technologies
(for databases) in modern backup programs to back up the information both
locally and over the Internet. Finally, you may have to archive
information. This might be information you have removed from
your backups because it no longer serves an immediate purpose, like old
projects. Or, it could be e-mail that you need to keep for a very long
period of time so you can search it for customer information for insurance,
legal, compliance, or customer service purposes. Archiving is an integral
part of your information life cycle management process. You determine when
it is appropriate to take information out of normal circulation for
placement in an archive. You then erase the original source information.
Archiving is the equivalent of putting something out back in the storage
shed. You can still get to it fairly easily, but it isn't taking up
unnecessary space and forcing you to do extra work to move about your house
and find what you want.
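
To see why sending a full image over the Internet can take days, a quick
back-of-the-envelope calculation helps. The Python sketch below is only an
estimate: the 100 GB image size, the 1 Mbps upload speed, and the assumption
that you get about 80% of the rated speed are made-up figures, so plug in
your own numbers.

```python
def transfer_days(image_gb, link_mbps, efficiency=0.8):
    """Estimate the days needed to move image_gb gigabytes over a link.

    link_mbps is the rated speed in megabits per second. efficiency is an
    assumed fraction of that speed you actually get in practice.
    """
    bits = image_gb * 8 * 1000**3                    # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 86400                           # 86400 seconds per day


# Hypothetical example: a 100 GB image over a 1 Mbps upload link.
print(round(transfer_days(100, 1.0), 1))  # about 11.6 days
```

With numbers like these, shipping a hard drive for the base image and
sending only the much smaller incremental images over the link makes sense.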
It is also appropriate to ignore and/or destroy data. Once information
becomes just useless data, like an invoice after seven years, you can simply
delete it from your archives or live data files. To get an idea of how long
you should keep (financial) information, ask your CPA or search for
articles on data retention policies (you can further qualify the search
with terms like tax, financial, or invoice). You should
regularly clean out old data. In addition, programs and operating systems
tend to "crud up" over time and hold on to old patches, logs, dump files,
and other information that is completely useless after a relatively short
period of time. Be sure to remove any programs you are no longer using and
all their associated configuration files. Have a professional clean and
optimize your system for you. You would be amazed at how much space and
speed you can gain. I've seen cleaned and optimized machines perform 4 to
10 times faster.
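
As a rough illustration of finding data that has outlived its retention
period, the Python sketch below lists files that were last modified more
than a given number of years ago. The function name is hypothetical, and
using the file's modification time as the age of the record is an
assumption that may not fit your data. It only reports candidates; it
deletes nothing.

```python
import time
from pathlib import Path

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def files_past_retention(root, years, now=None):
    """Return files under root last modified more than `years` years ago."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

For the seven-year invoice example above, you might call
files_past_retention("archive", 7), where "archive" stands in for wherever
you keep your archived files, and review the list before removing anything.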
There is a concept in engineering related to our discussion here. It is
called the signal-to-noise ratio. Signal is useful information; noise
isn't. In any transmission, there is always some noise. You will never get
your house completely clean. However, if you clean and optimize your
digital house, you will get a much higher signal-to-noise ratio, which will,
in turn, give you a high Return on Grief (tm) and keep what you have working
more effectively.
©2010
Tony Stirk, Iron Horse tstirk@ih-online.com