by Andrzej Gorecki
[As published in True Blue, August/September 1992]
According to Andrzej Gorecki, disaster recovery plans don't have
to cost a small fortune, as long as managers separate their business functions
from the computer center and choose the right recovery plan.
One of the few areas that causes little controversy in the world of business
is the need to talk about computer system disaster recovery. The topic
is fashionable, and the large consulting firms would love to prepare
a disaster recovery plan for you. Yet few organisations actually do something
about it. The reason people usually stop at the talk is simple: preparations
for disaster recovery are considered to be too costly.
Without doubt, there is a need to plan for computer system emergencies
and recovery, but disaster recovery arrangements are often misdirected.
There should be a distinction between preparations for business functions
recovery and for computer center recovery, since the latter is often an
unnecessary expense.
What is universally considered to be the best solution is available only
to the big boys who can afford to fully duplicate their computer systems.
The next best things are warm and cold sites, which are still expensive
but more affordable. Some organisations opt for such "low
cost" solutions, but the costs are still massive.
So, many organisations simply elect to do nothing.
The truth is that preparations for disaster recovery need not be expensive.
Every business can and should prepare for a system disaster. The key is
to make disaster recovery preparations affordable. To do so one needs
to change the disaster recovery paradigm.
When one considers a total loss of a computer system, a typical reaction
is to plan for a reconstruction of the system. What can be more logical
than that? Surprisingly, for most businesses, this is the wrong reaction.
The reconstruction of the computer system means a simultaneous restoration
of all business functions. What may come as a surprise to some people
is that in most businesses only a few functions are truly
time-critical. The majority of business functions can survive for days or even
weeks without a computer system. So, why recover them instantaneously
(or thereabouts) at a monstrous expense if they can wait?
Labelling Disaster
Business functions fall into one of five categories as far as disaster
recovery is concerned:
Category 1: Instantaneous recovery (no system failure is
acceptable). Examples include life monitoring and support
systems (hospitals) and systems controlling nuclear reactors.
Category 2: Rapid recovery (within minutes). Examples include
banking systems (needed for online customer service) and retail point-of-sale systems.
Category 3: Fast recovery (within a few hours). Examples
include real-time warehouse management systems and airline seat reservation
systems.
Category 4: Medium pace recovery (within a few days). Examples
include online library management systems and payroll systems.
Category 5: Slow pace recovery (within a few weeks). Examples
include general ledger systems and fixed assets systems.
Obviously, some of the above examples can be moved a category up or down,
depending on the specifics of the business. A good criterion for assigning
business functions to the categories is the maximum time the business can go
without the computer system before there is damage beyond repair. Such damage
should be understood as loss of life (or health), massive loss of property,
or monetary loss in excess of three months' net profits.
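The criterion above can be sketched in a few lines of Python. This is only an illustration: the article speaks of "minutes", "a few hours", "a few days" and "a few weeks", so the exact cut-offs below, and the names CATEGORY_LIMITS and recovery_category, are assumptions that each business would need to set for itself.

```python
from datetime import timedelta

# Assumed cut-offs. The article gives only rough time scales, so these
# exact boundaries are hypothetical and should be tuned per business.
CATEGORY_LIMITS = [
    (timedelta(0), 1),        # no outage at all is acceptable
    (timedelta(hours=1), 2),  # "within minutes"
    (timedelta(days=1), 3),   # "within a few hours"
    (timedelta(weeks=1), 4),  # "within a few days"
]


def recovery_category(max_tolerable_outage: timedelta) -> int:
    """Map the longest outage a function can survive to a category from 1 to 5."""
    if max_tolerable_outage < timedelta(0):
        raise ValueError("outage duration cannot be negative")
    for limit, category in CATEGORY_LIMITS:
        if max_tolerable_outage <= limit:
            return category
    return 5  # anything that can wait longer than a week


if __name__ == "__main__":
    print(recovery_category(timedelta(hours=4)))
```

Under these assumed cut-offs, a function that can survive four hours without the computer falls into Category 3, and one that can wait three weeks falls into Category 5.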
The cost of disaster recovery preparations (one-off costs and ongoing
costs) varies according to the category. It falls between nil (Category
5) and the total system cost being duplicated or even triplicated (Category
1). But there is also the cost of the system restoration, which comes
on top of the preparation cost, irrespective of category.
Making Preparations
Thus, the fundamental issue in preparing for disaster recovery is not
how to restore the computer system quickly (ultimately this needs to happen,
preferably as per Category 5 to minimise the cost), but how to prepare
for the disaster on a function by function basis.
Possible recovery preparations, depending on the Category, are as follows:
- Category 1: Instantaneous recovery. Fault-free
systems and physically separate hot sites. As a standard, triplicated
parallel systems are considered to be sufficient for mission-critical
applications.
- Category 2: Rapid recovery (within minutes).
Fault-tolerant systems. Hot sites. Function-specific specialised small
computer systems can move the business function to a lower category.
But note that if the business is lost together with the system,
rapid recovery is no longer required; for instance, a point-of-sale
system needs to be fault tolerant but does not have to be restored quickly
if the store burns down together with the computers.
- Category 3: Fast recovery (within a few hours).
Warm site. Function-specific specialised small computer systems can
reduce the recovery requirements of the function and so move the business
function to a lower category.
- Category 4: Medium pace recovery (within a few
days). Cold site. Bureau services and function-specific specialised
small computer systems can move the business function to a lower category.
- Category 5: Slow pace recovery (routine system
setup, within a few weeks). No special response. The system is simply
rebuilt as it was originally installed, within a number of weeks.
Proper disaster recovery must be driven by business functions. For example,
there is no need for an instantaneous recovery of a General Ledger system
in a nuclear power plant.
The steps to prepare for disaster recovery are as follows:
- Evaluate risk factors influencing your computer installation.
- Quantify each of the factors (use likely threat frequency
statistics).
- Establish the combined probability of the total loss
of the installation.
- Identify all business functions that are computerised.
- Determine the length of the penalty intervals for each
of the functions in the case of the system being unavailable. The intervals
are no penalty (negligible losses, for example for up to 2 days), low penalty,
medium penalty, and maximum penalty (loss of the business).
- Develop a matrix to determine which business functions
fall into the Categories from 1 to 5.
- Design and put in place recovery arrangements separately
for each of the business functions.
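The third step, establishing the combined probability of a total loss, can be illustrated with a short Python sketch. It assumes that the threats are independent and that each has been quantified as a yearly probability; the article prescribes neither, and the function name and the figures in the example are placeholders, not real statistics.

```python
import math


def combined_annual_loss_probability(threat_probabilities: dict) -> float:
    """Probability that at least one threat destroys the installation in a year.

    Assumes the threats are independent of one another, which is a
    simplification made for this sketch.
    """
    for name, p in threat_probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {name!r} must be between 0 and 1")
    # The installation survives the year only if it survives every threat.
    survival = math.prod(1.0 - p for p in threat_probabilities.values())
    return 1.0 - survival


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    threats = {"fire": 0.01, "flood": 0.02}
    print(combined_annual_loss_probability(threats))
```

Multiplying the survival probabilities, rather than adding the threat probabilities, keeps the result below 1 however many threats are listed.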
Every business must, as a minimum, perform the risk assessment. This
is needed to manage the risk. The risk can either be borne by the business
or contracted out in the form of insurance, but insurance only
covers monetary loss, not the loss of data and time, which is usually
translated into dollars for the purpose of risk management. The insurer
will usually refuse to provide cover unless the assessment has been
completed.
If the business decides to manage the risk internally then it needs to
prepare for the recovery of each business function. Since business functions
usually fall into Categories 2 to 4, an effort needs to be made to reduce
the recovery requirements of the functions.
Well-designed recovery arrangements bring all business functions into
the lowest (least urgent) category possible. This makes it possible to move the
recovery requirements of the computer installation itself into a lower Category,
equal to the highest (most urgent) Category amongst the business functions.
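As a small illustration of this rule, the following Python sketch derives the category of the installation from the categories of the functions it supports. The function names and category assignments are hypothetical. Note that the most urgent category carries the smallest number, so the rule reduces to taking a minimum.

```python
def installation_category(function_categories: dict) -> int:
    """Return the category the installation itself must meet.

    The installation has to be recovered as fast as its most urgent
    function, and the most urgent category has the smallest number.
    """
    if not function_categories:
        raise ValueError("at least one business function is required")
    return min(function_categories.values())


if __name__ == "__main__":
    # Hypothetical assignments before any recovery arrangements are made.
    before = {"point of sale": 2, "payroll": 4, "general ledger": 5}
    # The same functions after point of sale has been given its own
    # fault-tolerant equipment and payroll has been moved to a bureau.
    after = {"payroll": 5, "general ledger": 5}
    print(installation_category(before), installation_category(after))
```

In this example the installation drops from Category 2 to Category 5 once the time-critical functions no longer depend on it.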
Ideally, all business functions should be reduced to Category
5. This can be achieved by the use of bureau services, decentralised computer
systems, and function-specific specialised small computer systems. Use
of the bureau is self-explanatory. Decentralised systems make it possible
to use a computer on another site to run the most critical applications,
at the expense of those that fall into Category 5. The function-specific
small computer systems are developed on PCs and contain a cut-down
version of the system they are supposed to replace in an emergency.
Using the approach of reducing business functions to Category 5 and by
using specialised (fault tolerant) equipment for those business functions
which belong to Categories 1 and 2, one can practically eliminate the
need to develop recovery plans for the computer installation itself. In
the case of a disaster it just needs to be rebuilt in the normal course
of business.
So, do not waste your money on the recovery of your computer systems;
prepare yourself to restore your business functions instead.
Andrzej Gorecki is a Director and principal
consultant with Melbourne-based Retail Directions Group, which develops
and supplies state-of-the-art software solutions for retailers worldwide.
Copyright (c) 1992 Andrzej Gorecki