2013-06-06

Ram Mohan C posted a blog post

Disaster Recovery and Business Continuity Management

Today, enterprises live in a world where natural or man-made disasters can bring a business to its knees. It is therefore critically important for enterprises to recognise that disasters are real and do happen, and that they need a structured programme to protect their information from external and internal threats and disasters.

Common Disasters:
Natural: floods, hurricanes, earthquakes, wildfires, epidemics, tsunamis
Human: terrorism, war, vandalism/riots, burglary, data theft, fraud, accidents
Technical: virus attack, power failure, HVAC failure, network failure, building problems, hardware failure
Proximity: nuclear reactors, railway tracks, airports, electrical stations, military bases

These are the potential threats to an organisation and, if realised, may impact business operations, reputation and brand image. As you can see, the threats are both internal and external. Business Continuity Management (BCM) is a holistic management process that identifies these potential threats and provides a framework for building organisational resilience, with the capability for an effective response that safeguards the interests of key stakeholders, reputation, business operations and brand image.

Generally, most enterprises need to be back in business with minimum downtime after a disaster. There is no “one size fits all” BCM and disaster recovery plan. Each enterprise needs its own customised plan to bring it back to business. Nevertheless, useful guidelines are available: the British Standards Institution (BSI) has released a new independent standard for business continuity planning (BCP), BS 25999-1. Prior to the introduction of BS 25999, BCP professionals relied on the BSI information security standard BS 7799, which addressed BCP only peripherally, as a means of improving an organisation's information security compliance.
BS 25999's applicability, however, extends to organisations of all types, sizes and missions, whether governmental or private, profit or non-profit, large or small, and in any industry sector. Using these guidelines, each enterprise then needs to develop its own customised BCP.

A well-defined BCM has the following essential components:
- Vision
- BCM strategy
- Organisation-wide awareness
- Identification of information assets
- Risk assessment
- Impact analysis
- Risk mitigation
- Business continuity planning
- DR site strategy and implementation
- DR drills
- Audit and continuous improvement

Vision:
The structured programme to secure an organisation’s business operations starts with a clearly articulated vision. At Mindtree, we believe that this vision should come from none other than the CEO and that the initiatives should be driven from the top. The vision then needs to be adapted to all departments, because when a disaster strikes it may not spare any of them. It is also critical to articulate this vision to the board and to incorporate it as a part of corporate governance.

BCM Strategy:
The next stage is to define a well-articulated strategy for recovery from disaster, covering the essential functions that need to be recovered and the timelines for recovery. The strategy should clearly focus on recovery of business operations, brand image and reputation. It should typically follow the lines below:
- A BCP budget should be formalised and approved by senior management.
- Disaster declaration authorities, who will be responsible for implementing the continuity strategies in the event of a disaster or business interruption, should be identified.
- An incident management system or process for monitoring, recovering and stabilising from a disaster or business interruption should be identified.
- The plan should be reviewed periodically and benchmarked against industry standard practices and other similar organisations’ best practices.

Organisation-wide awareness:
One of the main challenges of BCM is lack of interest.
BCM is always treated as an initiative of either the IS or the Security department. It is important to create awareness among the employees, partners and vendors of the organisation about the BCM initiatives and about their roles and responsibilities in them. A training plan should be developed, and training should be conducted at regular, defined intervals.

Identification of information assets:
Information resides everywhere in an organisation: in printed sheets, in files, in computers, in storage racks, in offsite data centers, in tapes stored at a remote location and even in employees’ heads. All these sources of information are vulnerable to external and internal threats, and the damage can be significant. These information assets need to be identified along with their locations. Once the assets are located and identified, their criticality needs to be documented.

Risk Assessment:
Two important characteristics of a risk are:
- Probability of occurrence (low, medium or high)
- Severity (low, medium or high)

Develop a risk table as follows:
- List all the risks
- Categorise the risks
- Analyse the probability
- Analyse the severity
- Sort the risks and identify the risks to be managed

Impact analysis:
Risk analysis needs to be extended to cover the impact of each risk. For example, an earthquake of magnitude 8.0 on the Richter scale is low probability in London but would have a high impact on your information assets. On the other hand, a virus attack can be high probability but low impact if all the security measures to prevent it are in place. The impact analysis should also cover financial, brand and other damages, which should be clearly quantified. Identify key business processes and critical dependencies, and identify the impacts of potential business interruptions.

Risk Mitigation:
Once the impacts are analysed, MindTree recommends that a mitigation strategy be developed for each category of risk.
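To see which risks call for a mitigation strategy, the risk table from the Risk Assessment step can be sketched in a few lines of Python. This is only an illustration: the numeric scale (low = 1, medium = 2, high = 3), the probability-times-severity score, the threshold of 3 and the ratings given to hardware failure and burglary are all assumptions made for the example, not part of BS 25999 or of any prescribed method. The earthquake and virus attack ratings follow the examples given under Impact analysis.

```python
from dataclasses import dataclass

# Assumed numeric scale for the low/medium/high ratings.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    name: str
    category: str     # Natural, Human, Technical or Proximity
    probability: str  # "low", "medium" or "high"
    severity: str     # "low", "medium" or "high"

    @property
    def score(self) -> int:
        # Assumed scoring rule: probability level times severity level.
        return LEVELS[self.probability] * LEVELS[self.severity]


def build_risk_table(risks, threshold=3):
    """Sort risks by descending score and keep those at or above the threshold."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    return [r for r in ranked if r.score >= threshold]


risks = [
    Risk("Earthquake (London)", "Natural", "low", "high"),
    Risk("Virus attack", "Technical", "high", "low"),
    Risk("Hardware failure", "Technical", "medium", "high"),  # illustrative rating
    Risk("Burglary", "Human", "low", "low"),                  # illustrative rating
]

# Print the risks to be managed, highest score first.
for risk in build_risk_table(risks):
    print(f"{risk.score}  {risk.category:<10} {risk.name}")
```

With these assumed ratings, hardware failure ranks first and burglary falls below the threshold. A real risk table would use ratings agreed with the business and, quite possibly, a different scoring rule.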
The next step is to take measures to manage the risk. Risk mitigation involves:
- Analysing the threats most likely to occur
- Identifying the threats that would have the most impact
- Minimising service disruptions and financial loss
- Having a contingency plan for mitigating risks

For example, the risk mitigation strategy for hardware failure of a mission-critical server is to keep spares onsite so that downtime is minimised.

Business Continuity Plan:
The business continuity plan should set the optimum recovery time for your business. For example, if it is acceptable for your business recovery time to be measured in days, you may opt for just offsite tape storage. However, if the acceptable recovery time is only a few hours, a hot standby system at a disaster recovery site may be needed.

The BCP needs to cover the following aspects:
- Identify the process-specific Recovery Time Objective (RTO)
- Identify the minimum capacity required to run business operations at an acceptable level
- Calculate recovery efforts based on the RTO
- Review Service Level Agreements between the organisation and external partners
- Identify critical information resources
- Prioritise these resources in order of recovery
- Identify the procedure for acquiring critical resources in the event of a disaster
- Identify contact information and procedures for disaster authorities
- Identify and keep ready a disaster recovery site
- Conduct a cost-benefit analysis of moving the business processes to the DR site
- Define standard procedures for response, recovery and restoration
- Develop procedures for relocating the business processes to the DR site
- Define emergency response procedures that are:
  - Time based
  - Team based
  - Checklist based
  - Chronological
- Identify ER team members, with contact information
- Create response, recovery and restoration processes for security and safety
- Document crisis communication procedures and train staff in them

DR site strategy and implementation:
If a disaster has a major impact on the primary site of business, the business processes may have to be
relocated to an alternate site. The business processes may include people, machinery and IT assets. The location of the DR site has to be selected carefully so that a disaster striking the primary site does not affect the DR site at the same time. For example, if the probability of a forest fire spreading across the entire area is very high, the DR site should be located several hundred miles away from the primary site. It is also important to identify the minimum-capacity operations to be duplicated at the DR site so that an acceptable level of business continues until the primary site becomes functional again.

Disaster Recovery Drills:
Disaster recovery drills need to be drawn up and tested at regular intervals to ensure your preparedness for a disaster. BCP and DR should cover all aspects of the business, from sales to operations and from people functions to IT, specifically information management. Testing approaches such as top-down drills and full plan tests should be conducted. Drills often cover only certain aspects of the business, and our view is that it is likely to be worthwhile to create disaster simulation models to test the areas where an actual drill cannot be carried out. The drill should involve all critical business units, departments and functions. The roles and responsibilities for BCP testing should be assigned in advance.

Audit and continuous improvement:
- A post-test review and analysis process needs to be created.
- The BCM process needs to be audited periodically to ensure compliance with company standards.
- Specific timelines need to be defined for updating the BCM based on the change management process of the organisation.

Though BCM is an absolute necessity for every enterprise, its implementation often faces several challenges.
Some of them are:
- BCM doesn’t have ROI
- BCM does not generate revenue
- Can BCM be replaced by insurance?
- Planners’ overkill budget
- Lack of interest from senior management
- No budget for BCM
- “It will not happen to us”

It is important to make sure that the BCM is lean and mean and needs only the minimum capacity required to run business operations at an acceptable level in the event of a disaster. It is also important to quantify the impact on the business, brand and image of the company in the event of a disaster.

This was brought home to one of MindTree’s customers very recently when Hurricane Ike struck their Houston data center. Yet, with the help of a well-planned and well-articulated BCP and BCM plan, MindTree IMTS engineers were able to ensure recovery from the disaster within 48 hours without disrupting the client’s business. This was only possible because of several months of planning and implementation of BCP and DR. The critical business operations were moved to a disaster site within 24 hours, without any physical movement of people, hardware or software.

By Ram Mohan, Executive Vice President and Head of Infrastructure Management and Tech Support at Mindtree Ltd.
