Service Continuity/Recovery Management
Ensure continuity management/disaster recovery planning for IT services to enable the organization to meet defined business objectives.
Improvement Planning
Practices-Outcomes-Metrics (POM)
Representative POMs are described for Service Continuity/Recovery Management at each level of maturity.
- 2 Basic
- Practice
- Restore services on a ‘first come, first served’ basis, according to a defined and documented process (for each individual service).
- Outcome
- Lost service is restored in a suboptimal manner, possibly resulting in breaches to SLAs.
- Metrics
- % of service restorations following a defined process.
- Mean time to restore service (MTRS).
- # of SLA breaches per year related to service restoration.
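The three Basic-level metrics above can be computed from a simple log of restoration events. The following is a minimal sketch, not a prescribed implementation: the `Restoration` record, its field names, and the per-event `sla_limit` are all assumptions made for illustration, and the sample events are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Restoration:
    """One restoration event (hypothetical record layout)."""
    service: str
    outage_start: datetime
    restored_at: datetime
    followed_process: bool  # True if the documented process was followed
    sla_limit: timedelta    # assumed maximum restoration time agreed in the SLA


def mtrs(events):
    """Mean time to restore service: average of (restored_at - outage_start)."""
    if not events:
        raise ValueError("no restoration events")
    total = sum((e.restored_at - e.outage_start for e in events), timedelta())
    return total / len(events)


def pct_following_process(events):
    """% of service restorations that followed a defined process."""
    if not events:
        raise ValueError("no restoration events")
    return 100.0 * sum(e.followed_process for e in events) / len(events)


def sla_breaches(events):
    """Number of restorations that took longer than the assumed SLA limit."""
    return sum(1 for e in events if e.restored_at - e.outage_start > e.sla_limit)


# Invented sample data for one reporting year.
events = [
    Restoration("email", datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 11, 0),
                True, timedelta(hours=4)),
    Restoration("payroll", datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 20, 0),
                False, timedelta(hours=4)),
    Restoration("intranet", datetime(2024, 6, 1, 8, 0), datetime(2024, 6, 1, 9, 0),
                True, timedelta(hours=2)),
]

print(mtrs(events))                   # 3:00:00
print(sla_breaches(events))           # 1
print(round(pct_following_process(events), 1))  # 66.7
```

In this sample the restorations took 2, 6, and 1 hours, so MTRS is 3 hours; only the payroll restoration exceeded its assumed 4-hour limit.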
- 3 Intermediate
- Practice
- Prioritize service restoration according to customer relevance, position in business cycles, and Service Level Agreements (SLAs), leveraging architectural guidance on fault tolerance and resilience.
- Outcomes
- Key services and services covered by SLAs achieve maximum availability.
- There is an agreed balance in restoration priority planning between cost and acceptable business risk.
- Architectural guidance on fault tolerance and resilience specifies the types of failure that the IT infrastructure must withstand and the mechanisms for withstanding them (e.g. RAID 6 or an off-site real-time mirror copy).
- Metrics
- Mean time to restore service (MTRS).
- # of SLA breaches per year related to service restoration.
- Total downtime of key services or services covered by SLAs.
- Cost of SLA infringements.
- Practice
- Plan for continuity both under normal circumstances and following a major loss of service, test the plan with the participation of the customer, and ensure appropriate service continuity education, awareness, and training.
- Outcomes
- The customer understands what to expect in normal and exceptional service loss scenarios.
- Service continuity testing can identify any areas of weakness or opportunities for improvement in the service continuity plan.
- Metrics
- Existence of a schedule for continuity plan testing.
- # of opportunities for improvement identified.
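The Intermediate-level practice of prioritizing restoration by customer relevance, position in business cycles, and SLAs can be illustrated with a simple sort. This is only a sketch under stated assumptions: the `DownService` record, the 1 to 5 relevance scale, and the order in which the three criteria are applied (business cycle first, then customer relevance, then SLA time remaining) are hypothetical choices. The source does not say how the criteria should be weighted.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownService:
    """A service awaiting restoration (hypothetical record layout)."""
    name: str
    customer_relevance: int              # assumed scale: 1 (low) to 5 (high)
    in_peak_business_cycle: bool         # e.g. payroll during month-end
    sla_hours_remaining: Optional[float]  # None if no SLA covers the service


def restoration_order(services):
    """Return services in the order they should be restored.

    Assumed ordering: services in a peak business cycle first, then higher
    customer relevance, then the least SLA time remaining. Services without
    an SLA sort last within otherwise equal groups.
    """
    def key(s):
        sla = s.sla_hours_remaining if s.sla_hours_remaining is not None else math.inf
        return (not s.in_peak_business_cycle, -s.customer_relevance, sla)

    return sorted(services, key=key)


# Invented sample queue of failed services.
queue = [
    DownService("reporting", 2, False, None),
    DownService("order-entry", 5, True, 2.0),
    DownService("billing", 5, True, 1.0),
    DownService("intranet", 3, False, 8.0),
]

for s in restoration_order(queue):
    print(s.name)
```

A lexicographic sort keeps the rule easy to explain to customers. An organization balancing cost against business risk might prefer a weighted score instead, which would replace only the `key` function.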
- 4 Advanced
- Practice
- Put in place and execute an effective, tool-supported service restoration plan that is in line with Service Level Agreements (SLAs) and is prioritized based on the criticality of business processes enabled for all services.
- Outcome
- Service downtime is minimized.
- Metrics
- Mean time to restore service (MTRS).
- % of service restorations that are tool-supported.
- # of SLA breaches per year related to service restoration.
- Total downtime of key services or services covered by SLAs.
- Cost of SLA infringements.
- Practice
- Support service continuity with periodic risk assessment, business impact analysis exercises, and service contingency/failover testing.
- Outcome
- Determining the required versus actual resilience for each service allows effective prioritization of opportunities for improvement in line with the business criticality of the services.
- Metrics
- Frequency of risk assessments, business impact analysis exercises, and service contingency/failover tests.
- # of improvement opportunities identified.
- 5 Optimized
- Practice
- Enable automated, SLA-driven restoration.
- Outcomes
- Services are restored as they are required.
- Repair/recovery time is near zero.
- Metric
- % of successful automated service restorations.
- Practices
- Review and test IT service continuity plans on a regular basis to ensure that priority and response times meet changing business needs, that responsibilities for invoking the plans are clearly assigned, and that service risks are reduced.
- Confirm that backups of data, documents, and software exist, that any equipment and personnel necessary for service restoration are quickly available following a major service failure or disaster, and that staff understand their role in invoking and executing the plans.
- Outcomes
- Service continuity is maintained and recovery risk is reduced in meeting current and planned business needs.
- Service continuity risks can be identified early on, and the controls to manage them, or the cost-justifiable countermeasures to mitigate them, can be put in place wherever possible.
- Metrics
- Existence of service continuity and recovery plans.
- Service continuity and restoration metrics in SLAs.
- Frequency of risk assessments, business impact analysis exercises, and service contingency/failover tests.