IVI Framework Viewer

Service Continuity/Recovery Management

B5

Ensure continuity management/disaster recovery planning for IT services to enable the organization to meet defined business objectives.

Improvement Planning

Practices-Outcomes-Metrics (POM)

Representative POMs are described for Service Continuity/Recovery Management at each level of maturity.

Level 2: Basic
  • Practice
    Restore services on a ‘first come, first served’ basis, according to a defined and documented process (for each individual service).
    Outcome
    Lost service is restored in a suboptimal manner, possibly resulting in breaches to SLAs.
    Metrics
    • % of service restorations following a defined process.
    • Mean time to restore service (MTRS).
    • # of SLA breaches per year related to service restoration.
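As an illustration of how the three Basic-level metrics above might be derived from a restoration log, here is a minimal Python sketch. The `Restoration` record and its field names are assumptions made for this example; they are not part of the framework, and a real log would come from an incident or service management tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Restoration:
    """One service restoration event (hypothetical record layout)."""
    service: str
    outage_start: datetime
    restored_at: datetime
    followed_process: bool  # was the defined, documented process followed?
    sla_breached: bool      # did this restoration breach an SLA?


def pct_following_process(events):
    """% of service restorations following a defined process."""
    if not events:
        return 0.0
    return 100.0 * sum(e.followed_process for e in events) / len(events)


def mean_time_to_restore(events):
    """Mean time to restore service (MTRS), returned as a timedelta."""
    if not events:
        return timedelta(0)
    total = sum((e.restored_at - e.outage_start for e in events), timedelta(0))
    return total / len(events)


def sla_breaches_in_year(events, year):
    """# of SLA breaches related to service restoration in a given year."""
    return sum(1 for e in events if e.sla_breached and e.outage_start.year == year)


# Illustrative data only.
events = [
    Restoration("email", datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0), True, False),
    Restoration("payroll", datetime(2024, 6, 10, 14, 0), datetime(2024, 6, 10, 18, 0), False, True),
]

print(pct_following_process(events))       # 50.0
print(mean_time_to_restore(events))        # 3:00:00
print(sla_breaches_in_year(events, 2024))  # 1
```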
Level 3: Intermediate
  • Practice
    Prioritize service restoration according to customer relevance, position in business cycles, and Service Level Agreements (SLAs), leveraging architectural guidance on fault tolerance and resilience.
    Outcomes
    • Key services and services covered by SLAs achieve maximum availability.
    • There is an agreed balance in restoration priority planning between cost and acceptable business risk.
    • Architectural guidance on fault tolerance and resilience specifies the types of failure that the IT infrastructure must withstand (e.g., through RAID 6 or an off-site real-time mirror copy).
    Metrics
    • Mean time to restore service (MTRS).
    • # of SLA breaches per year related to service restoration.
    • Total downtime of key services or services covered by SLAs.
    • Cost of SLA infringements.
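The prioritization practice above contrasts with the 'first come, first served' approach of the Basic level. The following Python sketch shows one possible way to order a restoration queue by the three criteria named in the practice. The ordering rule (SLA-covered services first, tightest restoration target first, then business-cycle position, then customer relevance), the 1 to 5 relevance score, and all field names are assumptions for illustration; the framework does not prescribe a weighting.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AffectedService:
    """A service awaiting restoration (hypothetical record layout)."""
    name: str
    customer_relevance: int             # assumed score: 1 (low) to 5 (high)
    in_critical_cycle: bool             # e.g. payroll during month-end
    sla_restore_hours: Optional[float]  # None if the service has no SLA


def restoration_order(services):
    """Return services in the order they should be restored.

    Assumed rule: SLA-covered services first (tightest target first),
    then services in a critical business cycle, then higher relevance.
    """
    def key(s):
        target = s.sla_restore_hours if s.sla_restore_hours is not None else math.inf
        return (s.sla_restore_hours is None, target,
                not s.in_critical_cycle, -s.customer_relevance)

    return sorted(services, key=key)


# Illustrative data only.
queue = [
    AffectedService("wiki", 2, False, None),
    AffectedService("payroll", 4, True, 4.0),
    AffectedService("webshop", 5, False, 1.0),
    AffectedService("reporting", 3, True, None),
]

print([s.name for s in restoration_order(queue)])
# ['webshop', 'payroll', 'reporting', 'wiki']
```

In practice the relative weight of the three criteria would be agreed with the business, in line with the outcome on balancing cost against acceptable business risk.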
  • Practice
    Plan for continuity both under normal circumstances and following a major loss of service, test the plan with the participation of the customer, and ensure appropriate service continuity education, awareness, and training.
    Outcomes
    • The customer understands what to expect in normal and exceptional service loss scenarios.
    • Service continuity testing can identify any areas of weakness or opportunities for improvement in the service continuity plan.
    Metrics
    • Existence of a schedule for continuity plan testing.
    • # of opportunities for improvement identified.
Level 4: Advanced
  • Practice
    Put in place and execute, for all services, an effective, tool-supported service restoration plan that is in line with Service Level Agreements (SLAs) and is prioritized according to the criticality of the business processes that each service enables.
    Outcome
    Service downtime is minimized.
    Metrics
    • Mean time to restore service (MTRS).
    • % of service restorations that are tool-supported.
    • # of SLA breaches per year related to service restoration.
    • Total downtime of key services or services covered by SLAs.
    • Cost of SLA infringements.
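The downtime, tool-support, and cost metrics above could be computed from outage records along the following lines. This is a sketch under stated assumptions: the `Outage` layout is hypothetical, and the cost model (a flat penalty per hour of downtime beyond the SLA restoration target) is only one possible contract structure, not something the framework defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outage:
    """One outage record (hypothetical layout)."""
    service: str
    downtime_hours: float
    tool_supported: bool            # was the restoration tool-supported?
    sla_max_hours: Optional[float]  # SLA restoration target; None if no SLA
    penalty_per_hour: float         # assumed penalty per hour beyond the target


def total_sla_downtime(outages):
    """Total downtime (hours) of services covered by SLAs."""
    return sum(o.downtime_hours for o in outages if o.sla_max_hours is not None)


def pct_tool_supported(outages):
    """% of service restorations that are tool-supported."""
    if not outages:
        return 0.0
    return 100.0 * sum(o.tool_supported for o in outages) / len(outages)


def sla_infringement_cost(outages):
    """Cost of SLA infringements under the assumed per-hour penalty model."""
    return sum(
        max(0.0, o.downtime_hours - o.sla_max_hours) * o.penalty_per_hour
        for o in outages
        if o.sla_max_hours is not None
    )


# Illustrative data only.
outages = [
    Outage("payroll", 5.0, True, 4.0, 1000.0),  # 1 hour over target
    Outage("webshop", 0.5, True, 1.0, 5000.0),  # within target
    Outage("wiki", 8.0, False, None, 0.0),      # no SLA
]

print(total_sla_downtime(outages))            # 5.5
print(round(pct_tool_supported(outages), 1))  # 66.7
print(sla_infringement_cost(outages))         # 1000.0
```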
  • Practice
    Support service continuity with periodic risk assessment, business impact analysis exercises, and service contingency/failover testing.
    Outcome
    Determining the required versus actual resilience for each service allows effective prioritization of opportunities for improvement in line with the business criticality of the services.
    Metrics
    • Frequency of risk assessments, business impact analysis exercises, and service contingency/failover tests.
    • # of improvement opportunities identified.
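The comparison of required versus actual resilience described in the outcome above can be sketched as a simple gap analysis. In this Python example, resilience is represented by recovery time: `required_hours` stands for the target from a business impact analysis and `actual_hours` for the time measured in a failover test. That representation, the 1 to 5 criticality score, and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResilienceCheck:
    """Required vs. measured recovery time for one service (hypothetical layout)."""
    service: str
    business_criticality: int  # assumed score: 1 (low) to 5 (high)
    required_hours: float      # target from business impact analysis
    actual_hours: float        # measured in the latest failover test


def improvement_opportunities(checks):
    """Services whose measured recovery exceeds the requirement.

    Ordered by business criticality (highest first), then by size of gap.
    Returns a list of (service, gap_in_hours) tuples.
    """
    gaps = [c for c in checks if c.actual_hours > c.required_hours]
    gaps.sort(key=lambda c: (-c.business_criticality,
                             -(c.actual_hours - c.required_hours)))
    return [(c.service, c.actual_hours - c.required_hours) for c in gaps]


# Illustrative data only.
checks = [
    ResilienceCheck("payroll", 5, 4.0, 6.0),
    ResilienceCheck("webshop", 5, 1.0, 0.5),    # meets its requirement
    ResilienceCheck("reporting", 3, 8.0, 20.0),
    ResilienceCheck("wiki", 1, 24.0, 30.0),
]

print(improvement_opportunities(checks))
# [('payroll', 2.0), ('reporting', 12.0), ('wiki', 6.0)]
```

The length of the returned list corresponds to the "# of improvement opportunities identified" metric for one assessment cycle.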
Level 5: Optimized
  • Practice
    Enable automated, SLA-driven restoration.
    Outcomes
    • Services are restored as they are required.
    • Repair/recovery time is near zero.
    Metric
    % of successful automated service restorations.
  • Practices
    • Review and test IT service continuity plans on a regular basis to ensure that priority and response times meet changing business needs, that responsibilities for invoking the plans are clearly assigned, and that service risks are reduced.
    • Confirm that backups of data, documents, and software, together with any equipment and personnel necessary for service restoration, are quickly available following a major service failure or disaster, and that staff understand their roles in invoking and executing the plans.
    Outcomes
    • Service continuity is maintained and recovery risk is reduced in meeting current and planned business needs.
    • Service continuity risks can be identified early, and the controls to manage them, or cost-justifiable countermeasures to mitigate them, can be put in place wherever possible.
    Metrics
    • Existence of service continuity and recovery plans.
    • Service continuity and restoration metrics in SLAs.
    • Frequency of risk assessments, business impact analysis exercises, and service contingency/failover tests.