Incident and Problem Management

Implement workarounds, repairs, and root cause analysis (where needed), facilitated by appropriate diagnostic practices, to resolve and prevent incidents that might affect the normal running of the IT infrastructure.

Improvement Planning

Practices-Outcomes-Metrics (POM)

Representative POMs are described for Incident and Problem Management at each level of maturity.

2 Basic

Practice
Define and implement basic policies, processes, and procedures for incident and problem management for the key domains of the IT infrastructure.
Outcomes
- Defined approaches and processes are emerging for the management of IT infrastructure incidents and problems.
- A team has been established to provide a single point of contact for incidents and problems.
- Roles and responsibilities have been defined and assigned.
Metrics
- # of IT infrastructure incidents.
- # of IT infrastructure problems.

3 Intermediate

Practice
Implement and use standard incident and problem management policies, processes, and procedures for most of the IT infrastructure and the services that it supports.
Outcomes
- Standardized incident and problem management practices are applied to most of the IT infrastructure and supported services.
- Incidents and problems are resolved more quickly, which reduces their impact on the business.
Metrics
- # and % of incidents resolved per day.
- # and % of problems resolved per month.
- Incident mean time to repair (MTTR).
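
The resolution counts and MTTR listed above can be computed from ticket timestamps. The following is a minimal sketch only: the `incidents` records, their layout, and the function names are hypothetical, and it assumes MTTR is averaged over resolved incidents only.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (opened, resolved); resolved is None if still open.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),    # 60 minutes
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 30)),  # 30 minutes
    (datetime(2024, 1, 1, 14, 0), None),                          # still open
    (datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 1, 16, 30)),  # 90 minutes
]


def mean_time_to_repair(records):
    """Average open-to-resolved duration, over resolved incidents only."""
    durations = [resolved - opened
                 for opened, resolved in records
                 if resolved is not None]
    if not durations:
        return None  # nothing resolved yet, so MTTR is undefined
    return sum(durations, timedelta()) / len(durations)


def resolution_rate(records):
    """Return (# resolved, % resolved) for the given records."""
    resolved = sum(1 for _, r in records if r is not None)
    pct = 100.0 * resolved / len(records) if records else 0.0
    return resolved, pct


mttr = mean_time_to_repair(incidents)
resolved_count, resolved_pct = resolution_rate(incidents)
print(f"MTTR: {mttr}, resolved: {resolved_count} ({resolved_pct:.1f}%)")
```

Whether open incidents should count toward MTTR, for example by measuring them up to the reporting date, is a policy choice; the sketch excludes them.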

4 Advanced

Practice
Apply in-depth and, where possible, automated self-monitoring incident and problem management across all of the IT infrastructure and supported services.
Outcomes
- Repeat incidents and problems are minimized.
- Potential incidents and problems are often detected and prevented before they occur.
Metrics
- % of total downtime, broken down by service.
- # and frequency of SLA breaches.
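
As a rough illustration of the downtime and SLA metrics above, the sketch below aggregates an outage log per service and flags services whose availability falls below a target. The outage data, the 30-day period, the `sla_targets` values, and the definition of a breach as "availability below target for the period" are all assumptions made for the example, not part of the framework.

```python
# Hypothetical 30-day reporting period, expressed in minutes.
PERIOD_MINUTES = 30 * 24 * 60  # 43,200 minutes

# Hypothetical outage log: (service, downtime in minutes).
outages = [
    ("email", 120),
    ("email", 96),
    ("payroll", 432),
    ("intranet", 20),
]

# Assumed availability targets per service, in percent.
sla_targets = {"email": 99.9, "payroll": 99.0, "intranet": 99.5}


def downtime_pct_by_service(log, period_minutes):
    """Sum downtime per service and express it as a % of the period."""
    totals = {}
    for service, minutes in log:
        totals[service] = totals.get(service, 0) + minutes
    return {s: 100.0 * m / period_minutes for s, m in totals.items()}


def sla_breaches(downtime_pct, targets):
    """Services whose availability (100 - downtime %) is below target."""
    return sorted(s for s, pct in downtime_pct.items()
                  if 100.0 - pct < targets[s])


downtime = downtime_pct_by_service(outages, PERIOD_MINUTES)
breaches = sla_breaches(downtime, sla_targets)
print(downtime, breaches)
```

Running the same calculation for each reporting period and counting how often a service appears in `breaches` gives one possible reading of breach frequency.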

5 Optimized

Practice
Evaluate and research all relevant systems, processes, and practices to continuously review, improve, and optimize incident and problem management.
Outcomes
- Incident and problem management is continuously improved, and the IT infrastructure is typically self-healing.
Metrics
- # of SLA breaches by service.
- System uptime.
- Mean time between failures (MTBF).
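
System uptime and MTBF are commonly derived from the same three inputs: the length of the reporting period, the total downtime, and the number of failures. The sketch below uses the conventional definition of MTBF as operating time divided by the number of failures; the figures and function names are hypothetical.

```python
def uptime_pct(period_hours, downtime_hours):
    """Share of the period during which the system was available, in %."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


def mtbf_hours(period_hours, downtime_hours, failures):
    """Operating time divided by number of failures; None if no failures."""
    if failures == 0:
        return None  # MTBF is undefined when nothing failed in the period
    return (period_hours - downtime_hours) / failures


# Hypothetical figures for a 30-day period.
period = 720.0    # hours in the period
downtime = 4.5    # total hours of downtime
failures = 3      # number of distinct failures

uptime = uptime_pct(period, downtime)
mtbf = mtbf_hours(period, downtime, failures)
print(f"Uptime: {uptime:.3f}%, MTBF: {mtbf:.1f} h")
```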