Creating an Incident and Problem Management process
This recipe discusses creating an Incident and Problem Management process.
Getting ready
In Incident Management, we focus on restoring a service to the known mode of operation it had before an unplanned interruption. Problem Management focuses on understanding the actual cause of the interruption, with the goal of providing a permanent resolution.
The ITIL® framework books and online resources discuss best practices for Incident and Problem Management processes. Plan to review and understand Incident and Problem Management principles as a prerequisite to creating the processes.
How to do it...
An example of the steps for creating an Incident and Problem Management process is as follows.
Incident Management
Here are the example steps specific to an Incident Management process:
- Agree and document the organization's Incident Management policy.
- Document the operational process to support the Incident Management policy. This should include but not be limited to the following:
  - Support hours
  - Classification categories
  - Escalation procedures
- Create and assign people roles to manage the process, for example:
  - Service Desk analysts
  - Desktop support
  - Infrastructure analysts
  - Service Desk managers
- We typically have two channels for Incident Management:
  - Service Desk team-created incidents (using the SCSM console). Sample process steps from incident creation to priority allocation are shown in the following figure:
  - Automated or end user self-service created incidents (end user web portal, e-mail, or automatic system event driven). Sample process steps from incident creation to priority allocation are shown in the following figure:
- The difference between the two typical channels is how the incident is initially categorized (triage). The next step, Process Incident, involves creating a process flow that matches how the Incident Management team manages incidents under your policies and procedures. An example is shown in the following figure:
- Monitor and report on the performance of the Incident Management process. The aim is to improve the process, and also identify incidents that require Problem Management.
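The priority allocation step in the sample flows can be sketched as a simple lookup. The sketch below assumes that priority is derived from an incident's impact and urgency, which is a common ITIL-style approach; the level names, the priority values, and the `allocate_priority` function are illustrative assumptions, not settings taken from this recipe or from a particular SCSM configuration.

```python
from enum import Enum


class Level(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Hypothetical priority matrix: (impact, urgency) -> priority, 1 = most urgent.
# Replace these values with the ones agreed in your Incident Management policy.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): 1,
    (Level.HIGH, Level.MEDIUM): 2,
    (Level.HIGH, Level.LOW): 3,
    (Level.MEDIUM, Level.HIGH): 2,
    (Level.MEDIUM, Level.MEDIUM): 3,
    (Level.MEDIUM, Level.LOW): 4,
    (Level.LOW, Level.HIGH): 3,
    (Level.LOW, Level.MEDIUM): 4,
    (Level.LOW, Level.LOW): 5,
}


def allocate_priority(impact: Level, urgency: Level) -> int:
    """Return the priority for an incident from its impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]
```

Keeping the matrix as data rather than as nested conditions makes it easier to review against the documented policy and to change when the policy changes.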
Problem Management
Here are the example steps specific to a Problem Management process:
- Agree and document the organization's Problem Management policy.
- Document the operational process to support the Problem Management policy.
- Create and assign people roles to manage the process, for example:
  - Problem analysts
  - Problem managers
- Review the Incident Management process with the aim of identifying instances of the following type:
  - Repeated issues over a defined period (for example, monthly, quarterly, or annually)
  - Incidents with known workarounds (typically implies there is an opportunity for root cause investigation)
- Perform a detailed investigation of incidents escalated to Problem Management, using internal experts or third-party external support.
- Create a change request for problems with known permanent fixes.
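The review for repeated issues over a defined period can be sketched as a count of incidents per classification category. This is a minimal illustration that assumes incident data has already been exported from your tool; the `Incident` fields, the `problem_candidates` function, the threshold of three, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    incident_id: str
    category: str   # classification category assigned at triage
    created: date


def problem_candidates(incidents, start, end, threshold=3):
    """Return categories with at least `threshold` incidents created
    between `start` and `end` (both dates inclusive)."""
    counts = Counter(
        incident.category
        for incident in incidents
        if start <= incident.created <= end
    )
    return sorted(cat for cat, n in counts.items() if n >= threshold)


# Hypothetical sample data for one quarter.
sample = [
    Incident("IR1", "Laptop network", date(2024, 1, 5)),
    Incident("IR2", "Printing", date(2024, 1, 20)),
    Incident("IR3", "Laptop network", date(2024, 2, 11)),
    Incident("IR4", "Laptop network", date(2024, 3, 2)),
]

print(problem_candidates(sample, date(2024, 1, 1), date(2024, 3, 31)))
# prints ['Laptop network']
```

A category that crosses the threshold is only a candidate; a problem analyst still decides whether it warrants a root cause investigation.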
How it works...
Incident Management is about getting services that people rely on back to an agreed operational state as soon as possible. An example of Incident Management is a customer who is unable to access their documents:
- On investigation, we find that the issue is with the laptop assigned to the customer.
- We issue the customer with a loan laptop and confirm access to their documents.
The previous steps will resolve the incident, but we still have a problem. What is wrong with the customer's laptop?
The answer to the question is Problem Management. We use Problem Management to identify the true (root) cause of the issue. We can continue with our scenario from Incident Management:
- The desktop engineering team identifies the issue as a network hardware device failure in the laptop.
- The team also identifies that this issue has been happening to a number of laptops over the last quarter.
- The team also identifies through asset management that we purchased a set of laptops from a vendor and all the issues relate to this set.
- We escalate to the vendor and get a driver fix.
- A change request is raised to proactively apply the fix to all laptops from the set.
Applying the fix to all laptops in scope also resolves the issue on the original laptop. We can close the problem and change the status of the original incident to closed. A final best practice is to create a knowledge article about this known issue and its corresponding fix.
The previous examples illustrate how Incident Management and Problem Management work in practice.
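The relationship between the incident and the problem in the laptop scenario can also be sketched as a small data model. This is an illustration only, not how SCSM stores work items; the class names, the status values, and the identifiers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentRecord:
    incident_id: str
    summary: str
    status: str = "Active"


@dataclass
class ProblemRecord:
    problem_id: str
    root_cause: str = ""
    status: str = "Active"
    incidents: list = field(default_factory=list)

    def link(self, incident: IncidentRecord) -> None:
        """Associate an incident with this problem."""
        self.incidents.append(incident)

    def close(self, root_cause: str) -> None:
        """Record the root cause, then close the problem and every
        incident linked to it."""
        self.root_cause = root_cause
        self.status = "Closed"
        for incident in self.incidents:
            incident.status = "Closed"


# The loan laptop restores service, so the incident is resolved
# while the underlying problem is still open.
incident = IncidentRecord("IR100", "Customer cannot access documents")
incident.status = "Resolved"

problem = ProblemRecord("PR10")
problem.link(incident)

# After the change request has applied the vendor fix:
problem.close("Fault in the network device on one purchased set of laptops")
```

Closing the linked incidents from the problem record mirrors the final step of the scenario, where the original incident is closed only after the permanent fix is in place.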