There is often confusion between the terms Incident Management and Problem Management, so the aim of this article is to clarify the difference between the two and highlight a common conflict which can occur.
Incident Management
An “incident” is defined as an event that isn’t part of the normal running of a service and that causes, or might cause, a degradation or interruption to that service. Incidents can be raised from numerous sources: for example, by an end user, or in an automated fashion by the system itself.
The aim of Incident Management is to restore normal service as quickly as possible after one or more incidents, with minimal impact on users, so that agreed Service Level Agreements (SLAs) are met.
Problem Management
A “problem” is the unknown root cause of one or more incidents. For example, if email stopped working for users on a corporate IT network while all other applications were working fine, many users would probably raise incidents (perhaps by telephone, perhaps via the intranet), but there would be just one problem: the as-yet-unknown reason email isn’t working.
Once we have identified the root cause of a problem, we have a “known error”. Problem Management aims to minimise the impact of problems on the organization by providing solutions to the root cause of incidents, thus preventing further incidents.
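The relationship described above (many incidents, one problem, and a known error once the root cause is found) can be sketched as a small data model. This is only an illustration: the class names, fields, and identifiers such as "INC-1" and "PRB-1" are assumptions made for the example, not part of any particular service desk tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """One report of a degraded or interrupted service (hypothetical model)."""
    incident_id: str
    description: str
    source: str  # where it was raised, e.g. "telephone" or "monitoring"


@dataclass
class Problem:
    """The underlying cause shared by one or more incidents (hypothetical model)."""
    problem_id: str
    summary: str
    incidents: List[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None  # stays None until the cause is identified

    def link(self, incident: Incident) -> None:
        """Attach an incident that is a symptom of this problem."""
        self.incidents.append(incident)

    @property
    def is_known_error(self) -> bool:
        """A problem becomes a known error once its root cause is identified."""
        return self.root_cause is not None


# The email example: several incidents, raised from different sources,
# all pointing at a single problem.
problem = Problem("PRB-1", "Email not working")
for n, source in enumerate(["telephone", "intranet", "monitoring"], start=1):
    problem.link(Incident(f"INC-{n}", "Cannot send or receive email", source))
```

Until someone sets root_cause, problem.is_known_error stays False no matter how many incidents are linked, which mirrors the distinction between a problem and a known error.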
Note that there can be a conflict between Incident Management and Problem Management. Consider again our email example. From the Incident team’s perspective, the quickest way to resolve the incident might be to reboot the email server. However, if rebooting the server destroys log data essential to finding the root cause, the Problem team will be slowed down in finding that cause and fixing the problem permanently. The conflict arises from the Incident team’s desire to restore normal operations as soon as possible without considering the Problem team’s needs.
A possible solution to this conflict is for the two teams to agree in advance on a plan of action for when the problem next occurs, including who to notify in the Problem team, what data to collect, and for how long.
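One way to picture such an agreement is as a small, written-down plan that puts notification and data collection ahead of the restore action. The sketch below is a minimal illustration under stated assumptions: the ResponsePlan fields, the ordered_steps helper, and the sample values are all hypothetical, and "for how long" is read here as a cap on data collection so that restoring service is not delayed indefinitely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResponsePlan:
    """What the two teams agreed in advance (hypothetical structure)."""
    notify: List[str]            # who to notify in the Problem team
    collect: List[str]           # what data to capture before restoring service
    max_collection_minutes: int  # assumed meaning of "for how long"


def ordered_steps(plan: ResponsePlan, restore_action: str) -> List[str]:
    """Return the steps in order, with the restore action always last."""
    steps = [f"notify {who}" for who in plan.notify]
    steps += [f"collect {item}" for item in plan.collect]
    steps.append(f"stop collecting after {plan.max_collection_minutes} minutes")
    steps.append(restore_action)
    return steps


# The email example: preserve the logs first, then reboot.
plan = ResponsePlan(
    notify=["Problem team on-call"],
    collect=["email server logs"],
    max_collection_minutes=10,
)
steps = ordered_steps(plan, "reboot email server")
```

Placing the restore action last is the point of the design: the Incident team still restores service quickly, but only after the evidence the Problem team needs has been preserved.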
Summary
There is often confusion between the terms Incident Management and Problem Management, which this article clarifies. Incident Management aims to get the service back to a normal level as quickly as possible. Problem Management aims to fix the root cause and so prevent further incidents. If you’re a project or program manager, it is a good idea to get your hands on any incident reports and problem reports coming out of this part of the organization: understanding the major sources of issues for users and within the systems themselves can only make you better at your role.
