What good incident management looks like

Managing a digital service is largely straightforward, until a third-party service causes chaos across large swathes of the internet. Here's how it affected us.

Date posted

10 June 2021

Stephen McCalden

Service Delivery Manager

I’ve been dreading a week like this for a long time.

Managing a digital service is largely straightforward: you keep a customer’s solution doing what it’s supposed to do and delivering value to its end users. Incidents will happen from time to time, and that’s perfectly normal. But when a third party that your service depends on, and that you have no control over, has an issue, you are at the mercy of their service restoration plan. It’s awful to be affected by something you have no power to correct.

That’s what happened this week to large swathes of the internet. A content provider had a problem that caused dozens of the world’s most well-known sites to fail, many of which my team and I provide live service support for. When multiple outages occur at the same time, multiple incidents get raised, and many concerned customers come to you for answers, it becomes the perfect storm you hope never happens in service management.

Thankfully, in this instance the issue was identified and corrected quickly, and services were restored. I’m grateful to the team for applying ITIL-aligned incident management best practice to:

Be alert to the problem immediately

Understand the impact

Raise the appropriate incident ticket for tracking purposes

Make a calm and composed diagnosis

Have a prompt solution design including a temporary workaround if possible

Execute safe and efficient release management of the fix

Protect the service integrity throughout

Provide clear, regular, and concise communications at all times

Days like this don’t happen often, but when they do, falling back on robust procedures and following the plan will minimise the impact and restore the service as quickly and efficiently as possible.

Here at Kainos we are ISO 20000 certified, and all our Live Operations services follow mature and robust ITIL-aligned service management procedures. We have a proud history, spanning over 30 years, of serving some of the most critical digital solutions across the public, health, and commercial sectors. We know incidents happen and we know how to react to them, so our customers can have peace of mind that their solutions are in safe hands.

About the author

Stephen is a service delivery manager working in Kainos’ Live Operations team. He manages teams of engineers who maintain and support many critical services on behalf of our customers, ensuring they continuously meet the needs of their end users.
