Disaster recovery: If my region is down, can I still work in another region?
The benefit of Azure PaaS and Azure DevOps is that you can plan and implement your disaster recovery easily and effectively. We recently had a scenario with a customer that showcased this perfectly!
The scenario
Our customer approached us and asked whether we could implement a disaster recovery scenario for them: not just a failover as a test, but actually running out of a different region for a week. The answer was yes, we could.
Azure PaaS setup:
- Traffic Manager directing traffic between regions
- App Service Environment (ASE)
- 2 App Service plans
- 5 WebApps with 5 APIs
- 1 function sending data to a third-party source
- Managed SQL within a failover group
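The two pieces that make a regional switch painless are the Traffic Manager profile and the SQL failover group. As a rough illustration of how such a setup might be wired together with the Azure CLI (resource names are hypothetical, and this assumes Azure SQL Database; for SQL Managed Instance the equivalent is `az sql instance-failover-group`):

```bash
RG=rg-prod   # hypothetical resource group

# Traffic Manager profile with priority routing: UK West primary, UK South secondary
az network traffic-manager profile create \
  --name tm-prod --resource-group $RG \
  --routing-method Priority --unique-dns-name contoso-prod

az network traffic-manager endpoint create \
  --profile-name tm-prod --resource-group $RG \
  --name ukwest --type azureEndpoints --priority 1 \
  --target-resource-id <uk-west-entry-point-resource-id>

az network traffic-manager endpoint create \
  --profile-name tm-prod --resource-group $RG \
  --name uksouth --type azureEndpoints --priority 2 \
  --target-resource-id <uk-south-entry-point-resource-id>

# The failover group pairs the two SQL servers behind one listener endpoint,
# so connection strings do not change when the primary moves region
az sql failover-group create \
  --name fg-prod --resource-group $RG \
  --server sql-ukwest --partner-server sql-uksouth \
  --add-db appdb
```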

Azure DevOps setup:
- All infrastructure written as ARM templates and updated through code rather than manually in the Azure portal
- A single ARM template for each app service, with parameter files for Prod UK West and UK South
- A pipeline for resource creation in UK South
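Because each app service has one template with a parameter file per region, deploying into the second region is simply a case of pointing the same template at the UK South parameters. A minimal sketch of what a single pipeline step boils down to, with hypothetical file and resource group names:

```bash
# Same template, region selected purely by the parameter file (names are illustrative)
az deployment group create \
  --resource-group rg-prod-uksouth \
  --template-file webapp.json \
  --parameters @webapp.parameters.uksouth.json
```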

The failover
The customer requested a controlled disaster recovery, which also meant downtime had to be kept to a minimum. The failover was scheduled for a Friday at 4:30 pm, so a few days earlier we began deploying the infrastructure to UK South.
- We started by deploying the vNet and network security groups
- The ASE followed, using the same static IP as the UK West ASE. This meant the one web app behind a VPN would not need to be updated with a new IP address
- Next, we began deploying the application gateway
- Once the ASE was deployed, the two App Service plans were created
- Then came the WebApps, APIs and the function
- Lastly, the latest developer code in main was deployed to the app services
At this stage, we had deployed the full infrastructure to our second region in 2 hours 13 minutes.
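Under the hood, that ordering is just a series of ARM deployments run in dependency order. The sketch below shows the idea with hypothetical template names; in practice this ran as pipeline stages, some steps ran in parallel, and the ASE build accounted for most of the elapsed time.

```bash
# Deploy templates into the secondary region in dependency order (names are illustrative)
RG=rg-prod-uksouth
for template in vnet nsg ase appgateway asp webapps apis function; do
  az deployment group create \
    --resource-group $RG \
    --template-file "$template.json" \
    --parameters "@$template.parameters.uksouth.json"
done
```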
Failover commences - Friday, 4:30 pm
The customer gave the green light to commence the failover:
- We stopped the app services in UK West so that users saw an application-down message
- In Traffic Manager, we disabled the UK West endpoint and enabled UK South
- This routed customer traffic to the secondary region
- We then initiated database failover to the secondary region
- Thanks to the failover group, the infrastructure did not need to be updated with the secondary SQL server name
This process took 15 minutes, from stopping the apps to restoring customer access to the site and allowing business to continue.
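The same three switches can equally be driven from the command line. A hedged sketch of what those steps look like in Azure CLI terms, again with hypothetical resource names and assuming Azure SQL Database failover groups:

```bash
RG=rg-prod

# 1. Stop the UK West app services so users see the application-down message
#    (repeated for each web app and API)
az webapp stop --name app-ukwest --resource-group $RG

# 2. Swap the active Traffic Manager endpoint from UK West to UK South
az network traffic-manager endpoint update \
  --profile-name tm-prod --resource-group $RG \
  --name ukwest --type azureEndpoints --endpoint-status Disabled
az network traffic-manager endpoint update \
  --profile-name tm-prod --resource-group $RG \
  --name uksouth --type azureEndpoints --endpoint-status Enabled

# 3. Promote the UK South SQL server; the failover-group listener name is
#    unchanged, so the applications keep their existing connection strings
az sql failover-group set-primary \
  --name fg-prod --resource-group $RG --server sql-uksouth
```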
Over the following week, the customer ran successfully out of the UK South site with business continuing as normal. The purpose of the exercise, however, was to prove it could be done in the event of a real incident. As smooth as it looked on paper and in practice, the exercise highlighted two errors, both of which have since been fixed. One of them, linked to the application still using local SQL accounts, has prompted us to move to managed identity and improve the system further.
The following Friday we shut down the apps in UK South, failed Traffic Manager and SQL back, and started the UK West apps. Within 15 minutes the customer was back on the production websites and business continued.
This was a successful disaster recovery exercise that gave us, and more importantly the customer, confidence in the system built by Kainos. The next test is already scheduled for six months' time.
If you would like to learn more about LiveOps and our cloud and engineering services, click here.