Moving a Large Transactional Service to the AWS Platform in just 10 Weeks

Date posted
16 December 2015
Reading time
11 Minutes
Peter Campbell

Choosing and building out a production environment for a large transactional service can be difficult and time-consuming. The Driver and Vehicle Standards Agency (DVSA) has recently moved to the cloud - in this case using Amazon Web Services (AWS) - to re-engineer the new digital MOT testing service. The team built a scaled-out, highly available platform, extensively tested it and transitioned to it in just ten weeks before switching off the old mainframe.

The new MOT testing service has handled over 10,000,000 MOT tests from across the country since it was switched on 13 weeks ago. It has successfully coped with the seasonally high demand from garages in September. The staged rollout completed over the summer allowed the increasing demand to be validated carefully on the new platform, and issues to be addressed on a daily basis using the automated deployment pipeline.

It would be reasonable to assume that this new production service in AWS was a long-running project, given that many services take 6 to 9 months to procure, design, build and deliver a production environment. Wrong. By reusing code from other government services and leveraging the best of agile development, the expert team delivered a new production service for MOT testing in 10 weeks. The production service was battle-tested throughout this period with continual performance testing. The herculean effort from the Kainos team ensured that DVSA deadlines were met, so the mainframe could be switched off with confidence, knowing that the MOT system had transitioned safely to a digital service.

But what about information security? Surely government services aren't allowed to be deployed in Amazon Web Services? Wrong. The decision on risk for departmental data rests with the Senior Information Risk Owner, and is typically delegated to the departmental accreditor. Following the 2014 Security Classifications policy change, the MOT dataset was classified as Official data (as most government data will be). This allows cloud providers like Amazon to be assessed on their good commercial practices rather than on their adherence to Business Impact Level controls. These commercial practices can be examined under a Non-Disclosure Agreement. Most of the risk assessment focused on the tenant built for the MOT testing service on the AWS platform, not on the platform itself.

The team worked daily with the DVSA Information Assurance team and the Department for Transport accreditor to examine controls, collect documentation and validate the emerging service. The team reached out to CESG to clarify policy and were provided with a VPN implementation to further accelerate the security implementation. This collaborative process across teams culminated in an Independent IT Health Check, which evidenced the strong implementation by finding no vulnerabilities rated high in the new service.

Many services opt to use synthetic data during development and testing because of the difficulty of getting authorised access to real data. For the MOT testing service the team needed access to a production-sized dataset for realistic performance testing and also to validate the 1TB database migration. To achieve this, the team gained permission to use production data for testing purposes in AWS once the required controls were in place. This unlocked the potential to see the benefits of agile delivery.

Agile Infrastructure

The progress made in 10 weeks would not have been possible without the scrum team using an agile delivery process. The team began with a new production environment based on a lightweight design statement, iterating it throughout. Within the first 3 weeks we had a functioning MOT service in AWS. This was iterated daily to add more features. The following features were iterated on:
  • Monitoring and alerting to provide transparency about what the service was doing.
  • Scaling out and up to support peak demand with continuous performance testing to prove the changes.
  • Optimisation of the performance bottlenecks in the implementation, discovered by analysing test results. The AWS Support and Architects teams helped us throughout to diagnose and resolve platform questions and issues.
  • Security controls for networking, virtual machines and integration.
  • Secure integration with dependent cloud services outside of AWS using a combination of SSL and VPN.
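
To make the idea of "continuous performance testing to prove the changes" concrete, here is a minimal sketch of the kind of check a pipeline could run after each load test. It is not the MOT team's implementation: the function names, the 500 ms budget and the 1% error threshold are all hypothetical values chosen for illustration, and it assumes the load test records one latency per request.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def passes_performance_gate(latencies_ms, failed_requests,
                            p95_budget_ms=500.0, max_error_rate=0.01):
    """Return True if a load-test run stays within the (hypothetical) budgets.

    latencies_ms    -- one response time per request sent during the test
    failed_requests -- how many of those requests failed
    """
    p95 = percentile(latencies_ms, 95)
    error_rate = failed_requests / len(latencies_ms)
    return p95 <= p95_budget_ms and error_rate <= max_error_rate


if __name__ == "__main__":
    # Illustrative data only: 100 requests taking 1 ms to 100 ms, none failed.
    run = list(range(1, 101))
    print("gate passed:", passes_performance_gate(run, failed_requests=0))
```

A pipeline step could call a check like this on each build's load-test results and stop the deployment when it returns False, so that a slow change is caught before it reaches garages.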

The team preferred a kanban approach where work was prioritised daily; this reduced the lag between deciding what to work on and implementing it. Throughout these iterations the team were very aware that the goal was to build a new performance-proven production service, not just a new production service. Because of this, performance testing began at the very start of the build and ran continually throughout. This allowed the platform and application to be optimised for performance based on simulated peak load. It was this agile benefit of fast delivery and continual testing of the service that increased senior DVSA stakeholders' confidence that the service would be delivered successfully for garages.

Fast Fixing

The MOT testing service was not exempt from early teething issues. These were perhaps inevitable given the short duration of the transition. However, it was the innovation of a deployment pipeline for the whole service (infrastructure and application) that allowed the team to diagnose issues, then resolve, test and deploy fixes very quickly. The pipeline provides an automated process for testing and deploying changes, which permits daily changes to the MOT testing service in the evenings. Issues were resolved and the fixes deployed to production within 24 hours.

This process continues today to facilitate continuous delivery for the MOT service. It is a vital mechanism for connecting development teams to MOT garage users, reducing the time taken to publish new features - whatever type they are - to production safely. Non-functional testing is fully integrated into this deployment pipeline, so that regressions such as poor performance in new features do not progress automatically to production.

What's next?

The MOT testing service is an important 24/7 service for UK government. But it is not the only digital service provided by DVSA. Other services will follow the MOT pattern of elastic cloud infrastructure with DevOps culture at its heart: embedding ops engineers alongside developers and testers within teams, and continuous delivery of the whole service. The new Check MOT history service is one example of this and has just been launched to allow citizens to view the test history of a vehicle.

About the author

Peter Campbell