AWS IAM Management automation – Lambda to the rescue!
At Kainos, DevOps and Automation are at the core of everything we do. When I first encountered the word ‘DevOps’, circa 2011, it was really just a new word for Systems Administration. Ten years on, the landscape of DevOps tooling is radically different: application architectures are now cloud-first or cloud-native, so it’s vitally important that the management tools for a solution are cloud-native too.
Arguably one of the most tedious and time-consuming tasks of managing AWS environments is IAM user management, particularly on ‘legacy’ accounts where it’s impractical or impossible to use AWS SSO. For example, a recent project I worked on had four accounts: three for a variety of non-production workloads, and one for production.
Within the three non-prod accounts, particularly dev and stage, the majority of IAM user accounts belonged to software engineers, and during the growth phase of the project users were being onboarded extremely rapidly.
The customer’s login security requirements stated the following:
- All human user accounts must use MFA
- All access keys must be less than 90 days old
- Human user accounts without MFA must be deleted
Initially, a monthly security review was carried out, which involved manually checking each account against the three requirements. This rapidly became unmanageable as the user count grew to well over fifty.
To streamline this process, I wrote a set of Lambda functions in Python which enforced each policy requirement.
To avoid the overhead of maintaining a separate datastore for notification state, each Lambda records that state in Tags on the IAM User itself. This pattern of using Tags is common to all three lambdas. All three also use SES to email users about their account issues.
The lambdas are scheduled to run daily with Amazon EventBridge schedules:
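As a rough illustration of the tagging pattern (not the original Lambda code), the sketch below keeps the date of the last notification in a tag on the IAM user. The tag key `LastNotified` and the helper names are hypothetical; `iam_client` is assumed to be a boto3 IAM client, i.e. `boto3.client("iam")`, passed in by the caller.

```python
from datetime import date

# Hypothetical tag key; the post does not name the tags the lambdas use.
LAST_NOTIFIED_TAG = "LastNotified"


def get_tags(iam_client, user_name):
    """Return the user's tags as a plain dict.

    list_user_tags is paginated; this sketch reads only the first page,
    which is enough for users carrying a handful of tags.
    """
    response = iam_client.list_user_tags(UserName=user_name)
    return {tag["Key"]: tag["Value"] for tag in response["Tags"]}


def already_notified_today(iam_client, user_name, today):
    """True if the tag shows a notification was already recorded today."""
    tags = get_tags(iam_client, user_name)
    return tags.get(LAST_NOTIFIED_TAG) == today.isoformat()


def record_notification(iam_client, user_name, today):
    """Store today's date on the user so the next run can see it."""
    iam_client.tag_user(
        UserName=user_name,
        Tags=[{"Key": LAST_NOTIFIED_TAG, "Value": today.isoformat()}],
    )


def clear_notification_state(iam_client, user_name):
    """Remove the state tag once the user is compliant again."""
    iam_client.untag_user(UserName=user_name, TagKeys=[LAST_NOTIFIED_TAG])
```

Because the state lives on the user object, it disappears along with the user when the account is deleted, so there is nothing left behind to clean up.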
"rate(1 day)"
- AccessKeyRevoker
Deletes a user’s Access Key once the key reaches 90 days of age. Users are emailed warnings 21, 14 and 7 days before deletion (so when the key is 69, 76 and 83 days old), and the key is then disabled and deleted on the 90th day.
- MFAEnforcer
Sends nagging emails reminding users to configure MFA if their user has no MFA device configured, specifically if user.mfa_devices.all() returns no devices. Users receive one email for every day their account lacks MFA and, after a configurable grace period (3 days), the account is tagged for deletion if they still haven’t enabled MFA.
- AccountDeleter
Reads the tags on User objects, looking for Tags created by MFAEnforcer, and deletes the user if required. Before deleting, it checks whether the user has set up MFA since their final warning; if they have, it strips the tags off their User object so they stop getting notifications about policy violations (as they’re no longer in breach).
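The decisions the three lambdas make can be sketched as small pure functions. This is a minimal sketch rather than the original code: the function names and the returned action strings are hypothetical, and it assumes a key is deleted at exactly 90 days and that the grace period has expired once a user has gone 3 or more days without MFA.

```python
from datetime import datetime, timezone

MAX_KEY_AGE_DAYS = 90
WARNING_AGES_DAYS = (69, 76, 83)  # 21, 14 and 7 days before the limit
MFA_GRACE_PERIOD_DAYS = 3         # configurable in the original lambdas


def key_age_days(create_date, now=None):
    """Whole days since the key was created (expects timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    return (now - create_date).days


def access_key_action(age_days):
    """AccessKeyRevoker: decide what to do with a key of the given age."""
    if age_days >= MAX_KEY_AGE_DAYS:  # assumption: delete on the 90th day
        return "delete"
    # Exact-match warnings rely on the lambda running once every day.
    if age_days in WARNING_AGES_DAYS:
        return "warn"
    return "none"


def mfa_action(has_mfa, days_without_mfa):
    """MFAEnforcer: nag daily, then tag for deletion after the grace period."""
    if has_mfa:
        return "none"
    if days_without_mfa >= MFA_GRACE_PERIOD_DAYS:  # assumption: >= rather than >
        return "tag_for_deletion"
    return "nag"


def deleter_action(tagged_for_deletion, has_mfa):
    """AccountDeleter: re-check MFA before deleting a tagged user."""
    if not tagged_for_deletion:
        return "none"
    if has_mfa:
        return "clear_tags"  # user fixed the problem after the final warning
    return "delete"
```

Keeping the decisions separate from the boto3 calls makes them easy to unit test without touching a real AWS account.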
Each lambda was deployed via Terraform, along with an EventBridge schedule for whatever frequency that lambda needed to run at.
The lambda’s execution is then ‘tied’ to the schedule by making the function a target of it, and permission is granted for events to invoke the lambda function.
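For readers who want to see the same three steps outside Terraform, here is a hedged Python equivalent using boto3: create the scheduled rule, attach the lambda as its target, and allow EventBridge to invoke it. This is not the Terraform from the project. The function name `schedule_lambda` and the rule and statement naming are hypothetical, and the two clients are assumed to be `boto3.client("events")` and `boto3.client("lambda")`.

```python
def schedule_lambda(events_client, lambda_client, function_name, function_arn,
                    schedule_expression="rate(1 day)"):
    """Run a lambda on a schedule and return the ARN of the rule created."""
    rule_name = f"{function_name}-schedule"  # hypothetical naming convention

    # 1. The schedule itself.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )

    # 2. Tie the lambda's execution to the schedule.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": function_name, "Arn": function_arn}],
    )

    # 3. Grant events permission to invoke the lambda function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    return rule["RuleArn"]
```

Unlike a Terraform apply, this sketch is not idempotent: calling `add_permission` a second time with the same statement ID raises a conflict error, which is one reason to keep the real deployment in IaC.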
The lambda itself is stored within the IaC codebase to simplify deployment from Terraform.
The environment variable “DEBUG” controls whether the lambda actually *sends* emails, or just prints the content to stdout as a kind of dry-run, without making any changes.
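A dry-run switch of this kind might look like the sketch below. It is an assumption-laden illustration, not the original code: the post does not say which values of DEBUG enable the dry-run, so treating "1", "true" and "yes" as enabling it is a guess, and the helper names are hypothetical. `ses_client` is assumed to be `boto3.client("ses")`.

```python
import os


def is_dry_run(environ=None):
    """Read the DEBUG environment variable (accepted values are an assumption)."""
    env = os.environ if environ is None else environ
    return env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes")


def notify(ses_client, sender, recipient, subject, body, dry_run):
    """Send the email via SES, or just print it when dry_run is set.

    Returns True if an email was actually sent.
    """
    if dry_run:
        print(f"[dry-run] To: {recipient}\nSubject: {subject}\n\n{body}")
        return False
    ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
    return True
```

For the dry-run to make no changes at all, the same flag would also need to guard the tag updates and the key and user deletions, not only the emails.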
Conclusion
These lambdas for managing IAM policy dramatically reduced the amount of manual intervention required to maintain a secure MFA environment. Automated regular ‘nagging’ emails helped to remind users of the importance of configuring MFA and rotating their access keys.
The end result of automating and scheduling policy enforcement was that, within two days, MFA coverage across the three accounts went from approximately 40% to 100%. About half a dozen accounts were deleted because they had never been used, much less had MFA configured, and about 30 Access Keys were disabled and deleted because they were over 90 days old.
Overall, the solution was relatively straightforward and quick to implement, yet had an immediate positive impact on the security posture, not to mention saving a significant number of human-hours on monthly manual reviews!