Improving security through a TechOps review

Ash and Jenny sat looking at a laptop

I’ve been working in the Regional Service Division (RSD) of the Department for Education (DfE) for over 2 years now, nurturing their technical operations, managing their Azure infrastructure, and supporting DevOps throughout the programme we’re partnering on.

During this time, I have been fine-tuning our Azure Container Apps Terraform module, delivering a secure out-of-the-box hosting platform for people across DfE to use. In the last 12 months or so, we’ve seen a number of teams adopt our approach and start to leverage, and in some cases even contribute back to, our Terraform code. This is great to see! It aligns with government principles of open-source coding and reusability, and demonstrates dxw’s proficiency in infrastructure as code.

We’re proud to have passed the recent IT Health Checks with flying colours (shout-out to the Developers and Designers who made a massive contribution here too), and needless to say, word has gotten around about our secure cloud ops expertise. Last week I was asked to complete a technical review of a neighbouring DfE programme. They are one of the teams using our Terraform module to launch their Container Apps, but found that without a dedicated Ops person on their team, they were struggling to make head or tail of the whole thing.

My job was to review their implementation of our Terraform module, their CI/CD Pipelines, and their general Azure infrastructure estate, and make recommendations based on the established model that we have in the Regional Service Division programme.

GitHub release pipelines

I started by reviewing their GitHub release pipelines to see what could be improved.

When modifying resources in Azure, you typically authenticate using a service account via the Azure CLI. There are many mechanisms to authenticate with Azure; the most common, and often the simplest, is a Client ID and Client Secret (think: username/password). It works, but let’s be honest, it could be safer.

Turns out, Azure supports Federated Credentials with OpenID Connect (OIDC), and I was pleasantly surprised to learn that GitHub’s got a guide on setting up OIDC with Azure. Switching to OIDC means ditching stored secrets entirely and requesting short-lived credentials only when you need them, aka “just-in-time”, further hardening the security of the deployment workflows.
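Since everything else in the estate is managed in Terraform, the Azure side of that trust relationship can be codified too. The sketch below (all resource names and the repository path are placeholders, not the team’s actual values) uses the `azuread` provider to add a federated credential to an existing deployment app registration, trusting tokens issued by GitHub Actions for a specific branch:

```hcl
# Sketch: trust OIDC tokens from GitHub Actions on the main branch,
# removing the need for a stored Client Secret.
resource "azuread_application_federated_identity_credential" "github_oidc" {
  application_id = azuread_application.deploy.id
  display_name   = "github-actions-main"
  description    = "Allow GitHub Actions deploys from main via OIDC"

  # Fixed values for the GitHub Actions token exchange
  audiences = ["api://AzureADTokenExchange"]
  issuer    = "https://token.actions.githubusercontent.com"

  # Placeholder repo path: scope the trust to one repo and branch
  subject = "repo:my-org/my-repo:ref:refs/heads/main"
}
```

On the GitHub side, the workflow then only needs `permissions: id-token: write` and the Client, Tenant, and Subscription IDs, none of which are secrets.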

Developers are often aware that they need to have some kind of quality gate for their code coverage and test reports, but one thing I find that is overlooked is scanning the resulting Docker image for security vulnerabilities. Your application code is only as secure as the environment it runs in after all. My recommendation here was to set up a new GitHub Workflow that builds and scans the Docker image using Trivy. We’ve been using this in RSD and it can integrate with GitHub Advanced Security so that any CVEs or issues that are reported by Trivy, get bubbled up into the GitHub Security tab on your repository.
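A minimal version of that workflow might look like the following (the image name and branch are placeholders, and the action versions are indicative rather than prescriptive):

```yaml
# Sketch: build the Docker image, scan it with Trivy, and surface any
# findings in the repository's GitHub Security tab via SARIF upload.
name: docker-scan

on:
  push:
    branches: [main]

permissions:
  contents: read
  security-events: write # needed to upload SARIF results

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build the Docker image
        run: docker build -t my-app:${{ github.sha }} .

      - name: Scan the image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: my-app:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif

      - name: Upload findings to GitHub Advanced Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```

The SARIF upload is what makes Trivy’s findings appear alongside code scanning alerts in the Security tab, rather than being buried in workflow logs.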

Terraform configuration

Whilst reviewing the Terraform there wasn’t anything alarming to report on. The team were correctly implementing the Container App module we had published, albeit a minor version behind (no big deal).

My main concern with Terraform was that the rules they had defined on their Azure Front Door (CDN) to set HTTP Response Headers were outdated. In my experience, HTTP Response Headers are the most common criteria item raised during IT Health Checks.

According to the OWASP HTTP Headers cheat sheet and the GOV.UK Service Manual on HSTS configuration, the recommended headers include Strict-Transport-Security (with a max-age of at least a year, includeSubDomains, and preload), X-Content-Type-Options, X-Frame-Options, Referrer-Policy, and a Content-Security-Policy appropriate to the service.
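In Terraform, these headers can be set with a Front Door rule set rule. The sketch below (resource names are placeholders, and the exact header values should be tailored to the service) overwrites the response headers on every matching request:

```hcl
# Sketch: apply recommended security headers via an Azure Front Door rule.
resource "azurerm_cdn_frontdoor_rule" "security_headers" {
  name                      = "securityheaders"
  cdn_frontdoor_rule_set_id = azurerm_cdn_frontdoor_rule_set.example.id
  order                     = 1

  actions {
    response_header_action {
      header_action = "Overwrite"
      header_name   = "Strict-Transport-Security"
      value         = "max-age=31536000; includeSubDomains; preload"
    }
    response_header_action {
      header_action = "Overwrite"
      header_name   = "X-Content-Type-Options"
      value         = "nosniff"
    }
    response_header_action {
      header_action = "Overwrite"
      header_name   = "X-Frame-Options"
      value         = "DENY"
    }
    response_header_action {
      header_action = "Overwrite"
      header_name   = "Referrer-Policy"
      value         = "strict-origin-when-cross-origin"
    }
  }
}
```

Keeping these in the rule set, rather than in application code, means every route behind the Front Door gets them consistently.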

Azure cloud security posture

As the team were using our Terraform module, a lot of their Azure architecture was in a really good state, but it could be more robust.

In Azure, you can set up Resource Locks, one of which is a ‘CanNotDelete’ lock. The clue is in the name: having this lock enabled for a particular resource prevents you from deleting it accidentally. It’s a nice little feature that has a real-world use-case in production environments. I recommended that they add these locks on their production SQL Server and their DNS Zone, arguably the two most important stateful resources they had in their tech stack.
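These locks are a couple of lines each in Terraform (the resource references below are placeholders for the team’s actual SQL Server and DNS Zone):

```hcl
# Sketch: CanNotDelete locks on the two most important stateful resources.
resource "azurerm_management_lock" "sql_cannot_delete" {
  name       = "sql-cannotdelete"
  scope      = azurerm_mssql_server.production.id
  lock_level = "CanNotDelete"
  notes      = "Prevents accidental deletion of the production SQL Server"
}

resource "azurerm_management_lock" "dns_cannot_delete" {
  name       = "dns-cannotdelete"
  scope      = azurerm_dns_zone.production.id
  lock_level = "CanNotDelete"
  notes      = "Prevents accidental deletion of the production DNS Zone"
}
```

One thing to note: the service account running Terraform needs permission to manage locks, otherwise a deliberate teardown will fail until the lock is removed, which is rather the point.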

Azure SQL Servers integrate nicely with Microsoft Entra (formerly known as Azure Active Directory), and offer you the ability to assign a SQL Server Administrator to an Entra User/Group. This is very useful, especially when managing multiple SQL Servers. However, the team had, probably inadvertently, designated their infrastructure deployment service account as the SQL administrator. Remember earlier in the post where I mentioned they were using a username and password approach? This created a severe risk of lateral movement and privilege escalation.

Consider the scenario where an attacker manages to gain access to the credentials used by Terraform: they are now able to mutate resources in Azure, but not only that, they are also the SQL Administrator on the production database. This goes against the Principle of Least Privilege and needs to be changed.
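The fix is to point the administrator at a dedicated Entra group of actual DBAs rather than the deployment identity. A sketch of the relevant `azurerm_mssql_server` configuration (group and resource names are placeholders) might look like:

```hcl
# Sketch: assign a dedicated Entra group, not the deployment service
# account, as the SQL Server administrator.
resource "azurerm_mssql_server" "production" {
  name                = "example-sql"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  version             = "12.0"

  azuread_administrator {
    login_username = "sql-admins"
    object_id      = azuread_group.sql_admins.object_id

    # Optionally disable SQL auth entirely, so there is no username/password
    # for an attacker to find in the first place.
    azuread_authentication_only = true
  }
}
```

Separating the identity that deploys the server from the identity that administers its data means a compromise of one does not automatically grant the other.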

When looking at network firewall rules I identified that some of the developers who worked from home had popped their home IP address onto the allow lists on certain resources like production Key Vaults and SQL Servers. This is OK if those IP entries are maintained and audited regularly, but there is still a latent risk that your infrastructure remains unnecessarily open to the internet.

I provided a list of known trusted DfE-only IP addresses, and suggested that all developers access the Azure resources via the DfE VPN clients. This gives us a manageable permitted list, and it enforces that all traffic entering Azure goes through the secure network boundary first.
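In Terraform, that permitted list can live in a single variable rather than being scattered across ad hoc firewall entries. The sketch below (names are placeholders, and the actual ranges would come from the DfE-managed list) shows the pattern on a Key Vault:

```hcl
# Sketch: deny-by-default Key Vault firewall with a managed trusted list.
variable "trusted_ip_ranges" {
  type        = list(string)
  description = "CIDR ranges (e.g. DfE VPN egress) permitted through the firewall"
}

resource "azurerm_key_vault" "production" {
  name                = "example-kv"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  network_acls {
    default_action = "Deny"           # block everything by default...
    bypass         = "AzureServices"  # ...except trusted Azure services...
    ip_rules       = var.trusted_ip_ranges # ...and the managed trusted ranges
  }
}
```

Because the allow list is one variable, adding or retiring a range is a single reviewed change rather than a per-resource hunt.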

Final thoughts

Overall the team had a strong foundation, thanks in part to our well-established Terraform module, but needed a dedicated ops person to provide that high-level perspective of dev/sec/ops across the entire estate.

My final recommendations, crucial for any civil service team, are:

  1. Register with the NCSC Web Check service to be periodically informed about misconfigurations and vulnerabilities. 
  2. If your service isn’t expecting to send mail, implement the recommended guidance from NCSC on protecting parked domains.

If you’re interested in reading more about what we’ve been up to in the TechOps space you can read about how we accelerated Terraform adoption across the Department for Education.