The one-year anniversary of Capital One’s data breach is rapidly approaching, so I thought it was a good time to review the lessons we can take from the breach to prevent similar incidents in the future.
Information about the breach and the person responsible is abundant and has been covered exhaustively by countless blog posts and news articles, so rather than rehashing that discussion, let’s dive into what happened from a technical standpoint. As a general note, some of the pieces here are extrapolations and assumptions I pulled from the indictment, but I tried to pick the most likely scenarios of what happened.
- An EC2 instance running WAF software (ModSecurity) was compromised in Capital One’s account via SSRF.
- The AWS metadata service was queried, and it returned information about a role that was attached to the instance as well as the access token for the role.
- The role was titled *-WAF-* but had unnecessary permissions to read from S3 (and most likely kms:decrypt as well).
- The role was used to list S3 buckets, revealing that it had access to sensitive buckets.
- Data was decrypted and exfiltrated from the account.
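To make the first two steps of that chain concrete, here is a minimal sketch of the IMDSv1 URLs an SSRF vulnerability can be used to relay. IMDSv1 answers plain unauthenticated GET requests, which is what made this chain possible. The role name below is hypothetical, not Capital One’s actual role.

```python
# Illustrative sketch of the IMDSv1 paths an SSRF can relay.
# IMDSv1 requires no authentication token, so any server-side
# request forgery that can reach this link-local address wins.
IMDS_BASE = "http://169.254.169.254/latest/meta-data"

# Step 1: an SSRF'd GET here lists any role attached to the instance.
list_roles_url = f"{IMDS_BASE}/iam/security-credentials/"

# Step 2: a second GET returns temporary credentials for that role.
role_name = "example-WAF-Role"  # hypothetical role name
creds_url = f"{IMDS_BASE}/iam/security-credentials/{role_name}"

print(list_roles_url)
print(creds_url)
```

With IMDSv2, both requests would fail because they lack the session token obtained via a preceding PUT request, which an SSRF typically cannot perform.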
On an editorial note, this feels like the next “S3 exposure” type of event; however, there were a couple of key differences. Back in 2017/2018, when publicly exposed S3 buckets started popping up everywhere, there were two main issues.
- At scale, enterprises just couldn’t get a handle on how all of their buckets were configured and were struggling with a lack of visibility.
- The “Authenticated Users” bucket ACL was much more permissive than people assumed: it granted access to any logged-in AWS user, not just authenticated users in the local account. People were essentially leaving buckets wide open because of poor naming on AWS’ part and a lack of understanding of what each ACL did.
After a slew of companies made the news for bucket leaks, AWS took some initial and longer-term steps to help their customers keep their buckets configured as intended. First, they added the “Public” icon to each bucket so that people had a quick visual indicator of their exposure right in the console. Later, AWS added the “block public access” feature, which helps prevent the issue from happening in the first place (again, if configured properly…).
With the Capital One breach, things felt very similar and yet different at the same time. First, a company exposes data due to a cloud misconfiguration. Next, AWS responds by adding more security features to the area that was compromised (new metadata service in this case). After that, people try to configure their environments more securely but don’t always know if they have succeeded in doing so. Lastly, this breach becomes the main topic of conversation when talking about cloud misconfigurations and will continue to be until the next big breach.
With this particular breach behind us (but not forgotten), it is worth exploring how we can set up our cloud environments to prevent issues like this in the future. Like the S3 leaks, AWS has added some new capabilities that can help. Nonetheless, the responsibility for preventing the issue still lies solely with their customers, and there are various resources that are prone to misconfigurations.
What You Can Do Today
- Make sure your web-facing software is configured correctly. The original entry point into the Capital One system was a compromised open-source WAF called ModSecurity. Whether this was a software vulnerability or just a misconfiguration doesn’t really matter. Either way, software needs to be checked for potential “soft spots” and remediated immediately. If the entry point into the account didn’t exist in the first place, then everything beyond this wouldn’t have mattered.
- Stop using v1 metadata service. After information about the breach was released, AWS released a new version of their metadata service that is more secure than v1.
- All instances should be running only the v2 metadata service where possible. You can set it using this command: aws ec2 modify-instance-metadata-options --instance-id <INSTANCE-ID> --profile <AWS_PROFILE> --http-endpoint enabled --http-tokens required
- There is a metric in CloudWatch that shows where the v1 metadata service is being used. Look for the MetadataNoToken metric to see which instances are still using v1 so you can assess what it’ll take to get them updated and hopefully drive this metric to 0. I’d recommend looking at this metric both aggregated across all instances and broken out per instance.
- Check your instance roles. Do all instances need roles? At a minimum, do the names make sense? “Is all of IAM configured as least privilege?” is a hugely loaded question in the AWS world today. IAM is an extremely complex service that is difficult to manage, especially at scale. For this misconfiguration though, a great place to start is looking at instances and determining if they have any role attached and if the name of the role makes sense in the context of the instance to which it is attached.
- Restrict bucket access and permissions. Thankfully, this bucket was not wide open to the internet, but bucket access still played a big part in this breach and there’s always more that you can do to lock down permissions further.
- Encrypt data. This data was encrypted, but it’s still worth mentioning again.
- Enable logging. This is more of a post-mortem/forensic measure, but especially for extremely sensitive buckets, ensure that you have bucket- and object-level logging turned on.
- Tokenization. Depending on the sensitivity of the data and how it is used, look to embrace a combination of tokenization and encryption. In this breach, encryption was bypassed with the permissive IAM role, but tokenization was not. This helped keep the breach from going from bad to worse.
- Scope KMS access using conditions. AWS permissions are convoluted at best, but there is a lot of power in the system if configured properly. Using KMS conditions, you can scope key usage down to specific VPCs, accounts, etc. If this had been set, the data still would have been exfiltrated, but it would have stayed encrypted.
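As a sketch of what such a condition might look like, here is a KMS key policy statement that limits decryption to calls made through S3 from a single account. The account ID and region are placeholders; adapt the condition keys to your own topology.

```python
import json

# Sketch of a KMS key policy statement scoping kms:Decrypt to calls
# that arrive via S3 from one specific account. The account ID
# (111122223333) and region are placeholders for this example.
statement = {
    "Sid": "ScopeDecryptToS3FromOurAccount",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
    "Action": "kms:Decrypt",
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            # Only honor requests originating from this account...
            "kms:CallerAccount": "111122223333",
            # ...and only when made through the S3 service endpoint.
            "kms:ViaService": "s3.us-east-1.amazonaws.com",
        }
    },
}

print(json.dumps(statement, indent=2))
```

With a statement like this on the key, stolen credentials used directly from outside the account would be denied the decrypt call even though the IAM role itself grants it.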
The hardest and most important piece of this whole breach is also the most nebulous and difficult to perfect. Some high-level things to evaluate:
- Wildcards. Wildcards are easy to use and are a quick way to make things “just work,” but they should be avoided wherever possible. Think of wildcards like the “chmod 777” of the cloud world. Just because you can, doesn’t mean you should.
- Unique roles. Each application should have its own unique role, and roles should not be reused. Like wildcards, reusing roles can be an easy way to circumvent the hassle of creating new, scoped-down permissions, but scoping access to exactly what a resource needs will limit the blast radius of any compromised resource.
- Configure conditionals where possible. Using IAM conditions is a great way to ensure that permissions are only used where they’re supposed to. Configuring these permissions can be tedious, but the time spent will pay dividends in your “defense in depth” strategy.
- Look for roles that can also decrypt data and review those with extra scrutiny (don’t rely on just the KMS key policy). Capital One has a standard to encrypt all data, but the compromised role had access to read and decrypt data from S3. This is like locking your house but taping the key to the front door. KMS conditionals would have helped, but if that role hadn’t had KMS permissions attached in the first place, the attacker wouldn’t have been able to decrypt the data.
- Macie. After much anticipation, AWS released an updated version of Macie. This version is more cost-effective and also seems to have the features that legacy customers were missing. While Macie isn’t made to look for data exfiltration, it does specialize in finding buckets with sensitive data in them. Even if you have a list of where your sensitive data lives, it’s a good idea to run Macie across your environments to make sure you don’t have sensitive data sitting in buckets that lack the high level of security it needs.
- CloudTrail. Like bucket logging, this wouldn’t have prevented the breach, but it’s still an important thing to have turned on for all accounts. You want at least one multi-region trail per account that rolls up to a master account for storage. This central audit trail of what happened is invaluable for forensics.
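The wildcard and KMS checks above can be approximated in plain code. This is a minimal sketch that scans IAM policy documents for wildcard actions or kms:Decrypt grants; a real audit would also need to handle NotAction, Deny statements, resource scoping, and policy inheritance.

```python
# Minimal sketch: flag IAM policy statements that use wildcard
# actions or grant kms:Decrypt. Real audits must also consider
# NotAction, Deny statements, resources, and attached policies.
def risky_statements(policy: dict) -> list:
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            if action == "*" or action.endswith(":*") or action.lower() == "kms:decrypt":
                flagged.append(stmt)
                break
    return flagged

# Made-up example policy: one tightly scoped statement, one risky one.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::logs/*"},
        {"Effect": "Allow", "Action": ["s3:*", "kms:Decrypt"], "Resource": "*"},
    ],
}
print(len(risky_statements(policy)))  # -> 1
```

Running something like this across every role attached to an instance is a quick way to surface the “WAF role that can decrypt S3 data” pattern at the heart of this breach.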
At DivvyCloud, we specialize in providing visibility and security to your public cloud environments. One of the most basic things we do is keep cloud security management simple, even as the scale and complexity of your environments increases. For the recommendations I listed above, making these changes in one account is doable, but it becomes overwhelming if you have tens or hundreds of accounts. Here are a couple of critical places where DivvyCloud can help harden your environment against a Capital One style breach, even if you’re running a massive environment.
- Stop using v1 metadata service.
- Create an insight using the “Instance Allows Use Of Vulnerable IMDSv1 Protocol (AWS)” filter. You can correlate this information with CloudWatch metrics to show if the metadata service is even being used and help prioritize what instances can be converted immediately and which need to have their code refactored first to ensure continuity with the new service.
- In the resources page, apply the “Instance Associated With Role” filter.
- Look at the instances that are output: do the role names even make sense at a high level for the instance they’re attached to? Reviewing these roles in a central location can help pick off obvious misconfigurations.
- Blacklist certain permissions and check to see if the instances have them.
- Create a data collection and add all of the permissions you don’t want to see (s3:*, ec2:*, kms, etc.)
- Select the “Resource Associated Role Contains Action” filter and in the “Target actions” section, select the data collection you created.
- Save this as an insight and create a bot so you can be alerted if a new, permissive, instance is created.
- Create an insight from the “Storage Container Without Preventative Public Access Enforcement” filter to ensure the public access blocks are set for your buckets.
- Create an insight from the “Storage Container Does Not Have Bucket Policy” filter. More review is needed to ensure that the bucket policies are actually what you want. However, making sure that your buckets have a policy at all is a good first step.
- Use the “Storage Container Without Server Side Encryption Enabled” insight to make sure the buckets have SSE enabled.
- Use the “Storage Container Without Access Logging” insight to make sure you have logging turned on for your buckets.
- Follow a naming or tagging convention to identify sensitive buckets. Create an insight that looks for this information so you can ensure you know what the sensitivity is of each bucket.
- Use the “Identity Resource With Wildcard Access (*:*)” filter to alert on users, policies, and roles with wildcards.
- Look for places that can also decrypt data and review those with more scrutiny.
- Use the “Identity Resource Contains Invalid Actions” filter with “KMS%” as the blacklisted action to look for. Review these findings to make sure that KMS permissions don’t exist anywhere they shouldn’t.
- Use the “Cloud Account Without Global API Accounting Config” insight to ensure you have CloudTrail turned on across your accounts.
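The permission-denylist step above can be sketched in plain code as well. This is illustrative only: the filter does this matching for you inside the product, and the role-to-actions mapping below is made-up stand-in data.

```python
import fnmatch

# Illustrative stand-in for the denylist matching that the
# "Resource Associated Role Contains Action" filter performs.
# The roles and actions below are invented for this example.
DENYLIST = ["s3:*", "ec2:*", "kms:*"]

def matches_denylist(action: str) -> bool:
    # IAM actions are case-insensitive, so compare lowercased.
    return any(fnmatch.fnmatch(action.lower(), p.lower()) for p in DENYLIST)

role_actions = {
    "example-WAF-Role": ["s3:ListAllMyBuckets", "kms:Decrypt"],
    "example-Logging-Role": ["logs:PutLogEvents"],
}

flagged_roles = {
    role for role, actions in role_actions.items()
    if any(matches_denylist(a) for a in actions)
}
print(sorted(flagged_roles))  # -> ['example-WAF-Role']
```

Wiring a check like this to an alerting bot is what turns a one-time audit into continuous enforcement.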
Unlike the binary misconfiguration of public S3 buckets, there was no single issue that caused the Capital One breach. If any one thing had been just a little bit different, we wouldn’t be talking about Capital One today (and probably would still be on the topic of public buckets…). This was a complex chain of events and misconfigurations that allowed the bucket data to be compromised. If the WAF hadn’t allowed SSRF, the metadata service would never have been hit. If the v2 metadata service had existed at the time, it would have ignored the request coming from the WAF. If the instance had either no role or an appropriately scoped role, the attacker wouldn’t have been able to get anywhere important after compromising the instance. If KMS access and S3 access had been restricted at the resource level, the actions would have been denied because they were called from outside of Capital One’s accounts. With just one adjustment to any one of these issues, the data would still be secure.
I’ve yet to see a perfectly configured enterprise environment. While companies continue to improve their cloud security, there are just too many opportunities to get things wrong. Until cloud misconfigurations are a thing of the past, we need to take a “defense in depth” strategy. If we configure every service like it’s the last line of defense, we’ll be able to break the chain of cascading failures and ensure that our customer data stays inside of our walls.