A Security Showdown in the Clouds: Comparing the Security Philosophies of GCP and AWS
As a long-time fan of Jimmy Neutron, I can’t help but think about his Neutronic Storminator invention (from Season 2 Episode 14 — “Out, Darn Spotlight” for interested readers) while contemplating the content of this article. I would argue that the artificial storm cloud created by Boy Genius’s device in 2004 was a cartoonish metaphor for the surge in cloud computing that was to come.
We’re essentially witnessing a massive phase change of technology — a steady evaporation of data storage and computation out of their tangible, on-the-ground states into an ever-growing artificial cloud that continuously beckons with clear-cut business advantages. Continuing with this analogy, it’s important to note that evaporation isn’t disappearance; these things — computation, data storage, networking, identity and access management, etc. — still exist, but they’re taking on a new state, which means our responsibilities as technical professionals for managing them and keeping them secure must similarly take on a new state.
Part of this new-state responsibility is understanding how cloud service providers like Google and Amazon compare with specific regard to their philosophies on security. These vendors (among others) are the modern Jimmy Neutrons, each with their own powerful Neutronic Storminators spinning up their own clouds. Choosing one over the other can be the difference between a successful “Macbeth in Space” performance and a technological tornado of destruction (see the video at the beginning). Sure, the implications may not be quite that severe, but I wanted to extend the Jimmy Neutron reference as far as I could.
Ultimately, the business decision to build on or migrate to any cloud platform should be based on (among other factors) confidence in the security provided by that platform, because maintaining digital security in this new cloud computing frontier is a responsibility shared between the cloud provider and the customer.
Comparing Security Philosophies of AWS and GCP, By Key Questions
Before comparing the security of cloud platforms, we should first know which indicators are of interest: what questions should we be asking, and what answers are we looking for? The following is a comparative analysis of AWS and GCP, broken up by key security-related questions aggregated from a couple of sources on things to consider before hopping on the cloud wagon (as relaxing as that may sound).
How is encryption handled at rest and in transit? Does your platform offer a FIPS 140–2 certified Hardware Security Module? What is encrypted? What isn’t?
- According to this Data Encryption section of the Introduction to AWS Security Whitepaper, AWS offers: data-at-rest encryption capabilities in most services, including EBS, S3, RDS, Lambda, and others; flexible encryption key management options; dedicated, hardware-based cryptographic key storage via CloudHSM, which has offered FIPS 140–2 validated cryptographic modules for regulatory compliance since March 2018; and encrypted message queues for SQS (Simple Queue Service). Moreover, according to this Encrypting Data-at-Rest and -in-Transit whitepaper, AWS offers the ability to encrypt data both at rest and in transit in all of its services. The key thing to note from the documentation is that turning encryption on (and managing keys) is largely left to the customer (a short sketch of what that looks like follows after the GCP bullet below).
- On the Google side of things, according to their whitepaper on encryption, all customer content is encrypted at rest without any action required by the customer. Fantastic! Moreover, all at-rest data is encrypted at the storage layer using AES-256, and a library called Tink is used to implement encryption across GCP in a consistent, FIPS 140–2 compliant way. As for data in transit, Google does apply some default protections, e.g., encrypting traffic between users and the Google Front End (GFE) with TLS, but they also offer customers the option to implement further protections for data in transit in the form of IPsec tunnels, Gmail S/MIME, managed SSL certificates, and Istio (for securing microservice communications in a K8s cluster, it seems).
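To make the contrast concrete, here is a minimal sketch of what opting into encryption looks like on each platform, assuming the boto3 and google-cloud-storage client libraries and hypothetical bucket, key, and project names. On AWS you explicitly request server-side encryption (here with a KMS key); on GCP the object is encrypted at rest by default, and a customer-managed key is optional.

```python
import boto3

# AWS: server-side encryption is something you ask for explicitly.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-reports-bucket",          # hypothetical bucket
    Key="reports/2022-q3.csv",
    Body=b"col_a,col_b\n1,2\n",
    ServerSideEncryption="aws:kms",           # encrypt with an AWS KMS key
    SSEKMSKeyId="alias/example-data-key",     # hypothetical KMS key alias
)
```

```python
from google.cloud import storage

# GCP: objects are encrypted at rest by default, no flags required.
client = storage.Client()
bucket = client.bucket("example-reports-bucket")   # hypothetical bucket
bucket.blob("reports/2022-q3.csv").upload_from_string("col_a,col_b\n1,2\n")

# Optional: use a customer-managed Cloud KMS key (CMEK) instead of
# Google-managed keys, e.g., to satisfy stricter key-control requirements.
cmek_blob = bucket.blob(
    "reports/2022-q3-cmek.csv",
    kms_key_name=(
        "projects/example-project/locations/us/"
        "keyRings/example-ring/cryptoKeys/example-key"
    ),
)
cmek_blob.upload_from_string("col_a,col_b\n1,2\n")
```

In both cases, key management (who can use or rotate the keys) remains on the customer’s side of the shared responsibility model.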
How is RBAC implemented?
- AWS offers a specific Identity and Access Management (IAM) service for configuring role-based access, and it’s completely free. Initially, upon account creation, you start with a single root user that has full access to everything, but all documentation strongly recommends not using that root user for everyday tasks and instead using the free IAM service to create lower-privileged users with access to only the resources needed to perform their functions. You can create individual users in the IAM service (under the same account) corresponding to users (or applications) in your organization, and each user can have their own set of credentials as well as their own individual access key for programmatic AWS requests, or possibly just one of those (e.g., an application just needs an access key for programmatic access). So that’s the authentication portion. On the authorization side, you control a given IAM identity’s access to AWS resources via policy attachment, where most policies are stored as JSON documents. Generally, you would define (or select existing) roles representing job functions in your organization, attach policies to those roles that grant access to the AWS resource(s) relevant to the respective job function, and then assign that role to a user or group of users holding that job function. A role, in essence, represents a reusable group of permissions that can be assigned to any number of identities (a hedged sketch of policy creation and attachment follows after the GCP bullet below).
- Google Cloud Platform similarly offers an Identity and Access Management (IAM) service to serve the same function — that is, to allow GCP admins to define a set of identities and their respective access to resources within a GCP environment. Their IAM service also comes with a machine-learning-based tool called Recommender that (as expected) makes recommendations on access control to detect and fix overly permissive configurations. Access control can be time-consuming, so this is a significant plus for GCP. The creation of users, groups, and roles in GCP works similarly to that in AWS; roles represent groups of permissions and can be assigned to “principals” (users, groups, domains, or service accounts) to grant them those permissions. Unlike AWS, where normal IAM users can be assigned access keys for programmatic requests, GCP handles programmatic access with service accounts; only service accounts can be assigned access keys. However, the permissions on these accounts are handled just as they are for normal GCP users and groups — via role assignment.
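As a rough, hedged illustration of the two models (all names are hypothetical, and this again assumes the boto3 and google-cloud-storage libraries): on AWS you define a JSON policy document and attach it to a role representing a job function; on GCP you bind a predefined role to a principal, here scoped to a single storage bucket.

```python
import json
import boto3

iam = boto3.client("iam")

# A least-privilege policy: read-only access to one S3 bucket (hypothetical names).
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-reports-bucket",
            "arn:aws:s3:::example-reports-bucket/*",
        ],
    }],
}
created = iam.create_policy(
    PolicyName="ExampleReadReports",
    PolicyDocument=json.dumps(policy_doc),
)

# Attach the policy to an existing role representing the "report reader" job
# function; identities holding that function then inherit these permissions.
iam.attach_role_policy(
    RoleName="ReportReaders",
    PolicyArn=created["Policy"]["Arn"],
)
```

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-reports-bucket")   # hypothetical bucket

# Bind a predefined role to a principal (here, a service account) on this bucket.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:report-reader@example-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```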
Are you able to help us comply with data sovereignty regulations? Where do your datacenters physically live? Where do backup datacenters physically live?
- With AWS, as a customer, you can choose to store your data in any one (or more) of the AWS Regions, where a Region is a physical location around the world where data centers are clustered together. As of this writing (October 24, 2022), there are 27 launched Regions, each with multiple Availability Zones (AZs), where an AZ is one or more discrete data centers with high built-in redundancy. When you choose a Region, AWS guarantees that your data will stay within the boundaries of that Region. Moreover, Amazon prohibits remote access to any customer data by AWS personnel, including for service maintenance, unless specifically requested by the respective customer or required to comply with the law. Even in cases where law enforcement is involved, AWS challenges requests for customer data when they have grounds to do so, and they even publish a bi-annual report of all law-enforcement-related requests they receive for customer data. One additional important note is that, as of 2018, AWS lets admins use IAM policies to prevent users from provisioning resources in the wrong regions (see the guardrail sketch after the GCP bullet below).
- On the Google side of things, customers can likewise control where their data is stored (for key services) by selecting Regions, where each region contains multiple zones, similar to the AWS AZs. Moreover, GCP admins can enable organization policies in conjunction with the IAM service to prevent users in the organization (or a given project) from accidentally storing data in the wrong region, much like the IAM-policy approach on AWS.
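Below is a hedged sketch of what each guardrail might look like, written as plain Python dictionaries with hypothetical region lists and project names. The AWS document uses the aws:RequestedRegion condition key to deny requests outside approved regions (some global services typically need explicit exemptions); the GCP document is an organization policy spec for the resource-locations constraint, which would be applied through the Org Policy API or gcloud rather than through IAM directly.

```python
# AWS: an IAM (or SCP) policy document that denies actions outside approved regions.
aws_region_guardrail = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideApprovedRegions",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {"aws:RequestedRegion": ["eu-central-1", "eu-west-1"]}
        },
    }],
}

# GCP: an organization policy spec restricting where resources can be created,
# using the resource locations constraint and a predefined location group.
gcp_resource_locations_policy = {
    "name": "projects/example-project/policies/gcp.resourceLocations",
    "spec": {
        "rules": [{"values": {"allowedValues": ["in:eu-locations"]}}],
    },
}
```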
What mechanisms do you offer for helping us achieve, maintain, and monitor our compliance with regulatory and security policies?
- The AWS Risk and Compliance whitepaper is written with a strong focus on the shared responsibility model. Internally, AWS has integrated a risk and compliance program throughout the organization to manage risk and continually reassess and improve how services are designed and deployed. They regularly undergo independent third-party attestation audits to provide current and prospective customers with assurance that services are functioning as intended. AWS also participates in the Cloud Security Alliance (CSA) Security, Trust & Assurance Registry (STAR) Self-Assessment to provide customers with additional assurance of its compliance with CSA best practices; the STAR homepage bills it as “the industry’s most powerful program for security assurance in the cloud”. So, that’s some of the vendor’s share of the shared responsibility model. On the other hand, the customer is certainly responsible for maintaining strong governance over their IT environment, which should involve identifying and understanding all compliance requirements that need to be met, establishing an environment that meets them, and validating and verifying that environment to mitigate risk. AWS offers a ton of compliance-related documentation which they recommend customers use in developing their own cloud control framework. AWS also offers AWS Systems Manager Compliance for monitoring compliance data and remediating issues, as well as AWS Artifact, a free self-service portal for on-demand access to AWS’s own compliance reports (a small example of pulling Systems Manager compliance data programmatically follows after the GCP bullet below).
- Similar to AWS, GCP also regularly undergoes independent third-party verification of its security, privacy, and compliance controls and provides certifications, reports, and attestations from those processes to customers for assurance. GCP’s compliance offerings are well-documented and easily searchable, with the ability to filter by region, by industry, or by focus area. For example, filtering by Industry=Education, I quickly found their FERPA (U.S.) (The Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g; 34 CFR Part 99) offering, where I found that Google Workspace for Education can be used in compliance with FERPA. Google also offers Compliance Reports Manager, which provides customers with free, easy, on-demand access to critical compliance-related resources including third-party audit reports, certifications, and contract commitments. As with AWS, the customer’s share of the shared responsibility for maintaining compliance comes down to their establishment of strong governance, their understanding of their specific compliance requirements, and their consistent validation and verification of their IT environment against those requirements.
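On the AWS side, the compliance data gathered by Systems Manager can also be pulled programmatically. The snippet below is a minimal sketch using boto3; which compliance types show up (patching, custom associations, etc.) depends on what is configured in your account. GCP’s Compliance Reports Manager, by contrast, is primarily a portal for downloading audit reports and certifications rather than an API.

```python
import boto3

ssm = boto3.client("ssm")

# Summarize compliant vs. non-compliant resources for each compliance type
# (e.g., patching, custom associations) reported by Systems Manager.
response = ssm.list_compliance_summaries()
for item in response.get("ComplianceSummaryItems", []):
    compliant = item.get("CompliantSummary", {}).get("CompliantCount", 0)
    non_compliant = item.get("NonCompliantSummary", {}).get("NonCompliantCount", 0)
    print(f"{item.get('ComplianceType')}: {compliant} compliant, {non_compliant} non-compliant")
```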
What mechanisms do you offer for ensuring business continuity? Are backups and disaster recovery a separate cost? When was the last time your platform’s disaster recovery mechanism was tested? How can your platform help us meet recovery time objectives (RTO) and recovery point objectives (RPO)?
- AWS’s whitepaper Disaster Recovery of Workloads on AWS: Recovery in the Cloud, again grounded in the shared responsibility model, outlines best practices for addressing disaster recovery, mitigating risk, and meeting RTOs/RPOs for workloads. This is a bit of a rabbit hole, but ultimately, here’s the deal: AWS provides best practices for designing resilient systems (e.g., via the six-pillar AWS Well-Architected Framework) and recovering them from disaster, and they are responsible for the resiliency of the underlying Global Cloud Infrastructure supporting AWS services. Beyond that, the customer’s responsibility for resiliency and recovery varies depending on the services they use; they are ultimately in charge of leveraging that underlying infrastructure in a way that is “Well-Architected” to provide high availability and resiliency.
- GCP takes a similar stance, again based on the shared responsibility model. Google is of course responsible for maintaining the underlying infrastructure on which their cloud services are built, but it’s up to each customer to architect their cloud environment, and the governance around it, with resiliency and disaster recovery (DR) plans in mind. Similar to AWS, GCP provides extensive documentation on disaster recovery planning, walking customers through the process of designing with disaster in mind in order to deploy resilient systems.
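Since both vendors ultimately leave RTO/RPO targets to the customer, a quick back-of-the-envelope check like the following can be useful when comparing DR designs. All the numbers here are hypothetical; in practice, the restore time comes from actually testing your recovery procedure.

```python
# Recovery point objective (RPO): how much data loss is tolerable.
# Recovery time objective (RTO): how much downtime is tolerable.
rpo_target_minutes = 60
rto_target_minutes = 240

snapshot_interval_minutes = 30    # how often backups/replicas are taken
measured_restore_minutes = 180    # observed time to restore and redirect traffic

# Worst case, you lose everything written since the last snapshot.
worst_case_data_loss_minutes = snapshot_interval_minutes

print("RPO met:", worst_case_data_loss_minutes <= rpo_target_minutes)   # True
print("RTO met:", measured_restore_minutes <= rto_target_minutes)       # True
```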
What are your uptime guarantees (SLAs)?
- Both AWS and GCP document a different SLA for each service they provide (makes sense, right? SLAs are “service-level” after all). The SLAs for AWS are documented a bit more nicely than GCP’s in that the AWS documentation allows you to search, filter, and sort. To give a specific AWS example, the SLA for EC2 (one of the most common AWS services) is actually broken into two separate SLAs: one is a region-level SLA that applies to EC2 instances deployed across multiple regions or AZs, and the other applies to individual EC2 instances. Of course, the monthly uptime commitment for the former is higher than for the latter (that’s the point of redundancy). To give a specific GCP example, the Compute Engine SLA is similarly broken down into multi-zone instances, a single instance, and load-balanced instances. Both GCP and AWS offer corresponding service credits back to the customer in cases where the SLA is not met, and they both seem to use 95% as the low end for monthly uptime percentages on redundant environments. For AWS’s region-level EC2 SLA, for example, dropping below 95% monthly uptime makes you eligible for a service credit of 100% of the charges for the affected instances.
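To put those percentages in perspective, here’s a small, illustrative calculation translating monthly uptime tiers into allowable downtime; the exact tiers and credit percentages are defined per service in each vendor’s SLA documents.

```python
# Downtime allowed per month at a given uptime percentage (30-day month assumed).
MINUTES_IN_MONTH = 30 * 24 * 60  # 43,200 minutes

def allowed_downtime_minutes(uptime_pct: float) -> float:
    return MINUTES_IN_MONTH * (1 - uptime_pct / 100)

print(allowed_downtime_minutes(99.99))  # ~4.3 minutes
print(allowed_downtime_minutes(99.0))   # 432 minutes (7.2 hours)
print(allowed_downtime_minutes(95.0))   # 2,160 minutes (36 hours)

# A month below the lowest tier triggers the largest service credit.
measured_uptime_pct = 94.2
print("Below 95% tier:", measured_uptime_pct < 95.0)   # True
```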
What protections are in place for your data centers and infrastructure, physical and otherwise?
- According to their data center controls documentation, AWS puts a significant amount of thought, planning, and threat modeling into designing their data centers, which they claim are “secure by design”. Environmental assessments go into the initial site selection, including assessments of seismic activity, flood risk, etc. Redundancy is built in by deploying core applications to an N+1 design standard (for which I found further reading here). Availability is basically guaranteed through the replication of critical system components across multiple independent AZs. Capacity adapts flexibly based on AWS’s own monitoring of its service usage. AWS maintains its own Business Continuity Plan (BCP) to plan for and mitigate environmental disruptions, and even simulates disaster scenarios to produce reports of corrections that need to be made. The principle of least privilege is applied to physical access: approved employees are granted time-bound access to specifically permitted areas of data centers only upon request with a valid business justification. Third-party access to physical data centers is treated similarly, but it must first be requested by authorized AWS employees. All physical access is logged, monitored, retained, and regularly reviewed, and there is extensive surveillance on physical data centers along with professional security staff and MFA-enabled access points. There are several other protections in place for AWS physical infrastructure that can be explored here: https://aws.amazon.com/compliance/data-center/controls/
- Google’s data centers are similarly protected with multiple layers of security. As they outline in their Data Center documentation, their DCs are secured with extensive surveillance systems, perimeter defense systems, biometric authentication (MFA), a 24/7 security staff, strict access policies and training for all appropriate staff, annual testing with simulated threat scenarios, and a proactive program centered on risk mitigation. Similar to AWS infrastructure, Google’s data centers are also designed with security and resiliency in mind.
The list of questions worth asking a cloud vendor before building on or migrating to their infrastructure should largely be driven by a security-oriented perspective on your own business’s requirements. Some businesses require a much greater degree of information protection and data encryption, some require much higher uptime guarantees, some require more active monitoring and logging, etc. I’ve put together some additional questions at the end of the article to guide the reader’s further investigation into cloud vendors’ approaches to security, whether for Google, Amazon, or another vendor entirely.
Overall, understanding the security implications of choosing one cloud vendor over another a) is important and b) largely comes down to exploring how the vendors align or diverge in their approach to the shared responsibility model. IT professionals responsible for managing IT infrastructure in the cloud need to understand what is in the scope of their own responsibility versus what the cloud vendor is responsible for regarding security. Knowing where that boundary lies, IT admins can create and enforce proper policies (e.g., these identities have access to these resources), procedures (e.g., every resource should be created in region X), and plans (e.g., this is how we can recover from a disaster…) to maintain a secure, scalable environment that will support the success and growth of the business overall.
More Cloud Questions to Consider
- What mechanisms do you have in place for protecting us from ourselves? In other words, what services do you offer for helping us detect vulnerabilities in our specific environment? For example, do you scan GitHub repositories and alert us to any exposed cloud platform secrets?
- How granular is the auditing provided by your platform? Can it be used to detect misuse?
- Which logging services are on by default? Do any logging services expose secret information by default?
- More generally, what services are on by default? Do any services use default credentials that need to be changed?
- What recent attack vectors have been exploited on your platform? What was done and what is being done to eliminate those threats?