Simply Explained: Amazon S3


Amazon Web Services (AWS) is behind much of the internet today. AWS, Amazon's most profitable division, has grown to dozens of geographic regions across the globe (us-east-1 in Ashburn, VA, for example), each built from multiple data centers. Each region is divided into availability zones (AZs), with most regions having at least three; AZs isolate data-center failures from one another to provide better and more predictable uptime for services. Better-known AWS services include EC2, SNS, RDS, CloudFront, and S3. EC2 provides virtual servers in the cloud; SNS automates mass communication via text and email and can deliver 2FA login codes to your users via text- or email-based messages; RDS provides massively scalable relational databases; and CloudFront is a large-scale, distributed CDN that runs on AWS's more than 225 edge locations. In this article, however, we will focus on one of AWS's first services: the Simple Storage Service (S3).

On the surface, S3 works much like better-known cloud storage providers such as Google Drive and Dropbox. However, once we unpack how S3 works at a more fundamental level, you will see that although all of these services let users store files in the cloud, they are very different. Google Drive and Dropbox sell fixed-size plans, often 100GB, 1TB, or even 10TB. With a service such as S3, you pay only for the data you upload -- anything from a 2MB picture of your cat to hundreds of petabytes of satellite imagery. You are charged for the total size of your files in S3, no more (with few exceptions) and no less.

S3 is an object storage service oriented toward enterprise customers who need terabytes, petabytes, or even exabytes of data accessible in milliseconds from anywhere in the world. With S3, you get far more control over both where and how your data is stored in an AWS data center. Inside the S3 console, you can choose which region your files are stored in, what level of durability you need, and even how quickly your files can be retrieved.

S3 Standard, the most popular option, offers millisecond access to all of your data, stored redundantly across at least three geographically distinct AZs within one region. S3 One Zone-IA reduces cost (and durability) by keeping objects in a single AZ instead of three. S3 Standard-IA (infrequent access) keeps data in a high-durability, rapidly accessible environment while charging less for storage and more when the data is actually retrieved. S3 Glacier and Glacier Deep Archive are commonly used to store essential backups, giving companies and individuals an option for data that must be kept in the cloud for compliance purposes or as a more secure backup. Glacier Deep Archive targets files that must be retained for many years and are accessed only once or twice a year. The trade-off is retrieval time: restoring from Glacier can take anywhere from a few minutes (for expedited retrievals, at a premium) up to 12 hours, and Deep Archive bulk retrievals can take even longer. Data in these tiers is likely held on archival media such as tape, though AWS does not publish the details.
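The trade-offs among the storage classes above can be sketched as a simple lookup. The class names below match S3's actual `StorageClass` values; the retrieval descriptions are approximations of the figures discussed above, not exact guarantees.

```python
# Rough summary of the S3 storage classes discussed above.
# Retrieval times are approximate and depend on the retrieval tier chosen.
STORAGE_CLASSES = {
    "STANDARD":     {"azs": 3, "typical_retrieval": "milliseconds"},
    "STANDARD_IA":  {"azs": 3, "typical_retrieval": "milliseconds"},
    "ONEZONE_IA":   {"azs": 1, "typical_retrieval": "milliseconds"},
    "GLACIER":      {"azs": 3, "typical_retrieval": "minutes to hours"},
    "DEEP_ARCHIVE": {"azs": 3, "typical_retrieval": "hours"},
}

def redundancy(storage_class: str) -> int:
    """Return how many availability zones hold copies of the object."""
    return STORAGE_CLASSES[storage_class]["azs"]
```

One Zone-IA is the outlier: losing its single AZ can mean losing the data, which is why it is only suitable for easily re-creatable files.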

Billing for storing data in S3 is slightly more complicated than a flat per-byte rate, but at its core you still pay by the byte: S3 Standard costs $0.023 per GB-month, while Glacier and Glacier Deep Archive cost far less at $0.004 and $0.00099 per GB-month, respectively.
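The per-GB-month prices above make the storage side of the bill easy to estimate. A minimal sketch (real bills also prorate by hours stored and apply volume tiers, which this ignores):

```python
# Estimate the monthly storage bill from the per-GB-month prices quoted
# above. Real AWS bills prorate by time stored and apply volume tiers;
# this is a simplification for illustration.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_storage_cost(gb: float, storage_class: str = "STANDARD") -> float:
    return round(gb * PRICE_PER_GB_MONTH[storage_class], 2)

# 1 TB (1,000 GB) in S3 Standard is about $23.00/month,
# while the same data in Glacier Deep Archive is about $0.99/month.
```

The roughly 23x gap between Standard and Deep Archive is the price of millisecond access.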

Every time you open a file in Google Drive, a GET request is sent to an origin server in Google's data centers; S3 works the same way. However, Google Drive does not charge you for these application programming interface (API) requests. With S3, in addition to paying per byte stored, you must also pay API charges. These charges fall into two main categories (still highly simplified): PUT/COPY/POST/LIST requests and everything else. The first category generally changes the data stored in S3; for example, PUT is used when creating or uploading a new file. The second category consists of GET-like requests that read files in S3 but generally do not change them. Glacier, Glacier Deep Archive, and Standard-IA additionally charge retrieval fees based on how many gigabytes of data you pull back out. For example, if I store a 100MB file in S3 Standard-IA and request it 10 times, that adds up to 1GB of retrieved data. This model simplifies billing for backups and infrequently used files. S3 Standard, by contrast, charges only per request, with no per-GB retrieval fee.
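The two billing components above can be sketched numerically. The request prices below are us-east-1 S3 Standard list prices at the time of writing ($0.005 per 1,000 PUT/COPY/POST/LIST requests, $0.0004 per 1,000 GET-class requests); treat them as assumptions that may have changed.

```python
# Sketch of the two S3 request-billing categories described above.
# Prices are assumed us-east-1 S3 Standard rates and may be out of date.
PUT_PRICE_PER_1000 = 0.005   # PUT/COPY/POST/LIST (mutating) requests
GET_PRICE_PER_1000 = 0.0004  # GET and other read requests

def request_cost(puts: int, gets: int) -> float:
    """Request charges for a mix of mutating and read requests."""
    return puts / 1000 * PUT_PRICE_PER_1000 + gets / 1000 * GET_PRICE_PER_1000

def retrieval_gb(object_mb: float, requests: int) -> float:
    """Total retrieved data for the per-GB fee on the IA/Glacier tiers."""
    return object_mb * requests / 1000

# The article's example: a 100 MB file fetched 10 times is 1 GB retrieved.
```

For read-heavy workloads the GET charges dominate the PUT charges, which is why the per-GB retrieval model of the IA tiers is easier to predict for backups.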

S3 also offers enhanced security features compared to services like Google Drive. Every file uploaded to S3 has multiple security options. In addition to controlling whether users on the greater internet can access it, you can also control which users inside the AWS account have access to the file and what they can do with it. AWS Identity and Access Management (IAM) regulates which users can perform which actions in S3, such as authorizing one user to only read a file while another can read, write, and rename it. AWS Key Management Service (KMS) lets users encrypt files in S3 with a KMS-managed key: each object is encrypted with a unique data key, which is in turn encrypted under a KMS key that can be rotated automatically. Alternatively, users can supply their own keys. However, as with all forms of encryption, computing resources are needed to decrypt the data. If you encrypt your data in S3 with a KMS-managed AES-256 key, you will therefore pay extra KMS charges when storing and retrieving those files.
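As a sketch of the per-user access control described above, here is what a read-only IAM policy might look like, built as a Python dict and serialized to the JSON form IAM expects. The bucket name `example-bucket` is hypothetical.

```python
import json

# Sketch of an IAM policy granting read-only access to a single,
# hypothetical bucket ("example-bucket"). Attached to a user or role,
# it allows listing and reading objects but not writing or deleting.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

print(json.dumps(read_only_policy, indent=2))
```

A second user who also needs to write and rename would get a policy adding actions such as `s3:PutObject` and `s3:DeleteObject`.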

Finally, one major upside to AWS compared to offerings such as Backblaze B2 is free data transfer from S3 to other AWS services in the same region. This can partially offset S3's higher cost relative to other cloud storage providers. If I run a website off an EC2 instance with my website files hosted in S3 in the same region, I pay only EC2's internet egress fees whenever my website is loaded on someone's computer. If I instead used Backblaze to store my large, static files, I would pay egress fees from Backblaze to send the files to my EC2 instance and then pay egress fees again from EC2 to get the files from my server to your computer. Even so, this may not make up for S3 costing almost 4x as much as B2. Another place this matters is large machine learning workloads: training large models requires enormous amounts of data, so a customer using SageMaker may benefit from the free "outbound" S3 data to SageMaker.
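The two architectures above can be compared with back-of-the-envelope math. The rates below are illustrative assumptions, not current list prices: roughly $0.09/GB for AWS internet egress, $0.01/GB for Backblaze B2 egress, and free in-region S3-to-EC2 transfer.

```python
# Back-of-the-envelope comparison of the two serving architectures above.
# Rates are assumptions for illustration, not current list prices.
AWS_EGRESS = 0.09  # $/GB, EC2 -> internet
B2_EGRESS = 0.01   # $/GB, Backblaze B2 -> internet (or to EC2)

def serve_from_s3(gb_served: float) -> float:
    # S3 -> EC2 in the same region is free; pay internet egress once.
    return gb_served * AWS_EGRESS

def serve_from_b2(gb_served: float) -> float:
    # Pay B2 -> EC2 egress, then EC2 -> internet egress.
    return gb_served * B2_EGRESS + gb_served * AWS_EGRESS
```

Under these assumed rates, B2's extra egress hop adds a fixed per-GB surcharge to every page load, while S3's premium is on storage. Which wins depends on the ratio of data served to data stored.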

While this article barely scratches the surface of how powerful and configurable S3 is, I hope it provides a foundation of knowledge about the service so that you may continue learning about it.

To learn more, you can access the AWS S3 documentation through this link.