Simply Explained: Amazon S3
Amazon Web Services (AWS) is behind much of the internet today. As Amazon's most profitable division, AWS has grown to dozens of data centers grouped into multiple regions across the globe (US-East-1 in Northern Virginia, for example). Each region is structured into at least three availability zones (AZs); these AZs are designed to isolate data center failures from one another, providing better and more predictable uptime for services. Better-known AWS services include EC2, SNS, RDS, CloudFront, and S3. EC2 provides virtual servers in the cloud; SNS lets you automate mass communication via text and email, or deliver 2FA login codes to your users by text or email; RDS provides massively scalable relational databases; and CloudFront is a large-scale, distributed CDN that runs on AWS's network of more than 225 edge locations. In this article, however, we will focus on one of AWS's first services: the Simple Storage Service.
In theory, S3 works much like the better-known cloud storage providers such as Google Drive and Dropbox. However, once we unpack how S3 works at a more basic level, you will see that although all these services let users store files in the cloud, they are very different. Google Drive and Dropbox sell plans where users choose how much storage they receive, often 100GB, 1TB, or even 10TB. With a service like S3, you pay only for the data you upload -- anything from a 2MB picture of your cat to hundreds of petabytes of satellite imagery. You are charged for the total size of your files in S3, no more (with few exceptions) and no less.
S3 is an object storage service oriented toward enterprise customers who need terabytes, petabytes, or even exabytes of data to be accessible in milliseconds from anywhere in the world. With S3, you get far more control over both where and how your data is stored in an AWS datacenter. Inside the S3 console, you can choose the region your files are stored in, the durability of that storage, and even the time it takes to retrieve your files.
S3 Standard, the most popular option, allows millisecond-level access to all your data, which is stored redundantly across multiple geographically distinct AZs within one region. A similar option, S3 One Zone-IA, lets customers reduce the durability (and cost) of their objects by having AWS store files in one AZ instead of three. S3 Standard-IA (Infrequent Access) keeps data in a high-durability, rapidly accessible environment with a simpler per-GB billing model when the data is retrieved. S3 Glacier and Glacier Deep Archive are commonly used for storing essential backups; they suit companies and individuals who simply need data kept in the cloud for compliance purposes or as a more secure backup. Glacier Deep Archive is meant for files that must be retained for many years and accessed only once or twice a year. Glacier and Glacier Deep Archive do have longer retrieval times than S3 Standard, however; data in these tiers is likely stored on archival tape drives. Retrievals range from a few minutes (for expedited requests) up to 12 hours, and you will pay a premium for an expedited retrieval of your data.
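To make the trade-offs above concrete, here is a minimal sketch of how you might pick a storage class from rough access patterns. The class names mirror S3's, but the thresholds are my own illustrative assumptions, not AWS guidance:

```python
def suggest_storage_class(accesses_per_year: int, needs_fast_retrieval: bool) -> str:
    """Suggest an S3 storage class from rough access patterns.

    Thresholds here are illustrative assumptions, not AWS rules.
    """
    if needs_fast_retrieval:
        # Frequently read data belongs in Standard; rarely read data can
        # tolerate Standard-IA's per-GB retrieval fee in exchange for a
        # lower storage rate.
        return "STANDARD" if accesses_per_year > 12 else "STANDARD_IA"
    # Archival data: Deep Archive suits the "1-2 retrievals per year" case.
    return "GLACIER" if accesses_per_year > 2 else "DEEP_ARCHIVE"

print(suggest_storage_class(100, True))  # STANDARD
print(suggest_storage_class(2, False))   # DEEP_ARCHIVE
```

In a real deployment you would encode rules like this in an S3 Lifecycle configuration, which transitions objects between classes automatically as they age.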
Billing for storing data in S3 is slightly more complicated than a simple per-byte rate, but you do still pay by the gigabyte: S3 Standard costs $0.023 per GB-month, while Glacier and Glacier Deep Archive cost less, at $0.004 and $0.00099 per GB-month, respectively.
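Those per-GB-month rates make the storage portion of a bill easy to estimate. A minimal sketch, using the rates quoted above (us-east-1 pricing at the time of writing; rates change, so check the current price list):

```python
# Illustrative per-GB-month storage rates quoted in the text above.
RATES = {
    "STANDARD": 0.023,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Estimate the monthly storage bill in USD for a given class."""
    return gb * RATES[storage_class]

# Storing 1 TB (1,000 GB) for one month:
print(f"{monthly_storage_cost(1000, 'STANDARD'):.2f}")      # 23.00
print(f"{monthly_storage_cost(1000, 'DEEP_ARCHIVE'):.2f}")  # 0.99
```

Note that this covers storage only; request and retrieval charges, discussed next, are billed on top.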
Every time you open a file in Google Drive, a GET request is forwarded to an origin server in Google's data centers. The same thing happens with S3. However, Google Drive does not charge you for these application programming interface (API) requests; with S3, in addition to paying per byte of data you store, you also pay API charges. These charges fall into two main categories (this is still highly simplified): PUT/COPY/POST/LIST requests and everything else. The first category generally changes the data stored in S3; PUT, for example, is used when creating or uploading a new file. The second category consists of GET-like requests, which read files stored in S3 but generally do not change them. Glacier, Glacier Deep Archive, and the infrequent-access tiers additionally charge retrieval fees based on how many GB of data are requested. For example, if I am storing a 100MB file in S3 Standard-IA and I request it 10 times, that adds up to 1GB of retrieved data. This billing model simplifies charging for backups and infrequently used files. S3 Standard, by contrast, charges only the per-request API fees, with no per-GB retrieval charge.
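The 100MB-times-10 example above can be worked through in a few lines. The rates here are assumptions based on published us-east-1 pricing (Standard-IA retrieval around $0.01/GB, GET requests around $0.001 per 1,000); verify against the current price list before relying on them:

```python
# Assumed illustrative rates (check the current AWS price list).
RETRIEVAL_PER_GB = 0.01   # Standard-IA per-GB retrieval fee
GET_PER_1000 = 0.001      # GET request fee per 1,000 requests

def ia_read_cost(file_mb: float, requests: int) -> float:
    """Estimate retrieval + request cost of repeatedly reading an
    S3 Standard-IA object."""
    gb_retrieved = file_mb * requests / 1000  # MB -> GB (decimal units)
    return gb_retrieved * RETRIEVAL_PER_GB + (requests / 1000) * GET_PER_1000

# The article's example: a 100 MB file requested 10 times = 1 GB retrieved.
print(round(ia_read_cost(100, 10), 5))  # 0.01001
```

As the output suggests, at this scale the per-GB retrieval fee dominates and the per-request charge is nearly negligible; request charges only start to matter with very large numbers of small objects.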
S3 also offers enhanced security features compared to services like Google Drive. Every file uploaded to S3 carries multiple security options. In addition to regulating whether users on the greater internet can access it, you can also control which users inside the AWS account have access to the file and what they can do with it. Additionally, AWS Identity and Access Management (IAM) can regulate which users are able to perform which actions in S3, such as authorizing one user to only read a file while another can read, write, and rename it. AWS Key Management Service (KMS) allows users to encrypt files in S3 with a data key managed by KMS, which is itself encrypted under a master key that AWS rotates regularly. Alternatively, users can supply their own keys. However, as with all forms of encryption, computing resources are needed to decrypt the data. Therefore, if you opt to encrypt your data in S3 with a KMS-managed AES-256 key, you will pay additional KMS request charges when retrieving those files.
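As a concrete example of the IAM controls described above, a policy like the following grants a user read-only access to a single bucket (the bucket name is a hypothetical placeholder); a broader policy would add actions such as s3:PutObject and s3:DeleteObject for the read-write user:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyExampleBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Note the two Resource entries: s3:ListBucket applies to the bucket itself, while s3:GetObject applies to the objects inside it, so both ARN forms are needed.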
Finally, one major upside of AWS compared to offerings such as B2 from Backblaze is the ability to transfer data from S3 to other AWS services in the same region free of charge. This feature can potentially make up for the increased cost of S3 compared to other cloud storage providers. If I am running a website off an EC2 instance with my website files hosted in the same region's S3, I only pay EC2 internet egress fees whenever my website is loaded on someone's computer. If I were using Backblaze to store my large, static files, I would pay egress fees from Backblaze to send the files to my EC2 instance and then pay egress fees again from EC2 to get the files from my server to your computer. This, however, may not make up for the almost 4x price premium of S3 over B2. Another example of how this can affect costs is large machine learning workloads, which consume enormous amounts of data. A customer using SageMaker may therefore benefit from the free "outbound" S3 data to SageMaker.
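The double-egress point above can be sketched numerically. Both rates are assumptions for illustration (roughly $0.09/GB for EC2-to-internet egress and $0.01/GB for B2 egress; both providers' pricing changes over time):

```python
# Assumed illustrative egress rates in USD per GB (check current pricing).
AWS_INTERNET_EGRESS = 0.09
B2_EGRESS = 0.01

def serve_from_s3(gb_served: float) -> float:
    """S3 -> EC2 in-region transfer is free; pay only EC2 -> internet egress."""
    return gb_served * AWS_INTERNET_EGRESS

def serve_from_b2(gb_served: float) -> float:
    """Pay B2 -> EC2 egress first, then EC2 -> internet egress on top."""
    return gb_served * (B2_EGRESS + AWS_INTERNET_EGRESS)

# Serving 100 GB of static files to website visitors:
print(round(serve_from_s3(100), 2))  # 9.0
print(round(serve_from_b2(100), 2))  # 10.0
```

Whether S3's free in-region transfer wins overall still depends on the storage-rate gap, since B2's lower per-GB storage price applies every month while egress is only paid when data moves.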
While this article barely scratches the surface of how powerful and configurable AWS S3 is, I hope that it has provided a foundation of knowledge about the service so that you may continue learning about it.
To learn more, you can explore the AWS S3 documentation on the official AWS website.