How to Upload Data to Amazon S3, and From S3 to Glacier

By | October 21, 2024

Organizations that process large amounts of data and large files, and that also use Amazon S3 storage, need a simple way to get data into S3, and then from S3 to Glacier for data archiving and other cold storage purposes.

But this is easier said than done—especially for IT teams managing multiple storage destinations, permissions, and other configurations for potentially hundreds of users:

  • IT teams must manually configure access to infrequent access (IA) storage for each user, increasing the chances of misconfigurations, unauthorized access, or “hot” data being placed in cold storage.
  • Natively uploading content to AWS S3 and S3 Glacier means dealing with strict file size limitations and technically challenging upload processes such as multipart upload via the AWS Command Line Interface (CLI).
  • Natively uploading content to S3 and Glacier is typically a manual, time-consuming process.

All of the above can lead to a state of management and administration hell for IT teams. With that in mind, here’s a brief tutorial on uploading data to S3, and from S3 to Glacier, using the AWS Management Console and CLI, along with why MASV is a better option for uploading data to any S3 storage class.


Ingest Large Files Into Cloud Cold Storage Fast

MASV integrates with multiple cloud cold storage platforms and can automate storage workflows.

What is Amazon S3?

Amazon Simple Storage Service (S3) is an object storage service known for its scalability, security, data availability, and performance. Because object storage is designed to house unstructured data, Amazon S3 is popular among media and entertainment (M&E) companies that work with large volumes of images and video.

💡 MASV’s no-code integrations with several major cloud providers, including S3, enable hands-free content ingest into cloud storage, along with file transfer and storage automations.

S3 storage classes

S3 isn’t a monolith, however: It contains several different storage classes tailored to different use cases and data storage requirements. These storage classes include:

  • S3 Standard
  • S3 Intelligent-Tiering
  • S3 Express One Zone
  • S3 Standard-IA
  • S3 One Zone-IA
  • S3 Glacier Instant Retrieval
  • S3 Glacier Flexible Retrieval
  • S3 Glacier Deep Archive
  • S3 Outposts

Each S3 storage class differs in storage cost, data retrieval and egress cost, and performance. Infrequent access (IA) and cold (Glacier) storage options offer cheaper storage, but getting data back out of them costs more and takes longer. Some Glacier storage options take several hours or even days to retrieve data.

For an analysis of various S3 storage classes, their cost, performance, and suitability for various use cases—specifically around S3 Glacier storage classes—see our post on setting up and managing a cloud archive.


What is Amazon S3 Glacier?

Amazon S3 Glacier is a type of S3 storage—specifically, cold storage used for less frequently accessed data, such as cloud archive data.

Amazon Glacier and other cold storage options, such as Google Coldline or Azure Blob Storage’s Cold Tier, offer several benefits when storing infrequent access (IA) data, such as:

  • Frees up space in your primary (hot) storage by removing inactive data.
  • Lower storage costs than hot (frequently accessed) storage.
  • Offers a lower cost alternative to physical archives while allowing businesses to keep

But while cold storage has plenty of advantages, it’s also important to closely monitor which members of your team have access to S3 Glacier.

That’s because, while Glacier storage is cheaper than other storage types, retrieving data from cold storage can be much more expensive (and much slower) than retrieving it from hot storage.

How to Migrate Data To S3, and From S3 to Glacier

First off: We always recommend using either the CLI or infrastructure-as-code (IaC) to configure and manage your Amazon storage, because it gives you greater control and better oversight, reducing the risk of potentially devastating misconfigurations. In the Management Console, one or two misplaced clicks can easily create a cascade of problems.

But we also recognize that non-technical and newer users may prefer the AWS Management Console.

So, here’s how to migrate your data from S3 to Glacier using the Management Console.

Management Console

Part 1: Set up an S3 Bucket

  1. Create an AWS account and log into the Management Console.
  2. Using the search bar, search for S3. Select S3 from the results.
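  3. On the left-hand menu, select Buckets, then choose Create bucket. Name your new bucket (we’ll call it masv-archive, since bucket names must be lowercase and can’t contain spaces) and select your preferred region. Make sure the default Block all public access setting stays selected.
  4. Enable bucket versioning and (optional) add tags to track storage costs or other criteria.
  5. Confirm that at-rest encryption is enabled (new buckets use SSE-S3 encryption by default).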
  6. Under Advanced Settings, consider enabling S3 Object Lock (optional) to ensure important data doesn’t get deleted (if you’re creating the bucket for archive purposes, for example).
    1. If you enabled Object Lock, open the bucket once it’s created and select its Properties tab.
    2. Select the Edit button in the Object Lock section of the Properties tab. This lets you set default retention values for data uploaded to the bucket (i.e., retain data for a set number of years).
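If you’d rather set that default retention from the command line, the AWS CLI command is aws s3api put-object-lock-configuration, which takes the rule as JSON. The sketch below is a hypothetical helper (object_lock_config is our own name, not part of the AWS CLI) that prints that JSON for a whole number of years, assuming you choose one of S3’s two retention modes, GOVERNANCE or COMPLIANCE. It only builds the string, so it runs without AWS credentials.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON that
# "aws s3api put-object-lock-configuration --object-lock-configuration"
# expects, with a default retention rule of a whole number of years.
# (The API also accepts Days instead of Years, but not both.)
object_lock_config() {
  lock_mode="$1"    # GOVERNANCE or COMPLIANCE
  lock_years="$2"   # positive whole number of years
  case "$lock_mode" in
    GOVERNANCE|COMPLIANCE) ;;
    *) echo "mode must be GOVERNANCE or COMPLIANCE" >&2; return 1 ;;
  esac
  case "$lock_years" in
    ''|*[!0-9]*|0*) echo "years must be a positive integer" >&2; return 1 ;;
  esac
  printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"%s","Years":%s}}}\n' \
    "$lock_mode" "$lock_years"
}

# Example: a one-year, governance-mode default retention rule.
object_lock_config GOVERNANCE 1
```

With the helper loaded, applying a one-year rule might look like: aws s3api put-object-lock-configuration --bucket masv-archive --object-lock-configuration "$(object_lock_config GOVERNANCE 1)" --profile masv. Governance mode lets users with special permissions override the lock, while compliance mode doesn’t, so it’s worth choosing deliberately.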

Part 2: Upload to S3 and select your storage class

  1. Select your bucket in the S3 console. From there, hit the Objects tab and select Télécharger.

💡 You can upload a single object up to 160GB to S3 using the console. To upload a file larger than 160GB you will need to use the AWS CLI, AWS SDK, or Amazon S3 REST API. Or you can use MASV to upload files of up to 5TB to S3.

2. Choose Add files, then navigate to the files you want to upload. They’ll appear in the Files and folders section of the upload page.

3. In the Properties section of the upload page, select the storage class (such as Glacier Deep Archive or Glacier Instant Retrieval) for the files you’re uploading.

Note: Some storage classes have minimum durations for data uploaded; for example, Glacier Deep Archive bills for 180 days even if the file is deleted after just a few days.

4. Choose Upload. You’ll then see a file upload status banner and, once the upload is complete, an upload summary.
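Because the console and the CLI name storage classes differently, and because minimum storage durations affect what you pay, it can help to keep both facts in one place. The sketch below is a hypothetical lookup (storage_class_info and billable_days are illustrative names) that maps the class names used in this article to the values the CLI’s --storage-class option accepts, and works out how many days you’d be billed for if an object is deleted early. The minimums reflect AWS’s published rules at the time of writing, so check current pricing before relying on them.

```shell
#!/bin/sh
# Map the storage class names used in this article to the values accepted
# by the --storage-class option of "aws s3api put-object", along with each
# class's minimum billable storage duration in days (0 means no minimum).
storage_class_info() {
  case "$1" in
    "S3 Standard")                   echo "STANDARD 0" ;;
    "S3 Standard-IA")                echo "STANDARD_IA 30" ;;
    "S3 One Zone-IA")                echo "ONEZONE_IA 30" ;;
    "S3 Glacier Instant Retrieval")  echo "GLACIER_IR 90" ;;
    "S3 Glacier Flexible Retrieval") echo "GLACIER 90" ;;
    "S3 Glacier Deep Archive")       echo "DEEP_ARCHIVE 180" ;;
    *) echo "unknown storage class: $1" >&2; return 1 ;;
  esac
}

# Days billed for an object deleted after $2 days in class $1: the larger
# of the days actually stored and the class minimum.
billable_days() {
  class_info=$(storage_class_info "$1") || return 1
  class_min=${class_info#* }
  if [ "$2" -gt "$class_min" ]; then echo "$2"; else echo "$class_min"; fi
}

billable_days "S3 Glacier Deep Archive" 5   # prints 180
```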

Congratulations! You have successfully uploaded a file from your computer to S3, and from S3 to Glacier, using the AWS Management Console.

Command Line Interface (CLI)

Now let’s try setting up the same process using the AWS CLI. In this example, we’ll create a bucket named masv-archive using the AWS command line tool.

This bucket will follow a few best practices for security: Public access blocked, versioning on, at-rest encryption on, object lock on. To track this archive for project management and billing, we’ll add a tag, project-x. Then we’ll upload a small file and a large file from local storage to S3 Glacier.

The AWS CLI is more flexible than the Management Console because it supports uploads larger than 160GB, but a single PUT request is limited to 5 GB, so larger files require multipart uploads.
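To make that limit concrete, here is a minimal sketch of the decision, assuming 5 GB means 5 GiB (5,368,709,120 bytes). The function names upload_method and upload_method_for_file are hypothetical; they only report which approach a file needs and don’t upload anything.

```shell
#!/bin/sh
# Pick an upload approach for a file size in bytes. Assumption: the 5 GB
# single-PUT limit is treated as 5 GiB (5,368,709,120 bytes).
MAX_SINGLE_PUT=$((5 * 1024 * 1024 * 1024))

upload_method() {
  if [ "$1" -gt "$MAX_SINGLE_PUT" ]; then
    echo "multipart"    # create-multipart-upload, upload-part, complete-multipart-upload
  else
    echo "put-object"   # a single aws s3api put-object request
  fi
}

# Same decision for a local file; $(( )) strips the padding some wc versions add.
upload_method_for_file() {
  upload_method "$(( $(wc -c < "$1") ))"
}

upload_method 10737418240   # a 10 GiB file: prints multipart
```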

Before you begin, make sure your credentials and region are already configured for the AWS CLI. We’ll use the masv profile in this example.

Let’s create and configure the bucket. You only need to do this once.

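1. Create the bucket. As written, this command works in us-east-1; in any other region, also add --create-bucket-configuration LocationConstraint= followed by the region name (for example, LocationConstraint=eu-west-1):

$ aws s3api create-bucket \
    --bucket masv-archive \
    --region "$(aws configure get region --profile masv)" \
    --object-lock-enabled-for-bucket \
    --profile masv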

Output:

{
    "Location": "/masv-archive"
}

2. Block public access:

$ aws s3api put-public-access-block \
    --bucket masv-archive \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true \
    --profile masv

3. Turn on versioning:

$ aws s3api put-bucket-versioning \
    --bucket masv-archive \
    --versioning-configuration Status=Enabled \
    --profile masv

4. Turn on server-side encryption:

$ aws s3api put-bucket-encryption \
    --bucket masv-archive \
    --server-side-encryption-configuration '{
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "AES256"
            }
        }]
    }' \
    --profile masv

5. Add a tag named “project-x”:

$ aws s3api put-bucket-tagging \
    --bucket masv-archive \
    --tagging 'TagSet=[{Key=project-x,Value=true}]' \
    --profile masv
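The create-bucket command in step 1 works as written in us-east-1, but S3 expects an explicit LocationConstraint in every other region and rejects one in us-east-1. One way to handle both cases is sketched below; needs_location_constraint and create_archive_bucket are hypothetical helper names, and only the region check runs without AWS credentials.

```shell
#!/bin/sh
# Succeed if create-bucket needs an explicit LocationConstraint for this
# region: S3 expects one outside us-east-1 and rejects it in us-east-1.
needs_location_constraint() {
  [ -n "$1" ] && [ "$1" != "us-east-1" ]
}

# Hypothetical wrapper (requires configured AWS credentials): create
# masv-archive in the profile's region, adding the constraint only when needed.
create_archive_bucket() {
  region=$(aws configure get region --profile masv)
  if needs_location_constraint "$region"; then
    aws s3api create-bucket --bucket masv-archive --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" \
      --object-lock-enabled-for-bucket --profile masv
  else
    aws s3api create-bucket --bucket masv-archive --region "$region" \
      --object-lock-enabled-for-bucket --profile masv
  fi
}
```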

Now the bucket is ready for storing our archived files. We’ll upload from local storage to the bucket using the GLACIER storage class, which is the API name for S3 Glacier Flexible Retrieval.

Uploading small files, less than 5 GB, is straightforward. It takes just one command. For example, if your local file is named my-small-file.mp4, then you would enter this:

$ aws s3api put-object \
    --bucket masv-archive \
    --key my-small-file.mp4 \
    --body my-small-file.mp4 \
    --storage-class GLACIER \
    --profile masv

Output:

{
    "ETag": "\"e5x2a5mbpdl4e2d4c3549862b2a5f2b\"",
    "ServerSideEncryption": "AES256",
    "VersionId": "eixDahmmp.lreyCa8cKkkHoV80r17S8k"
}

Multipart uploads via CLI

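Uploading files larger than 5 GB with the low-level aws s3api commands requires multiple steps, because put-object can’t upload more than 5 GB in a single request. (The higher-level aws s3 cp command splits large files into a multipart upload automatically; the steps below show what that involves.)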

Instead, you need to break it up into parts on your local computer and upload each part individually. AWS calls this multipart uploading.
In this example, we’ll keep it simple with a 10 GB file that only needs to be broken up into two parts.
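Before splitting a file, it’s worth checking how many parts it needs. This sketch assumes S3’s documented multipart limits (parts of 5 MiB to 5 GiB, except the last, and at most 10,000 parts per upload) and treats the 10 GB example as 10 GiB; part_count is a hypothetical helper name.

```shell
#!/bin/sh
# Count the parts a multipart upload needs for a file of $1 bytes split into
# parts of $2 bytes, enforcing S3's multipart limits: parts (other than the
# last) must be between 5 MiB and 5 GiB, with at most 10,000 parts.
MIN_PART=$((5 * 1024 * 1024))
MAX_PART=$((5 * 1024 * 1024 * 1024))
MAX_PARTS=10000

part_count() {
  if [ "$2" -lt "$MIN_PART" ] || [ "$2" -gt "$MAX_PART" ]; then
    echo "part size must be between $MIN_PART and $MAX_PART bytes" >&2
    return 1
  fi
  parts=$(( ($1 + $2 - 1) / $2 ))   # ceiling division
  if [ "$parts" -gt "$MAX_PARTS" ]; then
    echo "$parts parts exceeds the $MAX_PARTS-part limit; use larger parts" >&2
    return 1
  fi
  echo "$parts"
}

# The 10 GB example, treated as 10 GiB, in 5 GiB parts:
part_count $((10 * 1024 * 1024 * 1024)) "$MAX_PART"   # prints 2
```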

1. Get an upload ID. We need this upload ID for each part that we upload in later steps.

$ uploadId=$(aws s3api create-multipart-upload \
    --bucket masv-archive \
    --key my-large-file.mp4 \
    --storage-class GLACIER \
    --query UploadId \
    --output text \
    --profile masv)

2. Upload the first 5 GB part of the file. Here, 5 GB means 5 GiB (5,368,709,120 bytes), the maximum size of a single part.

$ head --bytes 5368709120 my-large-file.mp4 > my-large-file.mp4.part1
$ aws s3api upload-part \
    --bucket masv-archive \
    --key my-large-file.mp4 \
    --part-number 1 \
    --body my-large-file.mp4.part1 \
    --upload-id "$uploadId" \
    --profile masv

Note the ETag value in the output; you’ll need it to finalize the upload:

{
    "ServerSideEncryption": "AES256",
    "ETag": "\"e5x5adm2pflee9ed4c3549862b2a5f2b\""
}

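3. Upload the last part of the file, which is everything after the first 5,368,709,120 bytes (tail starts reading at byte 5,368,709,121).

$ tail --bytes +5368709121 my-large-file.mp4 > my-large-file.mp4.part2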
$ aws s3api upload-part \
    --bucket masv-archive \
    --key my-large-file.mp4 \
    --part-number 2 \
    --body my-large-file.mp4.part2 \
    --upload-id "$uploadId" \
    --profile masv

4. Finalize the multipart upload, listing the ETag returned for each part (replace the example values below with your own), then clean up the local part files.

$ aws s3api complete-multipart-upload \
    --bucket masv-archive \
    --key my-large-file.mp4 \
    --upload-id "$uploadId" \
    --multipart-upload '{"Parts":[{"ETag":"e5x5adm2pflee9ed4c3549862b2a5f2b","PartNumber":1},{"ETag":"5e5xda2mfpel9ede4c534986ba225b2f","PartNumber":2}]}' \
    --profile masv
$ rm my-large-file.mp4.part1 my-large-file.mp4.part2
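Copying ETags by hand gets error-prone as the number of parts grows. A minimal sketch, assuming you capture each part’s ETag in a shell variable, is a hypothetical parts_json helper that assembles the --multipart-upload document in part order:

```shell
#!/bin/sh
# Build the --multipart-upload JSON for complete-multipart-upload from ETags
# passed in part order (part 1 first). Any double quotes around an ETag, as
# printed by "aws s3api upload-part --query ETag --output text", are removed.
parts_json() {
  part_n=1
  sep=""
  json='{"Parts":['
  for etag in "$@"; do
    etag=$(printf '%s' "$etag" | tr -d '"')
    json="$json$sep{\"ETag\":\"$etag\",\"PartNumber\":$part_n}"
    sep=","
    part_n=$((part_n + 1))
  done
  printf '%s]}\n' "$json"
}

parts_json '"abc123"' 'def456'
```

Each ETag could be captured as its part is uploaded, for example etag1=$(aws s3api upload-part --bucket masv-archive --key my-large-file.mp4 --part-number 1 --body my-large-file.mp4.part1 --upload-id "$uploadId" --query ETag --output text --profile masv), and then passed to the finalize step as --multipart-upload "$(parts_json "$etag1" "$etag2")".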

At this point, the file is safely stored in S3 Glacier.

As you can guess, this example for a multipart upload isn’t the simplest way to transfer large files to S3. It isn’t the fastest way either, even for small files.

To improve performance, you can tune the size of each part and upload multiple parts at a time, but that’s beyond the scope of this example.
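As a starting point for that kind of tuning, the sketch below (part_ranges is a hypothetical name) computes the byte offset and length of every part for a chosen part size. Because each range is independent, the parts can be extracted and uploaded concurrently instead of one after another.

```shell
#!/bin/sh
# Print "part-number offset length" for each part of a $1-byte file split
# into parts of at most $2 bytes. Each range can then be read and uploaded
# independently, which is what makes concurrent part uploads possible.
part_ranges() {
  range_n=1
  range_off=0
  while [ "$range_off" -lt "$1" ]; do
    range_len=$(( $1 - range_off < $2 ? $1 - range_off : $2 ))
    echo "$range_n $range_off $range_len"
    range_off=$((range_off + range_len))
    range_n=$((range_n + 1))
  done
}

part_ranges 25 10   # prints: 1 0 10 / 2 10 10 / 3 20 5 (one per line)
```

Each range could, for example, be extracted with GNU dd (using its skip_bytes and count_bytes input flags) and sent with its own aws s3api upload-part call in a background job. If you use the higher-level aws s3 cp command instead, the AWS CLI exposes similar controls through its s3.multipart_chunksize and s3.max_concurrent_requests settings.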

Multipart uploading does have other advantages, however, like faster recovery from errors and resuming paused or interrupted uploads.

Managing S3 Glacier: Tips and Best Practices

So, you’ve got your files uploaded to S3 Glacier. Great! Now what?

At MASV, we’ve got plenty of experience managing a variety of S3 and other cloud instances. Here are some of our top tips for managing S3 Glacier in particular:

  • Using a third-party file upload tool with no-code S3 integrations, such as MASV Portals, can simplify and speed up the S3 upload process while letting users upload files of up to 5TB each.
  • As mentioned, although the first example above uses the Management Console, we recommend managing all AWS infrastructure through the AWS CLI or IaC if you have the technical expertise.
    • You can upload larger files using these methods.
    • We recommend going a step further and restricting Management Console access to read-only. This ensures a single employee can’t make potentially catastrophic changes without approval from other team members.
    • You can draw up “break glass” policies for console use during emergencies.
  • Ideally, involve a team of three when setting up and configuring S3 Glacier, or at least two people (a single person would likely become a bottleneck).
    • Teams configuring cloud buckets should be able to verify each other’s work.
    • The team can be responsible for configuration and management, along with ensuring compliance and performing data classification to ensure the right data goes into archive.
    • The team’s job is to safeguard security and data integrity while keeping costs in check.
  • Implement a regimented process that involves several different stakeholders when making any changes to your AWS buckets.
    • This helps prevent configuration drift, which is more likely in the Management Console because a single click can change a setting (and lead to problems down the road).
    • From a compliance standpoint, any infrastructure changes need to go through a change management process requiring second approvals.
  • Implement a regimented process for data ingestion, and only allow authorized users access to specific storage buckets (don’t give everyday business users access to cold storage, for example, as they could ingest hot data—leading to needlessly expensive and time-consuming data egress).

MASV: The Easy Way to Upload Massive Files to S3 Glacier

Because post-production workflows require data to be saved in multiple locations (such as on-prem storage for video editing and the cloud for backup and archive), configuring S3 Glacier (and uploading from S3 to Glacier) is typically part of a larger content ingest workflow setup that involves both IT and operations staff.

  • But setting up, configuring, and managing S3 Glacier within a larger workflow requires staff able to build and maintain cloud infrastructure while managing multiple storage destinations and users with varying rules and permissions. This can lead to potentially devastating misconfigurations.
  • Staff must also be able to wire up workflows to make the data accessible to those in the organization with privileged access.
  • Even after all that’s done, native uploads to S3 and S3 Glacier have relatively strict file size limitations, can be slow, and often require technical expertise, which can make uploads needlessly time-consuming.

Developers can get around these limitations by spending months building their own S3 uploader, or they can use MASV Centralized Ingest as a central, secure point of data ingest to cloud and networked on-prem storage, saving teams hours of configuration and setup time.

Teams can use MASV to facilitate stakeholder uploads to multiple destinations at once, including a range of S3 Storage Classes, without having to grant stakeholders privileged access. When you set a Storage Class for a MASV integration, you eliminate user error and reduce the risk of storage cost overruns.

Sign up for MASV’s free tier and get 10GB free every month to flex your storage workflows.

Send Large Video Files With Confidence

Sign up for MASV and start sending and receiving large files quickly and securely.