Uploading a directory to Amazon S3

Wednesday, Jul 30 2014 in devops

If you want to mirror a directory structure from disk to Amazon S3, you use each file's path as the object key. For instance, a file living at /bar/foo/baz.jpg is stored with the object key ‘bar/foo/baz.jpg’.
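The mapping can be sketched in a couple of lines; `path_to_key` is a hypothetical helper name here, not part of any AWS library:

```python
import os

def path_to_key(path):
    """Turn a local file path into an S3 object key:
    drop the leading separator and use forward slashes."""
    return path.lstrip('/').replace(os.path.sep, '/')

print(path_to_key('/bar/foo/baz.jpg'))  # -> bar/foo/baz.jpg
```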

In the administration console it is possible to create “folders”, but no such thing exists in the API: Amazon introduced the concept only in the interface. Apparently they anticipated that people would organise files by hand, which is odd, because overall S3 is definitely not targeted at end users.

Where you might care about subdirectories is when you host a static website on S3 and want to use directories to organize your URLs. But S3 has no such thing as directories.

In S3 there are only buckets and keys. A bucket contains objects, each identified by a key. If a key contains slashes, the interface displays the segments between the slashes as nested folders.

For static website hosting, S3 simply matches the URL path to the key: a request for /bar/foo/baz.jpg serves the object whose key is ‘bar/foo/baz.jpg’.

Below is some sample Python code that uploads a whole directory with boto, the Amazon Web Services client library.

from boto.s3.connection import S3Connection
from boto.s3.key import Key
import os

conn = S3Connection()

bucket = conn.get_bucket('ludovf.net')

root_dir = '_site'
for root, dirs, files in os.walk(root_dir):
    for name in files:
        local_path = os.path.join(root, name)
        # Key is the path relative to the root directory,
        # normalized to forward slashes so it also works on Windows
        key_id = os.path.relpath(local_path, root_dir).replace(os.path.sep, '/')
        k = Key(bucket)
        k.key = key_id
        k.set_contents_from_filename(local_path)

Of course, this code could be improved by uploading only those files that have changed since they were last uploaded to S3.
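One common approach is to compare the local file's MD5 digest with the object's ETag (for objects uploaded in a single part, the ETag is the MD5 digest wrapped in double quotes; this does not hold for multipart uploads). The helpers below are a sketch under that assumption; `file_md5` and `needs_upload` are hypothetical names, and only the hashing part is standard library:

```python
import hashlib


def file_md5(path, chunk_size=8192):
    """Hex MD5 digest of a file, read in chunks to keep memory flat."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()


def needs_upload(key, local_path):
    """True if the object is missing or its ETag differs from the
    local file's MD5 (assumes a single-part upload)."""
    if key is None:  # object does not exist in the bucket yet
        return True
    return key.etag.strip('"') != file_md5(local_path)
```

In the loop above you would fetch the existing object with `existing = bucket.get_key(key_id)` (which returns None when the key is absent) and skip the `set_contents_from_filename` call when `needs_upload(existing, local_path)` is false.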