⭐ AWS Quickstart

Amazon Web Services is a subsidiary of Amazon providing on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. Through the Development Data Partnership, Data Partners often share datasets via the AWS cloud with the support of the World Bank’s and your respective institution’s IT cloud team.

In that spirit, here are guidelines on how to access and retrieve data from AWS services, such as AWS S3 and AWS SageMaker.

Caution

Please remember you must abide by the terms of the Master Data License Agreement. If you have any questions or need clarification, please reach out to us.

Credentials

Your team will receive AWS credentials (a key and a secret) that authenticate you (sign you in) and authorize you (grant you permissions) to use resources; a shell example of putting them to use is shown after the list below. Please note that support, including provisioning additional compute resources and estimating costs, is provided by your respective institution’s IT cloud team.

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY
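
For example, on a Unix-like shell you can make these credentials available to the AWS CLI and SDKs either as environment variables or as a named profile. This is a minimal sketch; the values and the profile name ddp are placeholders, not real credentials:

# Option 1: export as environment variables (placeholder values)
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

# Option 2: store them in ~/.aws/credentials under a named profile
aws configure --profile ddp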

Caution

The credentials are issued in your name and are non-shareable and non-transferable. Never publish or commit your key and secret. Remember you are responsible for the use of the access granted to you.

AWS S3

Amazon S3 (Amazon Simple Storage Service) is an object storage service offered by Amazon Web Services.

Let’s see a few options on how to retrieve data stored on AWS S3.

Using AWS CLI

The AWS Command Line Interface (CLI) is a tool to manage AWS services, including AWS S3, when authenticated with your IAM credentials.

After installing and configuring it according to the instructions provided by AWS, you can execute operations with your credentials.
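
A quick way to confirm the CLI is picking up your credentials is to ask AWS who you are authenticated as:

aws sts get-caller-identity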

Taking as an example the data provided by Waze for Cities for Myanmar, stored in the World Bank-owned s3://wbg-waze/ bucket, with the AWS CLI you can:

  • List available data

    aws s3 ls --recursive "s3://wbg-waze/myanmar/"

  • Check size

    aws s3 ls --recursive --summarize --human-readable "s3://wbg-waze/myanmar/"

  • Copy data from the cloud to the local filesystem

    aws s3 cp --recursive "s3://wbg-waze/myanmar" .
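  • Sync data (a sketch of an alternative for repeated transfers: aws s3 sync copies only objects that are new or have changed since the last run; the local destination ./myanmar is just an example path)

    aws s3 sync "s3://wbg-waze/myanmar" ./myanmar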

Using Cyberduck

If you prefer a graphical interface to manage data, we encourage using Cyberduck. Cyberduck is a free cloud storage browser that lets you explore files as if they were on your local filesystem. To connect, open a new connection, select Amazon S3, and enter your Access Key ID and Secret Access Key.

Using Dask

If you are familiar with the PyData stack, a convenient way to access data within a Jupyter notebook or a Python script is with Dask (via s3fs).

First, install the dependencies, Dask and s3fs, using pip or conda:

pip install "dask[complete]" s3fs

Now, in your notebook or script, you can read the data with a single line of code. Magic!

import dask.dataframe as dd

# s3fs resolves credentials automatically from the environment
# (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or from ~/.aws/credentials
df = dd.read_csv('s3://bucket/path/to/data-*.csv')

If you use a named profile, you can pass it via storage_options (or export the AWS_PROFILE environment variable).

# Options passed through to s3fs.S3FileSystem
storage_options = dict(profile="named-profile")

df = dd.read_csv('s3://bucket/path/to/data-*.csv', storage_options=storage_options)
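
Keep in mind that Dask is lazy: the dataframe above is only a task graph, and data is pulled from S3 when you request a result. A minimal sketch (the column name speed is hypothetical and only for illustration):

# Nothing has been read from S3 yet; trigger computation explicitly
df.head()                     # reads just enough partitions for a preview
df["speed"].mean().compute()  # hypothetical column; scans the full dataset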