Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB

To work with data in the cloud, you can upload it to Amazon® Simple Storage Service (Amazon S3™) and then access the data in Amazon S3 from MATLAB® or from workers in your cluster.

You can either read or write Amazon S3 data from your MATLAB session. Your MATLAB session can be anywhere, including your local machine, MATLAB Online™, or your cloud resource in Cloud Center.

Set Up Access to an Amazon S3 Bucket

You must set up Amazon Web Services (AWS®) credentials to work with remote data in Amazon S3. You must also ensure that these AWS credentials have the required read and write policies. If you are creating resources on Cloud Center, you can either add AWS access before creating the resource or while the resource is running.

Add AWS Access Before Creating Cloud Resources in Cloud Center

If you are creating a MATLAB or MATLAB Parallel Server™ resource in Cloud Center, you can set up access to read data from:

  • The S3 buckets in the AWS account linked to your Cloud Center account from which you are creating a cloud resource.

  • Public S3 buckets.

To do this, add the required AWS Identity and Access Management (IAM) policy by setting Additional IAM Policies (Optional) to the value arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess when creating your cloud resource.

To also set up write access to the S3 buckets in the AWS account linked to your Cloud Center account from which you are creating the cloud resource, enter the value arn:aws:iam::aws:policy/AmazonS3FullAccess instead.

Add AWS Access in Your MATLAB Session

If you have already started your cloud resource in Cloud Center, you must use an AWS session token instead. This also applies to a MATLAB session on your local machine or in MATLAB Online. You must also use this approach if you want to access an S3 bucket in an AWS account that is different from the one you used to create a cloud resource in Cloud Center.

If you are a root user for the AWS account, follow these steps. Otherwise, contact your AWS account administrator. If you are provided with a long-term token, skip to step 3. If you are provided with an AWS session token, skip to step 6. Ensure that your token has the required S3 read (AmazonS3ReadOnlyAccess) or write (for example, AmazonS3FullAccess) policy.

  1. Create an identity and access management (IAM) user using the AWS account that contains the S3 bucket. For more information, see Creating an IAM User in Your AWS Account.

  2. Generate an access key to receive a long-term access token that includes an access key ID (AWS_ACCESS_KEY_ID) and a secret access key (AWS_SECRET_ACCESS_KEY). For more information, see Managing Access Keys for IAM Users. Ensure that this access key has the required S3 access policies. This access key allows you to generate an AWS session token.

  3. Download and install the AWS Command Line Interface tool on the machine with your MATLAB instance. This tool supports commands specific to AWS in your system terminal.

  4. In the system terminal, enter this command to set up the AWS CLI. You are prompted to enter the details of your long-term access token.

    aws configure
  5. To obtain an AWS session token, enter this command in your system terminal.

    aws sts get-session-token --duration-seconds 3600
    This command generates a session token that is valid for one hour. The response includes an AWS access key ID, an AWS secret access key, and an AWS session token. Note that the keys in this session token are different from those in the long-term access token.

    Tip

    Instead of using the AWS CLI, you can use AWS CloudShell. For details about CloudShell, see Getting started with AWS CloudShell. For more details about session tokens, see Request temporary security credentials.

  6. Once you have your session token, specify your AWS access key ID, secret access key, region of the bucket, and session token as system environment variables in your MATLAB Command Window using the setenv (MATLAB) command.

    setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
    setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
    setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")
    setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")

    To increase the security of your code and make your code safer to share, you can store your credentials in your MATLAB vault as secrets and then reference them in your code. For more information, see Keep Sensitive Information Out of Code (MATLAB).

  7. If you are using MATLAB Parallel Server on Cloud Center, configure your cloud cluster to access S3 services.

    After you create a cloud cluster, configure your cluster profile with your AWS credentials. In your MATLAB session, in the Environment section on the MATLAB Home tab, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add these environment variable names: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, and AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
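
    The token workflow in steps 5 through 7 can also be scripted from the MATLAB Command Window. This is a sketch rather than the documented workflow: it assumes the AWS CLI is installed and configured (step 4), that your bucket's region is us-east-1 (a placeholder), and that your cluster profile is named MyCloudCluster. The EnvironmentVariables option of parpool copies the listed variables to the workers as an alternative to editing the cluster profile.

    ```matlab
    % Request a one-hour session token through the AWS CLI (step 5).
    [status,out] = system("aws sts get-session-token --duration-seconds 3600 --output json");
    assert(status == 0,"AWS CLI call failed.")

    % Decode the JSON response (step 6). Field names follow the AWS CLI output.
    response = jsondecode(out);
    creds = response.Credentials;

    % Expose the temporary credentials as environment variables.
    setenv("AWS_ACCESS_KEY_ID",creds.AccessKeyId)
    setenv("AWS_SECRET_ACCESS_KEY",creds.SecretAccessKey)
    setenv("AWS_SESSION_TOKEN",creds.SessionToken)
    setenv("AWS_DEFAULT_REGION","us-east-1")   % replace with your bucket's region

    % Copy the credentials to the cluster workers when starting a pool (step 7).
    pool = parpool("MyCloudCluster", ...
        EnvironmentVariables=["AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY", ...
        "AWS_DEFAULT_REGION","AWS_SESSION_TOKEN"]);
    ```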

Verify Access to AWS Credentials from Your MATLAB Session

Download and install the AWS Command Line Interface tool. In your MATLAB session, check whether you already have access to an AWS account.

!aws sts get-caller-identity

If you have access to an AWS account in your MATLAB session, this command returns your AWS account number and other details related to your AWS account.
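
To run the same check programmatically, for example at the start of a script, you can call the CLI through the system (MATLAB) function and test its exit status. This is a sketch; it assumes the AWS CLI is on your system path.

```matlab
% An exit status of 0 means the CLI found valid credentials in this session.
[status,identity] = system("aws sts get-caller-identity");
if status == 0
    disp("AWS credentials are available:")
    disp(identity)
else
    warning("No valid AWS credentials found. Set the AWS_* environment variables first.")
end
```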

Upload Data to Amazon S3 from Local Machine

This section shows you how to upload data sets from your local machine to your Amazon S3 bucket. Later sections show you how to work with remote image and text data. To follow along with the examples in these sections, you can download some MathWorks® data sets on your local machine. Follow these steps to get started.

  • The Example Food Images data set contains 978 photographs of food in nine classes. You can download this data set to your local machine using this command in MATLAB.

    fprintf("Downloading Example Food Image data set ... ")
    filename = matlab.internal.examples.downloadSupportFile('nnet', 'data/ExampleFoodImageDataset.zip');
    fprintf("Done.\n")
    
    unzip(filename,"MyLocalFolder/FoodImageDataset");

  • To obtain the Traffic Signal Work Orders data set on your local machine, use this command.

    fprintf("Downloading Traffic Signal Work Orders data set ... ")
    zipFile = matlab.internal.examples.downloadSupportFile("textanalytics","data/Traffic_Signal_Work_Orders.zip");
    fprintf("Done.\n")
    
    unzip(zipFile,"MyLocalFolder/TrafficDataset");
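
To confirm the downloads before uploading, you can count the files extracted into each local folder. This sketch only assumes the folder names used in the commands above.

```matlab
% Count the files extracted into each local data set folder.
foodFiles = dir(fullfile("MyLocalFolder","FoodImageDataset","**","*.jpg"));
fprintf("Food images: %d\n",numel(foodFiles))

trafficFiles = dir(fullfile("MyLocalFolder","TrafficDataset","**","*.csv"));
fprintf("Traffic CSV files: %d\n",numel(trafficFiles))
```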

You can upload data to Amazon S3 by using the AWS S3 web page. For more efficient file transfers to and from Amazon S3, use the AWS Command Line Interface tool.

To upload a data set from your local machine to your Amazon S3 bucket, follow these steps.

  1. Create a bucket for your data using the following command in your MATLAB Command Window. Replace MyCloudData with the name of your Amazon S3 bucket. Bucket names must be globally unique.

    !aws s3 mb s3://MyCloudData

  2. Upload your data using the following command in your MATLAB command window.

    !aws s3 cp mylocaldatapath s3://MyCloudData --recursive

    For example, to upload the Example Food Images data set from your local machine to your Amazon S3 bucket, use this command.

    !aws s3 cp MyLocalFolder/FoodImageDataset s3://MyCloudData/FoodImageDataset/ --recursive

    To upload the Traffic Signal Work Orders data set from your local machine to your Amazon S3 bucket, use this command.

    !aws s3 cp MyLocalFolder/TrafficDataset s3://MyCloudData/TrafficDataset/ --recursive
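
    Once the copy completes, you can confirm the transfer from MATLAB by listing the bucket contents. The --summarize flag appends a total object count and size; for repeated transfers, the aws s3 sync command copies only files that changed since the last upload.

    ```matlab
    % List everything under the data set prefix and print totals at the end.
    !aws s3 ls s3://MyCloudData/FoodImageDataset/ --recursive --summarize
    ```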

Access Data from Amazon S3 in MATLAB

After you store your data in Amazon S3, you can use Data Import and Export (MATLAB) functions to read data from or write data to the Amazon S3 bucket. MATLAB functions that support a remote location in their filename input arguments can access remote data. To check whether a specific function allows remote access, refer to its function page.

Note

If you are using a MATLAB session in MATLAB Parallel Server in Cloud Center, save the images to the /shared/persisted folder on your head node so that all worker nodes across the cluster can access the folder. Using this shared location is more efficient because each worker does not have to download the data individually.

For example, you can use imread (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.

  1. Read an image from Amazon S3 using the imread (MATLAB) function.

    img = imread("s3://MyCloudData/FoodImageDataset/french_fries/french_fries_90.jpg");

  2. Display the image using the imshow (MATLAB) function.

    imshow(img)

To write data to the Amazon S3 bucket, you can similarly use Data Import and Export (MATLAB) functions that support write access to remote data. To check whether a specific function allows remote access, refer to its function page.
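
For example, imwrite (MATLAB) accepts a remote location as its filename argument, so you can write an image directly to the bucket. This sketch reuses the food image read above and crops it with basic array indexing; the output key name is arbitrary.

```matlab
% Read an image from the bucket, crop it, and write the result back to S3.
img = imread("s3://MyCloudData/FoodImageDataset/french_fries/french_fries_90.jpg");
cropped = img(1:128,1:128,:);   % keep the top-left 128-by-128 patch
imwrite(cropped,"s3://MyCloudData/FoodImageDataset/french_fries_90_crop.jpg");
```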

Read Data from Amazon S3 in MATLAB Using Datastores

For large data sets in Amazon S3, you can use datastores to access the data from your MATLAB client or your cluster workers. A datastore is a repository for collections of data that are too large to fit in memory. Datastores allow you to read and process data stored in multiple files on a remote location as a single entity. For example, use an imageDatastore (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.

  1. Create an imageDatastore object that points to the URL of the Amazon S3 bucket.

    imds = imageDatastore("s3://MyCloudData/FoodImageDataset/", ...
        IncludeSubfolders=true, ...
        LabelSource="foldernames");

  2. Read the first image from Amazon S3 using the readimage (MATLAB) function.

    img = readimage(imds,1);

  3. Display the image using the imshow (MATLAB) function.

    imshow(img)
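
    Beyond reading a single image, the same datastore describes the whole collection. For example, countEachLabel tabulates the number of images per class (the labels come from the folder names), and a hasdata loop streams through every file without loading the full data set into memory.

    ```matlab
    % Summarize the number of images in each class.
    tbl = countEachLabel(imds);
    disp(tbl)

    % Stream through the datastore one image at a time.
    reset(imds)
    while hasdata(imds)
        img = read(imds);
        % ... process each image here ...
    end
    ```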

To use datastores to read files or data of other formats, see Getting Started with Datastore (MATLAB).

For a step-by-step example that shows how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).

Write Data to Amazon S3 from MATLAB Using Datastores

You can use datastores to write data from MATLAB or cluster workers to Amazon S3. For example, follow these steps to use a tabularTextDatastore (MATLAB) object to read tabular data from Amazon S3 into a tall array, preprocess it, and then write it back to Amazon S3.

  1. Create a datastore object that points to the URL of the Amazon S3 bucket.

    ds = tabularTextDatastore("s3://MyCloudData/TrafficDataset/Traffic_Signal_Work_Orders.csv");
    
  2. Read the tabular data into a tall array and preprocess it by removing missing entries and sorting the data.

    tt = tall(ds);
    tt = sortrows(rmmissing(tt));

  3. Write the data back to Amazon S3 using the write (MATLAB) function.

    write("s3://MyCloudData/TrafficDataset/preprocessedData/",tt);
    

  4. To read your tall data back, use the datastore (MATLAB) function.

    ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
    tt = tall(ds);
    

To use datastores to write files or data of other formats, see Getting Started with Datastore (MATLAB).
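
To spot-check the preprocessed result without evaluating the entire tall array, you can gather just its first few rows. This sketch assumes the preprocessedData location written in the previous steps.

```matlab
% Point a datastore at the preprocessed files and wrap them in a tall array.
ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
tt = tall(ds);

% Evaluate and bring into memory only the first eight rows.
firstRows = gather(head(tt,8));
disp(firstRows)
```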
