Amazon S3

Learn how to import your S3 bucket data to Labelbox via IAM delegated access.

When you use IAM delegated access to add your unlabeled data to Labelbox, you can keep your assets in AWS and configure Identity and Access Management (IAM) roles and policies to grant Labelbox read-only access to your S3 buckets.

Part 1: Create a new integration in Labelbox

First, you need to create a new integration in the Labelbox UI.

  1. Navigate to Settings > Integrations.

  2. Under Add integrations, select Sync from a source.

  3. Select AWS.

  4. Copy the Labelbox AWS account ID and External ID.

  5. Leave this Create AWS integration page open as you will come back to it later.

Part 2: Create a role for Labelbox in AWS

Next, you need to create a role for Labelbox in your AWS account, specify permissions, and select a bucket.

  1. Go to your AWS account and set up CORS for your bucket. CORS allows Labelbox to request resources from your cloud storage. See Create CORS headers to learn how to set up CORS for your bucket.

  2. In your AWS account, create a permission policy for your bucket. If you already have a permission policy you plan to use, proceed to step 6 to create the role. If you don't have a permission policy yet:

    1. Navigate to the IAM Management Console > Policies page.
    2. Click Create policy.
    3. Select JSON policy editor and enter your policy. The following example policy restricts access to a specific S3 bucket.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::CustomerBucketARN/*",
                    "arn:aws:s3:::CustomerBucketARN"
                ]
            }
        ]
    }
    
    Element | Description
    Effect | Specifies that the elements included in the statement are allowed.
    Action | Describes the specific action(s) that will be allowed. Setting this to s3:GetObject gives Labelbox read-only access to the bucket you specify. See IAM JSON policy elements: Action to learn more.
    Resource | Specifies the object(s) that the statement covers. This is where you specify your Bucket ARN. To find your Bucket ARN, go to your S3 console, select the bucket from the list, go to the Properties tab, and copy the Amazon Resource Name (ARN). The * at the end of the first example ARN above is a wildcard character that matches every object in the bucket. See IAM JSON policy elements: Resource to learn more.
  3. Click Next to move to the Review and create step.

  4. Add a name and an optional description for the added policy. Use a meaningful name to identify the policy, such as LabelboxReadAccess.

  5. When you finish reviewing, click Create policy.

  6. From the Roles page:

    a. Click Create role.

    b. Select Custom trust policy and enter the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::340636424752:role/lb-aws-delegated-access-role"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<External ID from Part 1>"
            }
          }
        }
      ]
    }
    

    c. Click Next to move to the Add Permissions step.

  7. In the Permissions policies section, check the box next to the permission policy you created to attach it to your role or select a policy in the list provided, such as AmazonS3ReadOnlyAccess.

  8. Click Next to move to the Name, review, and create step.

  9. Add a name and an optional description for the role. Use a meaningful name to identify the role, such as LabelboxS3Access.

  10. When you finish reviewing, click Create role.

  11. Click on the role you just created and copy the Role ARN on the Summary tab.
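If you prefer to generate the two JSON documents from Part 2 rather than edit them by hand, the following is a minimal Python sketch. It assumes the policy shapes shown in steps 2 and 6 above; the function names and the example bucket name and External ID are hypothetical and only illustrate the idea. Producing the JSON with json.dumps also guarantees the output contains no comments, which the JSON policy editor would reject.

```python
import json

# Principal ARN copied from the trust policy example in step 6b.
LABELBOX_PRINCIPAL_ARN = (
    "arn:aws:iam::340636424752:role/lb-aws-delegated-access-role"
)


def build_permission_policy(bucket_name: str) -> dict:
    """Return a read-only policy restricted to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # Objects in the bucket (wildcard) and the bucket itself.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


def build_trust_policy(external_id: str) -> dict:
    """Return a trust policy that lets the Labelbox role assume your role."""
    if not external_id:
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": LABELBOX_PRINCIPAL_ARN},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with your bucket name and the
    # External ID you copied in Part 1.
    print(json.dumps(build_permission_policy("my-example-bucket"), indent=4))
    print(json.dumps(build_trust_policy("EXTERNAL-ID-FROM-PART-1"), indent=2))
```

You can paste the printed output into the JSON policy editor (step 2) and the Custom trust policy field (step 6b).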

Part 3: Complete integration setup in Labelbox

Add the Role ARN to the new integration you added in Labelbox in Part 1.

Go back to the Create AWS integration page in Labelbox and complete the Provider ARN and name section:

  • Set the integration name
  • Enter the AWS bucket name
  • Paste the AWS Role ARN
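
A common copy-and-paste mistake is to paste a policy ARN or a bucket ARN instead of the Role ARN. The following Python sketch shows one way to sanity-check the value first. It assumes the standard IAM role ARN layout (arn:<partition>:iam::<12-digit account ID>:role/<name>); the function name is hypothetical, and the check only covers the format, not whether the role exists or can be assumed.

```python
import re

# Standard IAM role ARN layout:
#   arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+",
    re.ASCII,
)


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the shape of an IAM role ARN."""
    # Strip whitespace that often sneaks in when copying from the console.
    return ROLE_ARN_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    print(looks_like_role_arn("arn:aws:iam::123456789012:role/LabelboxS3Access"))
    print(looks_like_role_arn("arn:aws:iam::123456789012:policy/LabelboxReadAccess"))
```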

Part 4: Validate the integration

Next, you need to make sure the integration was set up correctly.

After you complete Parts 1 and 3 in Labelbox, click Save integration. Labelbox then automatically runs a validation check on the integration setup. You can check the status on Integrations > Manage integrations. The Last checked column indicates whether the validation succeeded. If it fails, click the refresh icon to view the error messages.

Here are possible error messages and our suggestions for troubleshooting your integration setup.

Error | Troubleshooting
Role cannot be assumed | Ensure that the integration’s role ARN is correct and that the Labelbox External ID is properly configured in your AWS account. Additionally, your AWS account admin must activate STS in the us-east-2 region using the IAM console.
External ID configured insecurely | Ensure that the Labelbox External ID is properly configured in your AWS account.

Part 5: Create & upload the dataset

Delegated Access for AWS supports virtual-hosted-style URLs; they follow this format:

https://<bucket-name>.s3.<region>.amazonaws.com/<key>
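
To illustrate the format, here is a short Python sketch that builds a virtual-hosted-style URL from a bucket name, region, and object key, and splits such a URL back into its parts. The function names and example values are hypothetical. The parser deliberately rejects path-style URLs (https://s3.<region>.amazonaws.com/<bucket>/<key>), since only the virtual-hosted style is described above.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit

_HOST_SUFFIX = ".amazonaws.com"


def build_virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Build https://<bucket-name>.s3.<region>.amazonaws.com/<key>."""
    # Percent-encode the key, but keep "/" so the object path stays intact.
    return f"https://{bucket}.s3.{region}{_HOST_SUFFIX}/{quote(key, safe='/')}"


def parse_virtual_hosted_url(url: str) -> Tuple[str, str, str]:
    """Split a virtual-hosted-style URL into (bucket, region, key)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(_HOST_SUFFIX):
        raise ValueError(f"not an HTTPS amazonaws.com URL: {url!r}")
    # What remains is "<bucket-name>.s3.<region>"; split on the last ".s3."
    # so bucket names that contain dots are still handled.
    bucket, sep, region = host[: -len(_HOST_SUFFIX)].rpartition(".s3.")
    if not sep or not bucket or not region:
        raise ValueError(f"not a virtual-hosted-style URL: {url!r}")
    return bucket, region, unquote(parts.path[1:])


if __name__ == "__main__":
    # Hypothetical bucket, region, and key.
    url = build_virtual_hosted_url("my-bucket", "us-east-2", "images/cat 1.jpg")
    print(url)
    print(parse_virtual_hosted_url(url))
```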

Click through the links in this table to find the import format for your data type.

Data type | Supported
Images | Import specifications
Video | Import specifications
Text | Import specifications
Tiled imagery (Slippy maps) | Not supported
Tiled imagery (COG, NITF, GeoTIFF) | Import specifications
Audio | Import specifications
Document | Import specifications
Conversation | Import specifications

Then, follow the instructions in Create a dataset.

Part 6: Validate the dataset

Last, you need to make sure your dataset was configured correctly.

If you created your integration and imported your dataset using the Labelbox UI, Labelbox automatically runs validation checks to determine whether CORS is configured properly. It also checks whether Labelbox can successfully fetch data from your S3 bucket and whether it can properly sign the URLs.

Your dataset should now be set up with IAM delegated access. Labelbox uses the AWS role you created to generate temporary signed URLs every time it accesses data in your S3 bucket.