Amazon S3
Learn how to import your S3 bucket data to Labelbox via IAM delegated access.
When you use IAM delegated access to add your unlabeled data to Labelbox, you can keep your assets in AWS and configure Identity and Access Management (IAM) roles and policies to grant Labelbox read-only access to your S3 buckets.
Follow the steps below to set up IAM delegated access for your S3 buckets.
Part 1: Open new integration in Labelbox
First, you will need to open a new integration in the Labelbox UI.
- Sign in to Labelbox.
- From the main menu, choose Workspace settings and then select the Integrations tab.
- Select the Amazon Web Services button to create a new integration.
- Copy the Labelbox account ID and external ID.
- Leave this open as you will come back to it later.
ARN placeholder
If you need time to obtain an ARN and you want to reserve the external ID, then use the following temporary value: arn:aws:iam::000000000000:role/temporary
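Before pasting a value into the Role ARN field, it can help to check that it has the general shape of an IAM role ARN and that it is no longer the temporary placeholder. The following is a minimal sketch, not part of Labelbox or AWS tooling; the pattern and both function names are assumptions made for illustration.

```python
import re

# Temporary placeholder value quoted in the text above.
PLACEHOLDER_ROLE_ARN = "arn:aws:iam::000000000000:role/temporary"

# Assumed shape of an IAM role ARN: partition, empty region field,
# 12-digit account ID, then "role/" followed by an optional path and a name.
_ROLE_ARN_RE = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+",
    re.ASCII,
)


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value has the general shape of an IAM role ARN."""
    return _ROLE_ARN_RE.fullmatch(value) is not None


def is_placeholder(value: str) -> bool:
    """Return True if the value is still the temporary placeholder."""
    return value == PLACEHOLDER_ROLE_ARN
```

A value that passes `looks_like_role_arn` but also passes `is_placeholder` still needs to be replaced with your real Role ARN before the integration can work.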
Part 2: Create a role for Labelbox in AWS
Next, you'll need to create a role for Labelbox in your AWS account, specify permissions, and select a bucket. Follow the steps below to set this up in your AWS account.
- Go to your AWS account and set up CORS for your bucket. CORS allows Labelbox to request resources from your cloud storage. See Create CORS headers to learn how to set up CORS for your bucket.
- In your AWS account, create a permission policy for your bucket. If you already have a permission policy you plan to use, proceed to step 7. In your IAM Management Console, go to the Policies section, click Create policy, and enter your policy in the JSON tab. This sample policy restricts access to a specific S3 bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::CustomerBucketARN/*",
        "arn:aws:s3:::CustomerBucketARN"
      ]
    }
  ]
}
Element | Description |
---|---|
Effect | Specifies that the elements included in the statement are allowed. |
Action | Describes the specific action(s) that will be allowed. Setting this to s3:GetObject gives Labelbox read-only access to the bucket you specify. See IAM JSON policy elements: Action to learn more. |
Resource | Specifies the object(s) that the statement covers. This is where you specify your Bucket ARN. To find your Bucket ARN, go to your S3 console, select the bucket from the list, go to the Properties tab, and copy the Amazon Resource Name (ARN). The * at the end of the example ARN above is a wildcard character. See IAM JSON policy elements: Resource to learn more. |
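If you manage several buckets, you may prefer to generate the policy document rather than edit it by hand. The sketch below reproduces the sample policy for a given bucket. It assumes the placeholder in the sample stands for the bucket name that follows the arn:aws:s3::: prefix; the function name and the example bucket name are hypothetical.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Build the sample read-only policy for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # The "/*" entry covers the objects in the bucket;
                # the bare ARN covers the bucket itself.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a made-up name; substitute your own bucket.
    print(json.dumps(build_read_only_policy("my-example-bucket"), indent=2))
```

The printed JSON can be pasted into the JSON tab described in the step above.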
- Click Next: Review to bypass the optional Add tags step. Tags are not required to set up this integration.
- In the Review policy step, name the policy you just created. We recommend naming it something like LabelboxReadAccess.
- To approve, click Create policy.
- From the Roles page, follow these steps:
  a. Click Create role.
  b. Select AWS account followed by the radio button for Another AWS account.
  c. Paste the Labelbox Account ID you copied in Part 1.
  d. Check the box for Require external ID.
  e. Paste the Labelbox External ID you copied in Part 1.
  f. Do not check the box for Require MFA.
  g. Click Next: Permissions.
- In the Attach permissions policies section, check the box next to the permission policy you created to attach it to your role. Alternatively, select a policy from the list provided (e.g., AmazonS3ReadOnlyAccess).
- Click Next: Tags.
- Click Next: Review to bypass the optional Add tags step. Tags are not required to set up this integration.
- Name the role you created for Labelbox. We recommend naming it something like LabelboxS3Access.
- When you are done reviewing, click Create role.
- Click on the role you just created and copy the Role ARN at the top of the Summary tab.
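Choosing Another AWS account and Require external ID in the steps above causes the AWS console to attach a trust policy to the role. The sketch below builds a document of the shape the console typically generates, so you can compare it with what appears on the role's Trust relationships tab. The function name, account ID, and external ID are made-up values for illustration; use the ones shown in your Labelbox integration.

```python
import json


def build_trust_policy(labelbox_account_id: str, external_id: str) -> dict:
    """Build a trust policy that lets another account assume the role,
    but only when it presents the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{labelbox_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must supply this external ID when assuming the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Both arguments are placeholders, not real Labelbox values.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```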
Part 3: Complete integration setup in Labelbox
Add the Role ARN to the new integration you opened in Labelbox in Part 1.
Go back to the Integrations tab in Labelbox and:
- Set the integration name
- Enter the bucket name
- Paste the AWS Role ARN
Part 4: Validate the integration
Next, you'll need to make sure the integration was set up correctly.
If you completed Parts 1 and 3 via the Labelbox UI, Labelbox automatically runs a validation check on the integration setup. To see the result, go to the Integrations tab and check the Last checked column, which indicates whether the integration was successful. If the integration failed, click the refresh icon to view the error messages.
Here are the possible error messages and our suggestions for troubleshooting your integration setup.
Error | Troubleshooting |
---|---|
Role cannot be assumed | Ensure that the integration’s role ARN is correct and that the Labelbox External ID is properly configured in your AWS account. Additionally, your AWS account admin must activate STS in the us-east-2 region using the IAM console. |
External ID configured insecurely | Ensure that the Labelbox External ID is properly configured in your AWS account. |
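When troubleshooting either error, one thing you can check locally is whether the role's trust policy (copied from the Trust relationships tab in IAM) actually requires the external ID shown in Labelbox. The sketch below is a hypothetical local sanity check, not the validation Labelbox runs; it assumes the condition uses the StringEquals operator on the sts:ExternalId key, and all names and sample values are made up.

```python
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def external_id_problems(trust_policy: dict, expected_external_id: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    assume_statements = [
        s for s in _as_list(trust_policy.get("Statement"))
        if s.get("Effect") == "Allow"
        and "sts:AssumeRole" in _as_list(s.get("Action"))
    ]
    if not assume_statements:
        problems.append("no Allow statement for sts:AssumeRole")
    for s in assume_statements:
        configured = s.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if configured is None:
            problems.append("statement has no sts:ExternalId condition")
        elif expected_external_id not in _as_list(configured):
            problems.append("sts:ExternalId does not match the expected value")
    return problems


# Made-up trust policy for illustration only.
sample_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
        }
    ],
}
```

Calling `external_id_problems(sample_trust_policy, "example-external-id")` returns an empty list, while a policy with no Condition block yields a message saying the external ID condition is missing.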
Part 5: Create & upload the dataset
Delegated Access for AWS supports “virtual-hosted-style” URLs; they follow this format:
https://<bucket-name>.s3.<region>.amazonaws.com/<key>
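As an illustration of this format, the sketch below assembles a virtual-hosted-style URL from a bucket name, region, and object key. The function name and example values are hypothetical, and percent-encoding the key is an assumption made so that keys containing spaces or other special characters still produce a valid URL.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Build https://<bucket-name>.s3.<region>.amazonaws.com/<key>."""
    # Percent-encode the key but keep "/" so folder-like prefixes stay readable.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


if __name__ == "__main__":
    # Made-up bucket, region, and key for illustration.
    print(virtual_hosted_url("my-example-bucket", "us-east-2", "images/cat 1.jpg"))
    # prints https://my-example-bucket.s3.us-east-2.amazonaws.com/images/cat%201.jpg
```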
Click through the links in this table to find the import format for your data type.
Data type | Supported |
---|---|
Images | Import specifications |
Video | Import specifications |
Text | Import specifications |
Tiled imagery (Slippy maps) | NOT SUPPORTED |
Tiled imagery (COG, NITF, GeoTIFF) | Import specifications |
Audio | Import specifications |
Document | Import specifications |
Conversation | Import specifications |
Then, follow the instructions in Create a dataset.
Part 6: Validate the dataset
Last, you will need to make sure your dataset was configured correctly.
If you created your integration and imported your dataset via the Labelbox UI, Labelbox will automatically run validation checks to determine whether the CORS setup was configured properly. It will also check whether Labelbox can successfully fetch data from your S3 bucket and whether it can properly sign the URLs.
Your dataset should now be set up with IAM delegated access. Labelbox will use the AWS role you created to generate temporary signed URLs every time it accesses data in your S3 bucket.