Integrations

Using IAM delegated access integrations is the recommended option for all cloud users. IAM delegated access is not supported in air-gap/on-premises environments.


Navigate to this view from the Workspace Settings menu option or go to the page https://app.labelbox.com/workspace-settings/integrations

IAM delegated access allows you to securely and seamlessly host your unlabeled data in your preferred cloud storage provider while providing Labelbox with the limited access necessary so you can view and label your data in Labelbox. You can store your assets in AWS S3, GCP, or Azure buckets and use native Identity and Access Management (IAM) roles and policies to control Labelbox access.

Supported cloud providers

Cloud provider | Setup instructions
Amazon S3 | Guide
Google Cloud Platform | Guide
Microsoft Azure | Guide

Supported data types

Below are the asset types that you can import to Labelbox via IAM delegated access. Click a link to see a sample payload for importing that asset type.

Data type | Supported
Images | See import payload
Video | See import payload
Text | See import payload
Tiled imagery (Slippy maps) | NOT SUPPORTED
Tiled imagery (COG, NITF, GeoTIFF) | See import payload
Audio | See import payload
DICOM | See import payload
Document | See import payload
Conversation | See import payload

You can also use IAM delegated access to host attachments to your data rows.

How IAM delegated access works

You will need to set up an IAM role in your cloud provider's account and grant Labelbox the ability to assume that role in order to perform certain tasks on your behalf. Then, you create a policy that defines exactly what that assumed role can do.

For example, Labelbox needs to be able to access assets stored in your AWS S3 bucket that you would like to label. So, you would grant the role the ability to perform GetObject requests on a particular S3 bucket (the setup instructions below will explain how to do this).


Delegated Access setup in AWS (similar in GCP)

IAM delegated access is highly flexible and allows you to control access at the granularity that you desire. You can grant Labelbox access to all of your buckets, a single bucket, or even a particular path within a bucket. You can even set up different integrations within Labelbox for different datasets or projects. IAM delegated access allows you to use private cloud-hosted buckets with Labelbox, which helps to ensure that your assets are kept safe.
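As a minimal sketch of that granularity for AWS, the helper below builds a permission policy document that allows GetObject on either a whole bucket or a single path within it. The function name, bucket name, and prefix are hypothetical placeholders, and the provider setup guide remains the authority on the exact permissions Labelbox requires.

```python
import json
from typing import Optional


def build_read_policy(bucket: str, prefix: Optional[str] = None) -> dict:
    """Build an IAM policy document allowing s3:GetObject on a bucket or a path within it."""
    # "*" matches every object key; a prefix narrows access to keys under that path.
    key_pattern = f"{prefix.strip('/')}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key_pattern}"],
            }
        ],
    }


# Hypothetical bucket and path, for illustration only.
print(json.dumps(build_read_policy("example-labeling-bucket", "datasets/images"), indent=2))
```

Omitting the prefix grants read access to every object in the bucket, while passing one limits the role to objects under that path.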

When, why, and how Labelbox accesses your data

The Labelbox app accesses assets in order to display them in the user interface (e.g., during a labeling workflow, during label review, or when configuring a dataset).

When the Labelbox app needs access to an asset, it will request an expiring, signed URL from the Labelbox API (the backend). The backend will assume the role that you configured in your AWS account and generate a signed URL for the asset being requested. The URL will then be returned to the frontend, which will use it to access the asset directly from your cloud storage:


Delegated Access client request flow
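To illustrate why a signed URL stops working after a set time, here is a simplified sketch of an expiring, signed URL built with an HMAC. It is not the signing algorithm that AWS, GCP, or Azure actually use, and it is not Labelbox's implementation. The secret, function names, and URL are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret; in the real flow the temporary role credentials play this part.
SECRET = b"demo-secret"


def sign_url(base_url: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and an HMAC signature to base_url."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    message = f"{base_url}|{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires, 'signature': signature})}"


def is_valid(signed_url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and the expiry has not passed."""
    current = time.time() if now is None else now
    parsed = urlparse(signed_url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = hmac.new(SECRET, f"{base_url}|{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and current < expires


if __name__ == "__main__":
    url = sign_url("https://example-bucket.s3.amazonaws.com/images/cat.jpg", ttl_seconds=900)
    print(url, is_valid(url))
```

Because the expiry is part of the signed message, a client cannot extend the lifetime of a URL by editing the timestamp.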

The Labelbox backend also needs access to assets in order to extract metadata (e.g., image and video dimensions, video codec and length, etc.), generate thumbnails, and support more advanced tools.

The backend similarly assumes the role that you configured, generates a temporary signed URL, and then uses that URL to access the asset. For example, the backend will download an asset, process it (e.g., media attribute extraction), and then release/delete it. Wherever possible, the asset will only be downloaded to memory and all processing will occur in memory. However, there may be cases where processing requires that the asset be written to a non-memory file system temporarily. In these cases, the asset will be written to some other ephemeral storage (such as a container’s file system) and deleted when processing completes.


Asset processing in the Labelbox backend
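The two processing modes described above can be pictured with the following sketch. It is an illustration of the general pattern only, not Labelbox's backend code. The function names and the `extract` callbacks are hypothetical.

```python
import io
import os
import tempfile
from typing import Callable


def process_in_memory(data: bytes, extract: Callable[[io.BytesIO], dict]) -> dict:
    """Process the asset entirely in memory; nothing is written to disk."""
    with io.BytesIO(data) as buffer:
        return extract(buffer)


def process_via_temp_file(data: bytes, extract_from_path: Callable[[str], dict]) -> dict:
    """Write the asset to ephemeral storage for tools that need a file path."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "asset.bin")
        with open(path, "wb") as handle:
            handle.write(data)
        # The directory and the asset inside it are deleted when this block exits,
        # whether extraction succeeds or raises.
        return extract_from_path(path)
```

In both cases only the extracted attributes leave the function; the downloaded bytes are released once processing finishes.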

📘

Data processing location

Currently, all asset processing is performed in US-based datacenters.

Managing and selecting Integrations

To create and manage your integrations, navigate to the Workspace settings and select the Integrations tab. Further details for creating integrations are located in the specific cloud provider subsections nested under this page.

Select an integration when creating a dataset in the UI

When you create a new dataset by uploading through the UI, you can select which integration the new dataset uses. Make sure you select the appropriate integration; otherwise, the uploads will fail.

Update an integration in the UI

To change the integration linked to any dataset, regardless of how it was created, navigate to Catalog, select the desired dataset, and click the gear icon in the top right corner of the page. Here you can select the desired integration from the dropdown options.

📘

Note

When you create a new dataset through the UI that utilizes IAM delegated access, you will be prompted to select your integration before beginning the upload.

Select an integration in the SDK

When creating a dataset via the SDK, the create_dataset method has an optional iam_integration parameter that can be used to specify the desired integration. Sample code for viewing and selecting integrations, along with creating a dataset using this parameter, is shown in the Common SDK methods section below.

🚧

If no argument is provided to the optional iam_integration parameter in the create_dataset method, then the default integration will automatically be used.

Common SDK methods

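# !pip install labelbox
import labelbox

# Replace the placeholder with your own API key.
client = labelbox.Client(api_key="<YOUR_API_KEY>")

# List every integration in the organization, then show the default one.
organization = client.get_organization()
print(organization.get_iam_integrations())
print(organization.get_default_iam_integration())

# Choose an integration explicitly. Index 1 is the second integration in the list; adjust as needed.
iam_integration = organization.get_iam_integrations()[1]
dataset = client.create_dataset(name="IAM manual demo", iam_integration=iam_integration)

# Placeholder data row; row_data must point to an asset that the chosen integration can access.
datarows = [{"row_data": "https://<YOUR_BUCKET>.s3.amazonaws.com/<PATH_TO_ASSET>", "global_key": "<UNIQUE_GLOBAL_KEY>"}]
task1 = dataset.create_data_rows(datarows)
task1.wait_till_done()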
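Selecting an integration by list position can break when integrations are added or removed. As a hedged alternative, the sketch below looks an integration up by name. The helper `find_integration` is hypothetical, and it assumes each integration object exposes a `name` attribute; check the SDK reference for the fields available in your version.

```python
from typing import Any, Iterable


def find_integration(integrations: Iterable[Any], name: str) -> Any:
    """Return the first integration whose name matches, or raise if none does."""
    for integration in integrations:
        # Assumption: integration objects expose a `name` attribute.
        if getattr(integration, "name", None) == name:
            return integration
    raise LookupError(f"No IAM integration named {name!r}")


# Usage with the objects from the sample above (the integration name is a placeholder):
# iam_integration = find_integration(organization.get_iam_integrations(), "my-s3-integration")
# dataset = client.create_dataset(name="IAM manual demo", iam_integration=iam_integration)
```

Raising on a missing name keeps a typo from silently creating the dataset with a different integration than you intended.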

Delegated Access FAQ

How long are the signed URLs generated by IAM delegated access valid?

Currently, the expiration time is set between 15 minutes and 24 hours, depending on the selected cloud provider.

Can I invalidate all “active” signed URLs?

Yes, simply remove the “GetObject” permission from your AWS role’s permission policy and all active signed URLs will cease to have access to your S3 bucket.
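As a minimal sketch of that change, the helper below takes a policy document (as a Python dict) and returns a copy with the GetObject action removed. The function name is hypothetical, and applying the edited policy to your role is left to the AWS console, CLI, or SDK of your choice.

```python
import copy


def without_get_object(policy: dict) -> dict:
    """Return a copy of the policy with s3:GetObject removed from every statement."""
    updated = copy.deepcopy(policy)
    kept = []
    for statement in updated.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Only the exact action name is removed; wildcards such as "s3:*" are left untouched.
        remaining = [action for action in actions if action != "s3:GetObject"]
        if remaining:
            statement["Action"] = remaining
            kept.append(statement)
    updated["Statement"] = kept
    return updated
```

Review the result before applying it: a statement that grants read access through a wildcard action would still need to be edited by hand.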

Does Labelbox cache assets in any way?

Normally, Labelbox-hosted assets are served through a CDN to improve performance; however, Delegated Access assets are served directly from customer S3 buckets, so no CDN caching occurs.

To control browser caching, you can configure various cache-related headers on your S3 bucket. See this link for details.

Can I restrict access to my assets by IP?

Yes. You can restrict access to your assets by allowing requests only from certain IP ranges. If you do, make sure to also grant the Labelbox backend (35.223.142.181) access to your assets.
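One way to express such a restriction on AWS is an S3 bucket policy that denies GetObject to requests from outside an allow list. The sketch below builds that document. The function name, bucket name, and example CIDR range are hypothetical, and a broad Deny statement can lock out other tools that read from the bucket, so consider testing it on a non-production bucket first.

```python
import ipaddress
import json
from typing import List

LABELBOX_BACKEND_IP = "35.223.142.181"  # backend address given in the answer above


def build_ip_restricted_bucket_policy(bucket: str, allowed_cidrs: List[str]) -> dict:
    """Deny s3:GetObject for any request whose source IP is outside the allowed ranges."""
    # Validate and normalize each range; ip_network raises ValueError on bad input.
    cidrs = [str(ipaddress.ip_network(cidr, strict=False)) for cidr in allowed_cidrs]
    backend = f"{LABELBOX_BACKEND_IP}/32"
    if backend not in cidrs:
        cidrs.append(backend)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyGetObjectOutsideAllowedIps",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


# Hypothetical bucket and office IP range, for illustration only.
print(json.dumps(build_ip_restricted_bucket_policy("example-labeling-bucket", ["203.0.113.0/24"]), indent=2))
```

Because the Labelbox app fetches assets directly from your storage in the browser, the allow list also needs to cover the networks your labelers and reviewers work from.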

Does IAM delegated access affect how annotations are stored and accessed?

No, label annotations (e.g., image segmentation masks) are still stored in Labelbox-hosted storage, even when the underlying data row is customer-hosted.

Can I use IAM delegated access to import data to a custom editor project?

Yes.

Can I set up multiple IAM delegated access integrations?

It is possible to create multiple integrations. However, it is recommended to create only a single integration and make it the default. You can control what buckets to provide access to by configuring the IAM policy within the respective cloud provider.

Every time you create a new dataset (via app or SDK), the Labelbox application will use the default IAM delegated access integration to access the content. This offers the best "set and forget" experience for customers who use Labelbox regularly.

