IAM Delegated Access

Delegated Access lets you host your labeling assets in your own cloud storage while giving the Labelbox application only the limited access it needs. You can store your assets in an Amazon S3 or Google Cloud Storage (GCS) bucket and use the provider's native Identity and Access Management (IAM) roles and policies to control access.

📘 Note: Delegated Access is not currently supported with the Tiled imagery editor.

Supported cloud providers

| Cloud provider | Status | Setup instructions |
| --- | --- | --- |
| Amazon S3 | Active | Using Amazon S3 |
| Google Cloud Platform | Active | Using Google Cloud Storage (GCS) |
| Microsoft Azure | Beta | Microsoft Azure Blob Storage |

How it works

At a high level, you set up an IAM role in your AWS account and grant Labelbox the ability to assume that role in order to perform certain tasks on your behalf. Then, you create a policy that defines exactly what that assumed role can do. For example, Labelbox needs to access the assets in your S3 bucket that you would like to label, so you grant the role permission to perform GetObject requests on that particular bucket (the setup instructions for each provider explain how this is done).

[Figure: Delegated Access setup in AWS (similar in GCP)]
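The role and policy described above can be sketched as two JSON documents: a trust policy (who may assume the role) and a permissions policy (what the role may do). This is a minimal sketch, not Labelbox's official setup template. The account ID, external ID, and bucket name are placeholders, and the `sts:ExternalId` condition is a common AWS pattern for third-party access that is assumed here rather than taken from this page.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy: lets principals in another AWS account assume the role.

    Both arguments are placeholders; use the values from the setup instructions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Assumption: an external ID guards against confused-deputy access.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_read_policy(bucket: str) -> dict:
    """Permissions policy: the assumed role may only read objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
    print(json.dumps(build_read_policy("my-labeling-assets"), indent=2))
```

The printed documents can be pasted into the IAM console or supplied to whatever tooling you use to manage roles.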

This setup is highly flexible and allows you to control access at the granularity that you desire. You can grant Labelbox access to all of your buckets, a single bucket or even a particular path within a bucket. You can even set up different integrations within Labelbox for different datasets or projects. Delegated Access allows you to use private S3 buckets with Labelbox, which helps to ensure that your assets are kept safe.
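The three levels of granularity mentioned above (all buckets, a single bucket, or a path within a bucket) differ only in the `Resource` value of the policy statement. The helper below is a hypothetical illustration of how those values are formed; the function name and the bucket and path names are assumptions.

```python
from typing import Optional


def s3_object_resource(bucket: Optional[str] = None, prefix: Optional[str] = None) -> str:
    """Return the S3 object ARN pattern for the requested scope.

    bucket=None          -> objects in every bucket in the account
    bucket only          -> every object in that bucket
    bucket and prefix    -> only objects under that path
    """
    if bucket is None:
        if prefix is not None:
            raise ValueError("a prefix requires a bucket")
        return "arn:aws:s3:::*/*"
    if prefix:
        return f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"
    return f"arn:aws:s3:::{bucket}/*"


def get_object_statement(resource: str) -> dict:
    """Wrap a resource pattern in an Allow statement for s3:GetObject."""
    return {"Effect": "Allow", "Action": "s3:GetObject", "Resource": resource}


if __name__ == "__main__":
    print(get_object_statement(s3_object_resource()))
    print(get_object_statement(s3_object_resource("my-labeling-assets")))
    print(get_object_statement(s3_object_resource("my-labeling-assets", "projects/cats/")))
```

Scoping to a path is usually the safest starting point, since the scope can be widened later without changing the role itself.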

Supported cloud content

You can add the following content to a Data Row via IAM Delegated Access.

| Data Row content | Supported |
| --- | --- |
| Asset type | Yes |
| Attachments | Yes |

When, why, and how Labelbox accesses your data

The Labelbox frontend accesses assets in order to display them in the user interface, for example during a labeling workflow, during label review, or when you configure a dataset.

When the Labelbox frontend needs access to an asset, it will request an expiring, signed URL from the Labelbox API (the backend). The backend will assume the role that you configured in your AWS account and generate a signed URL for the asset being requested. The URL will then be returned to the frontend, which will use it to access the asset directly from your S3 storage:

[Figure: Delegated Access client request flow]
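The assume-role-then-sign step in this flow can be sketched with boto3. This is an illustration of the general AWS technique, not Labelbox's backend code: the function names, the session name, the 900-second default, and the use of an external ID are all assumptions. boto3 is imported lazily so that the signing logic can be exercised with stand-in clients.

```python
from typing import Any, Callable, Dict


def make_s3_client(credentials: Dict[str, Any]) -> Any:
    """Build an S3 client from the temporary credentials returned by STS."""
    import boto3  # third-party dependency, only needed for real AWS calls

    return boto3.client(
        "s3",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


def signed_url_for_asset(
    sts_client: Any,
    role_arn: str,
    external_id: str,
    bucket: str,
    key: str,
    expires_in: int = 900,
    s3_client_factory: Callable[[Dict[str, Any]], Any] = make_s3_client,
) -> str:
    """Assume the customer's role, then sign a time-limited GET URL for one object."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="delegated-access-sketch",  # hypothetical session name
        ExternalId=external_id,  # assumption: the role's trust policy requires one
    )
    s3 = s3_client_factory(response["Credentials"])
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

With real AWS access, `sts_client` would be `boto3.client("sts")`. Note that a URL signed with temporary credentials stops working when those credentials expire, even if `expires_in` is longer.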

The Labelbox backend also needs access to assets in order to extract metadata (e.g., image and video dimensions, video codec and length, etc.), generate thumbnails and support advanced image segmentation such as Superpixel.

The backend similarly assumes the AWS role that you configured, generates a temporary signed URL, and then uses that URL to access the asset. For example, the backend downloads an asset, processes it in some way (e.g., metadata extraction or Superpixel detection), and then releases or deletes it. Wherever possible, the asset is downloaded only to memory and all processing occurs in memory. However, some processing may require that the asset be written temporarily to a file system. In these cases, the asset is written to ephemeral storage (such as a container’s file system) and deleted when processing completes.

[Figure: Asset processing in the Labelbox backend]
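As an illustration of in-memory processing, the sketch below downloads an asset through a signed URL and reads the image dimensions without touching disk. It is a simplified stand-in for metadata extraction, not Labelbox's implementation: it handles PNG files only, and the function names are hypothetical.

```python
import struct
import urllib.request
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG held in memory."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type ("IHDR"),
    # then big-endian 32-bit width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def fetch_and_measure(signed_url: str, timeout: float = 30.0) -> Tuple[int, int]:
    """Download the asset into memory, extract its dimensions, and drop the bytes."""
    with urllib.request.urlopen(signed_url, timeout=timeout) as response:
        data = response.read()  # held in memory only; nothing is written to disk
    return png_dimensions(data)
```

Once `fetch_and_measure` returns, the downloaded bytes are no longer referenced and can be garbage-collected, which mirrors the download, process, release pattern described above.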

📘 Data processing location: Currently, all asset processing is performed in US-based data centers.

Delegated Access FAQ

How long are the signed URLs generated by Delegated Access valid for?

Currently, the expiration time is between 15 minutes and 24 hours, depending on the selected cloud provider.
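For S3, the expiry of a given signed URL can be read from the URL itself, because AWS Signature Version 4 presigned URLs carry the signing time (`X-Amz-Date`) and lifetime in seconds (`X-Amz-Expires`) as query parameters. The helper below is a small sketch for inspecting a URL; the function name and the example URL are hypothetical, and it does not apply to other providers' URL formats.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 presigned S3 URL expires."""
    query = parse_qs(urlsplit(url).query)
    try:
        signed_at_raw = query["X-Amz-Date"][0]
        lifetime_raw = query["X-Amz-Expires"][0]
    except KeyError as exc:
        raise ValueError("URL is not a SigV4 presigned URL") from exc
    signed_at = datetime.strptime(signed_at_raw, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(lifetime_raw))


if __name__ == "__main__":
    example = (
        "https://my-bucket.s3.amazonaws.com/a.png"
        "?X-Amz-Date=20240101T000000Z&X-Amz-Expires=900&X-Amz-Signature=abc"
    )
    print(presigned_url_expiry(example).isoformat())
```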

Can I invalidate all “active” signed URLs?

Yes, simply remove the “GetObject” permission from your AWS role’s permission policy and all active signed URLs will cease to have access to your S3 bucket.
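If you manage the role's permission policy as a JSON document, removing the permission can be sketched as a small transformation. This is an illustrative helper, not an official tool. It assumes `Statement` is a list, only edits `Allow` statements that name the action explicitly, and does not handle wildcards such as `s3:*` or `s3:Get*`, which would still grant access.

```python
import copy
from typing import Any, Dict


def without_action(policy: Dict[str, Any], action: str = "s3:GetObject") -> Dict[str, Any]:
    """Return a copy of the policy with `action` removed from its Allow statements."""
    result = copy.deepcopy(policy)
    kept = []
    for statement in result.get("Statement", []):
        # Leave Deny statements and statements without an explicit Action untouched.
        if statement.get("Effect") != "Allow" or "Action" not in statement:
            kept.append(statement)
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        remaining = [a for a in actions if a.lower() != action.lower()]
        if remaining:
            statement["Action"] = remaining
            kept.append(statement)
        # A statement left with no actions is dropped entirely.
    result["Statement"] = kept
    return result
```

If the result has no statements left, IAM will not accept it as a policy document; in that case, detach or delete the policy instead.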

Does Labelbox cache assets in any way?

Normally, Labelbox-hosted assets are served through a CDN to improve performance; however, Delegated Access assets are served directly from customer S3 buckets, so no CDN caching occurs.

To control browser caching, you can set cache-related headers (for example, Cache-Control) on the objects in your S3 bucket. See this link for details.

Can I restrict access to my assets by IP?

Yes. You can restrict access to your assets by allowing requests only from certain IP ranges. If you do, make sure to also grant the Labelbox backend (35.223.142.181) access to your assets.
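One common way to express an IP restriction on S3 is a bucket policy that denies `s3:GetObject` unless the request comes from an allowed range, using the `aws:SourceIp` condition key. The sketch below builds such a policy and always includes the Labelbox backend address given above. The function name, bucket name, and example CIDR range are hypothetical. Keep in mind that a Deny statement like this applies to every principal, including your own users, and that `aws:SourceIp` does not cover requests arriving through VPC endpoints.

```python
import ipaddress
from typing import Any, Dict, Iterable

LABELBOX_BACKEND_IP = "35.223.142.181"  # address stated on this page


def ip_restricted_bucket_policy(bucket: str, allowed_cidrs: Iterable[str]) -> Dict[str, Any]:
    """Deny GetObject on the bucket for requests from outside the allowed ranges."""
    # Normalize and validate each range; invalid input raises ValueError.
    cidrs = [str(ipaddress.ip_network(cidr, strict=False)) for cidr in allowed_cidrs]
    backend = str(ipaddress.ip_network(LABELBOX_BACKEND_IP))  # becomes a /32
    if backend not in cidrs:
        cidrs.append(backend)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyGetObjectOutsideAllowedIps",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


if __name__ == "__main__":
    import json

    # 203.0.113.0/24 is a documentation-only range standing in for an office network.
    print(json.dumps(ip_restricted_bucket_policy("my-labeling-assets", ["203.0.113.0/24"]), indent=2))
```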

Does Delegated Access affect how label annotations are stored and accessed?

No, label annotations (e.g., image segmentation masks) are still stored in Labelbox-hosted storage, even when the underlying data row is customer-hosted.

Can I use Delegated Access to import data to a Custom Editor project?

Yes.

Can I set up multiple IAM Delegated Access integrations?

Yes, you can create multiple integrations. However, we recommend creating a single integration and making it the default. You can control which buckets it can access by configuring the IAM policy in the respective cloud provider.

Every time a user creates a new dataset (via app or SDK), the Labelbox application will use the default IAM Delegated Access integration to access the content. This offers the best "set and forget" experience to the ML engineers who will be using Labelbox regularly.

