Overview

To start labeling your data, you first need to grant our platform secure access to the files stored in your private cloud (AWS, GCP, or Azure). This guide explains the two methods for connecting your data and helps you choose the best one for your project’s security and workflow needs. The two methods are:
  • IAM Delegated Access: A robust, long-term connection method.
  • Signed URLs: A flexible method using temporary, secure links to your data.
Our Recommendation: For most use cases, especially long-term projects, we recommend IAM Delegated Access for its superior security and lower maintenance.
| Feature | IAM Delegated Access | Signed URLs |
| --- | --- | --- |
| Setup complexity | High. Requires a one-time configuration within your cloud provider’s IAM console. | Low. No initial cloud configuration is needed in Labelbox. |
| Maintenance | Low. “Set it and forget it.” Works for all data in the configured location. | High. Requires a continuously running service on your end to generate new URLs. |
| Data freshness | Real-time. New data added to your bucket is immediately available for labeling. | Delayed. New data requires new signed URLs to be generated and uploaded to Labelbox. |
| Ideal for | Long-term projects, enterprise-scale data operations, and stringent security environments. | Quick-start projects, proof-of-concepts, or when you cannot create IAM roles. |

IAM delegated access

IAM (Identity and Access Management) Delegated Access is the most secure and scalable method for connecting your data. You create a trust relationship by setting up a dedicated role within your own cloud account that Labelbox is permitted to assume. This gives Labelbox temporary, read-only credentials to access your data when your users are labeling.

How it works

  1. You: Create an IAM role in your AWS, GCP, or Azure account that has read-only permissions to your data bucket.
  2. You: Provide Labelbox with the unique identifier (ARN/ID) of that role.
  3. Labelbox: When a user needs to view an image or document, Labelbox uses the provided identifier to request temporary access credentials from your cloud provider.
  4. Your Cloud Provider: Validates the request and grants Labelbox a short-lived token to access only the specified data.
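
On AWS, the role in step 1 usually comes down to two JSON policy documents: a trust policy that names who may assume the role, and a permissions policy that limits the role to read-only access. The Python sketch below builds both as plain dictionaries to show their shape. The account ID, external ID, and bucket name are placeholders assumed for illustration, not values from this guide; the provider-specific guides below are the source of truth for the real values and steps, and GCP and Azure use different mechanisms.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy: allows one external AWS account to assume this role.

    Both arguments are placeholders; use the values your Labelbox
    integration settings display.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against another party reusing the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_read_only_policy(bucket: str) -> dict:
    """Permissions policy: read objects in one bucket, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
    print(json.dumps(build_read_only_policy("example-bucket"), indent=2))
```

Keeping the permissions policy to `s3:GetObject` on a single bucket mirrors the read-only, scoped access described in the steps above.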

Key advantages

  • Superior Security: Your secret keys are never shared with Labelbox. Access is easily auditable and can be revoked at any time from your cloud console.
  • Low Maintenance: After the initial setup, there are no credentials to rotate and no URLs to regenerate. New data in the configured location is covered automatically.
  • Simplified Workflow: Data scientists and annotators can easily browse and import data without needing to handle URLs.

Step-by-step guides

Connect to AWS S3

Connect to Google Cloud Storage

Connect to Azure Blob Storage


Signed URLs

A signed URL is a web link that provides temporary access to a specific file in your storage bucket. Each URL carries a cryptographic signature that validates the request, and the URL stops working after a set expiration time (e.g., 7 days). You are responsible for generating these URLs and providing them to Labelbox.
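
To make the idea concrete, here is a deliberately simplified Python sketch of signing and validating an expiring URL with an HMAC. It is a conceptual illustration only: the query parameter names, the secret key, and the message layout are assumptions made for this example, and AWS, GCP, and Azure each use their own, more involved signing schemes. In practice you would generate URLs with your cloud provider's SDK.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"demo-secret"  # hypothetical key; it never leaves the signer


def _signature(base_url: str, expires: int) -> str:
    """HMAC-SHA256 over the URL and its expiry timestamp."""
    message = f"{base_url}|{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return base_url with an expiry timestamp and signature appended."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(base_url, expires)})
    return f"{base_url}?{query}"


def is_valid(signed_url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parts = urlsplit(signed_url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    expected = _signature(base_url, expires)
    current = time.time() if now is None else now
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature.encode(), expected.encode()) and current < expires
```

Because the expiry is part of the signed message, changing either the file path or the expiry timestamp in the link invalidates the signature.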

How it works

  1. You: Write and run a script or service that generates a unique signed URL for each data asset you want to label.
  2. You: Create a JSON file containing these URLs and upload it to Labelbox.
  3. Labelbox: When a user accesses a task, Labelbox uses the corresponding signed URL from your JSON file to fetch and display the data.
  4. Your Cloud Provider: Validates the signature on the URL and serves the file. Access is denied if the URL has expired.
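
Steps 1 and 2 can be sketched as a small Python script that produces one signed URL per asset and writes the results to a JSON file. Several things here are assumptions for illustration: `make_signed_url` is a hypothetical callable standing in for your cloud SDK's signing call, and the `row_data` and `global_key` field names should be checked against the Labelbox import format before use.

```python
import json
from typing import Callable, Dict, List


def build_manifest(
    object_keys: List[str],
    make_signed_url: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Create one manifest entry per asset.

    make_signed_url is a hypothetical wrapper around your cloud SDK's
    signing call; field names are assumed, not taken from this guide.
    """
    return [
        {"row_data": make_signed_url(key), "global_key": key}
        for key in object_keys
    ]


def write_manifest(rows: List[Dict[str, str]], path: str) -> None:
    """Write the manifest as a JSON file ready for upload."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=2)


if __name__ == "__main__":
    # Stand-in signer for demonstration; it does not produce a real signature.
    def fake_signer(key: str) -> str:
        return f"https://example.com/{key}?signature=placeholder"

    print(json.dumps(build_manifest(["images/cat.jpg"], fake_signer), indent=2))
```

Because the URLs expire, a script like this has to be rerun and the file uploaded again whenever links lapse or new data arrives, which is the maintenance cost noted in the comparison above.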

Key advantages

  • Fast to Start: Bypasses the need for complex IAM configuration, making it ideal for quick tests or proof-of-concepts.
  • Granular Control: You have explicit, file-level control over what data is accessible and for how long.

Step-by-step guide

Generate signed URLs