How signed URLs work
- You: Generate URLs — You write and run a script on your own infrastructure that generates a unique signed URL for each data asset you want to label. You must set an expiration date for these URLs.
- You: Format and upload — You create a JSON file that maps each data asset to its corresponding signed URL. You then import this JSON file when creating a new dataset in Labelbox.
- Labelbox: Renders data — When a user opens a labeling task, Labelbox retrieves the corresponding signed URL from your imported JSON file. The Labelbox application uses that URL to fetch the data directly from your bucket and render it in the editor.
- Your cloud provider: Validates — Your cloud provider inspects the signature on the URL. If the signature is valid and the URL has not expired, it serves the file to the Labelbox application. If not, it returns an error.
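The validation step can be illustrated with a toy signing scheme. This is a minimal sketch, not any cloud provider's actual algorithm: the `SECRET_KEY`, the `expires` and `signature` query parameters, and both function names are hypothetical. It only shows the general idea that a signature binds the URL to an expiration time, so tampering with either one causes validation to fail.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical shared secret; real providers derive signing keys from your credentials.
SECRET_KEY = b"demo-secret"


def sign_url(base_url: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to base_url."""
    message = f"{base_url}\n{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"{base_url}?{query}"


def validate_url(signed_url: str, now: Optional[int] = None, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the signature matches and the URL has not expired."""
    parts = urlsplit(signed_url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    # Recompute the signature over the same fields the signer used.
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    message = f"{base_url}\n{expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    current = int(time.time()) if now is None else now
    return hmac.compare_digest(signature, expected) and current < expires_at
```

In this sketch, changing the path or the `expires` value after signing invalidates the signature, which mirrors why a signed URL cannot be extended or repointed without generating a new one.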
Key advantages
- Fast to start: This method is the quickest way to begin a project, as it bypasses the need for initial IAM configuration in your cloud environment. It’s excellent for short-term projects, pilots, or proof-of-concepts.
- Granular, explicit control: You have explicit, file-level control over exactly which assets are accessible and for precisely how long.
Step-by-step instructions
Here is a step-by-step guide to using signed URLs with Labelbox.
Prerequisites
Before you start, make sure you have the following:
- Your data is stored in a cloud storage bucket (AWS S3, Google Cloud Storage, or Azure Storage).
- You have the necessary permissions in your cloud environment to generate signed URLs.
- You have a local environment set up to run scripts (e.g., Python, Node.js) for generating the signed URLs.
Step 1: Generate your signed URLs
For each file you want to import into Labelbox, you need to generate a signed URL. This is typically done by writing a script that uses your cloud provider’s SDK. Refer to your cloud provider’s official documentation for instructions and code samples:
- AWS S3: Upload objects with presigned URLs
- Google Cloud Storage: Create a GET-signed URL for an object using Cloud Storage libraries
- Microsoft Azure Storage: Create a user delegation SAS
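As one possible shape for such a script, the sketch below assumes AWS S3 and the third-party `boto3` SDK, with credentials already configured in your environment. The bucket name and object keys are hypothetical, and the 7-day default is only an example value. The signing function is passed in as a parameter so the same loop could be reused with a different provider's SDK.

```python
from typing import Callable, Dict, Iterable

SEVEN_DAYS = 7 * 24 * 60 * 60  # expiration in seconds (example value)


def make_s3_signer(bucket: str, expires_in: int = SEVEN_DAYS) -> Callable[[str], str]:
    """Return a function that presigns GET requests for objects in `bucket`."""
    import boto3  # third-party dependency: pip install boto3

    s3 = boto3.client("s3")

    def sign(key: str) -> str:
        # "get_object" produces a URL that allows downloading the object only.
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )

    return sign


def sign_all(keys: Iterable[str], sign: Callable[[str], str]) -> Dict[str, str]:
    """Map each object key to its signed URL using the given signer."""
    return {key: sign(key) for key in keys}


# Example usage (hypothetical bucket and keys):
# signer = make_s3_signer("my-labeling-bucket")
# urls = sign_all(["images/cat.jpg", "images/dog.jpg"], signer)
```

Signing only an explicit list of keys, rather than listing the whole bucket, keeps access limited to the files you intend to label.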
Step 2: Create your JSON file
Once you have generated your signed URLs, create a JSON file to upload to Labelbox. This file maps each of your data assets to its corresponding signed URL. For details on the JSON format for each data type, refer to the following documentation:
- Images
- Text
- Geospatial
- Conversational text
- Videos
- Audio
- Documents
- HTML
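A minimal sketch of building such a file is shown below. It assumes you already have a dictionary mapping each object key to its signed URL, and it assumes the import format is a JSON array of objects with `row_data` (the URL) and `global_key` (a unique identifier) fields. Treat those field names as assumptions and confirm them against the format documentation for your data type. The function names and the output path are hypothetical.

```python
import json
from typing import Dict, List


def build_import_rows(signed_urls: Dict[str, str]) -> List[dict]:
    """Turn {object_key: signed_url} into a list of import rows."""
    # Assumed field names: "row_data" holds the URL, "global_key" a unique ID.
    return [
        {"row_data": url, "global_key": key}
        for key, url in signed_urls.items()
    ]


def write_import_file(signed_urls: Dict[str, str], path: str) -> None:
    """Write the import rows to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_import_rows(signed_urls), f, indent=2)


# Example usage (hypothetical values):
# write_import_file(
#     {"images/cat.jpg": "https://example.com/images/cat.jpg?signature=..."},
#     "labelbox_import.json",
# )
```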
Step 3: Import your JSON file into Labelbox
With your JSON file created, you can now import it into Labelbox to create your dataset.
- Go to Catalog.
- Navigate to the Import data section.
- Select the option to upload a JSON file.
- Upload the JSON file you created in the previous step.
- Labelbox will process the file and create a new dataset with your data.
Security best practices
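- Use Short-Lived Expiration Times: Set the expiration time for your signed URLs to be as short as your labeling timeline allows. This minimizes the risk of unauthorized access if a URL is accidentally exposed. Because Labelbox fetches each asset when a labeling task is opened, the URL must remain valid until labeling is complete. For most labeling projects, an expiration of 7 days is sufficient.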
- Principle of Least Privilege: Only generate signed URLs for the specific files that need to be labeled. Avoid generating URLs for entire buckets or directories.
- Audit Regularly: Keep logs of the signed URLs you generate and review them periodically to ensure there is no unusual activity.
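One way to support the expiration and audit practices above is to check the lifetime of each URL before you import it. The sketch below assumes AWS S3 presigned URLs signed with Signature Version 4, which carry the signing time in `X-Amz-Date` and the lifetime in seconds in `X-Amz-Expires`. Other providers use different query parameters, so the parsing would need to be adapted. The function names and the 7-day threshold are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlsplit

MAX_LIFETIME = timedelta(days=7)  # example policy threshold


def presigned_lifetime(signed_url: str) -> Tuple[datetime, timedelta]:
    """Return (expiration time in UTC, total lifetime) for an S3 SigV4 URL."""
    query = parse_qs(urlsplit(signed_url).query)
    issued = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return issued + lifetime, lifetime


def find_long_lived(urls: Iterable[str], max_lifetime: timedelta = MAX_LIFETIME) -> List[str]:
    """Return the URLs whose lifetime exceeds max_lifetime."""
    return [u for u in urls if presigned_lifetime(u)[1] > max_lifetime]
```

Running a check like this over the generated URLs, and logging the result alongside the object keys, gives you a record to review later.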