How to import image data and sample import formats.
Specifications
Supported file formats: JPG, PNG, and BMP
Import methods:
- IAM Delegated Access
- Signed URLs (`https` URLs only)
- Direct upload of local files (256 MB max file size)

Note: Direct upload does not currently support adding metadata or attachments; see the Python example below.
Image size limit
Labelbox does not recommend labeling images larger than 9000 px by 9000 px. Annotations made on images of this size may be irretrievable.
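As a rough pre-import check, you can read an image's dimensions before uploading. The sketch below is illustrative and not part of the Labelbox SDK: it parses width and height straight from a PNG header with no third-party dependencies, and the 9000 px threshold mirrors the recommendation above.

```python
import struct

MAX_DIMENSION = 9000  # recommended per-side limit from the note above

def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk of a PNG byte stream."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: width and height are big-endian
    # 4-byte integers at byte offsets 16 and 20.
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def within_label_limit(data: bytes) -> bool:
    """True when both sides are at or below the recommended limit."""
    width, height = png_dimensions(data)
    return width <= MAX_DIMENSION and height <= MAX_DIMENSION
```

A JPG equivalent would need to walk the JFIF segment list instead; libraries such as Pillow handle both formats if adding a dependency is acceptable.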
When importing cloud-hosted image data to Labelbox, your JSON file must include the following information for each image.
| Parameter | Required | Description |
| --- | --- | --- |
| `row_data` | Yes | `https` path to a cloud-hosted image. For IAM Delegated Access, this URL must be in virtual-hosted-style format. For older regions, your S3 bucket may use the `https://<bucket-name>.s3.<region>.amazonaws.com/<key>` format. If your object URLs are formatted this way, make sure they are in the virtual-hosted-style format before importing. |
| `global_key` | No | Unique user-generated file name or ID for the file. Global keys are enforced to be unique within your organization; a data row will not be imported if its global key duplicates that of an existing data row. |
| `media_type` | No | `"IMAGE"` (optional; provides better validation and error messaging). |
| `metadata_fields` | No | See Metadata. |
| `attachments` | No | See Attachments and Asset overlays. |
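The required/optional split in the table can be enforced with a small pre-flight check before building an import file. The helper below is an illustrative sketch, not part of the Labelbox SDK; the rules it applies (required `https` `row_data`, unique `global_key` values within the batch) come directly from the table above.

```python
def validate_assets(assets: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the batch looks OK."""
    problems = []
    seen_keys = set()
    for i, asset in enumerate(assets):
        # row_data is the only required parameter, and must be an https URL.
        row_data = asset.get("row_data")
        if not row_data:
            problems.append(f"asset {i}: missing required 'row_data'")
        elif not str(row_data).startswith("https://"):
            problems.append(f"asset {i}: 'row_data' must be an https URL")
        # global_key is optional, but must not repeat within the batch.
        key = asset.get("global_key")
        if key is not None:
            if key in seen_keys:
                problems.append(f"asset {i}: duplicate global_key {key!r}")
            seen_keys.add(key)
    return problems
```

Note this only catches duplicates within one batch; uniqueness against data rows already in your organization is enforced server-side at import time.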
Import format
This import format can be used to import data assets either by uploading a file or via the Python SDK.
Using metadata field names:

```json
[
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-1.jpg",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-1.jpg",
    "media_type": "IMAGE",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg"}]
  },
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-2.jpg",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-2.jpg",
    "media_type": "IMAGE",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
```
Using metadata schema IDs:

```json
[
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-1.jpg",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-1.jpg",
    "media_type": "IMAGE",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg", "name": "RGB"}]
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-2.jpg",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-2.jpg",
    "media_type": "IMAGE",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
```
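For the "Uploading a file" flow, the asset list just needs to be serialized to a `.json` file in the format shown above. A minimal sketch, using a single hypothetical asset and an arbitrary output filename:

```python
import json

# One asset matching the import format above; the URL and key are placeholders.
assets = [
    {
        "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-1.jpg",
        "global_key": "sample-image-1",
        "media_type": "IMAGE",
    }
]

# Write a JSON file suitable for the file-upload import flow.
with open("image_import.json", "w") as f:
    json.dump(assets, f, indent=2)
```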
Python example
Import a single data row:

```python
from labelbox import Client
from uuid import uuid4  # to generate unique IDs
import datetime

client = Client(api_key="<YOUR_API_KEY>")
metadata_ontology = client.get_data_row_metadata_ontology()
dataset = client.create_dataset(name="Single import example - Image")

asset = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
    "global_key": str(uuid4()),
    "media_type": "IMAGE",
    "metadata_fields": [
        {"name": "captureDateTime", "value": datetime.datetime.utcnow()},
        {"name": "tag", "value": "tag_string"},
        {"name": "split", "value": "train"},
    ],
    "attachments": [
        {"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg"},
        {"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/cir.jpg"},
        {"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/weeds.jpg"},
        {"type": "RAW_TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]"},
        {"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"},
        {"type": "IMAGE", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg"},
        {"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4"},
        {"type": "HTML", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"},
    ],
}

dataset.create_data_row(**asset)
# Equivalent method using the create_data_rows bulk endpoint:
dataset.create_data_rows([asset])
```
Bulk import:

```python
from labelbox import Client
from uuid import uuid4  # to generate unique IDs

client = Client(api_key="<YOUR_API_KEY>")
dataset = client.create_dataset(name="Bulk import example - Image")

assets = [
    {
        "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-1.jpg",
        "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-1.jpg" + str(uuid4()),
        "media_type": "IMAGE",
        "metadata_fields": [{"name": "tag", "value": "tag_string"}],
        "attachments": [{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg", "name": "RGB"}],
    },
    {
        "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-2.jpg",
        "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/image-samples/sample-image-2.jpg-1" + str(uuid4()),
        "media_type": "IMAGE",
        "metadata_fields": [{"name": "tag", "value": "tag_string"}],
        "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}],
    },
]

task = dataset.create_data_rows(assets)
task.wait_till_done()
print(task.errors)
```
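For very large imports, it can help to split the asset list into fixed-size batches and submit one `create_data_rows` task per batch, checking `task.errors` after each. The chunking helper below is plain Python and purely illustrative; the batch size is an arbitrary example, not a Labelbox requirement.

```python
def chunked(items: list, size: int) -> list[list]:
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each batch would then be submitted as in the bulk example above, e.g. `task = dataset.create_data_rows(batch)` followed by `task.wait_till_done()`.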
Upload local files directly:

```python
local_file_paths = ["path/to/local/file1", "path/to/local/file2"]  # limit: 15k files

new_dataset = client.create_dataset(name="Local files upload")
try:
    task = new_dataset.create_data_rows(local_file_paths)
    task.wait_till_done()
except Exception as err:
    print(f"Error while creating labelbox dataset - Error: {err}")
```
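Since direct uploads are capped at 256 MB per file, a quick local size check before calling `create_data_rows` avoids submitting a task that is bound to fail. This sketch uses only the standard library, and the paths are placeholders:

```python
import os

MAX_UPLOAD_BYTES = 256 * 1024 * 1024  # 256 MB direct-upload cap

def oversized_files(paths: list[str]) -> list[str]:
    """Return the subset of paths whose file size exceeds the upload cap."""
    return [p for p in paths if os.path.getsize(p) > MAX_UPLOAD_BYTES]
```

Any paths this returns should be hosted in cloud storage and imported by URL instead of uploaded directly.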