Create a dataset
Instructions for uploading a dataset to Labelbox via the application.
Import specifications: image | video | text | geospatial | audio | documents | HTML | DICOM
Python tutorials: datasets | data rows | data row metadata
In Labelbox, a data row represents an asset and all of its relevant information. A dataset is a collection of data rows imported to Labelbox.
Key definitions
Term | Definition |
---|---|
Data row | Contains all of the following information for a single asset: - URL to your cloud-hosted file - Metadata - Media attributes (e.g., data type, size, etc.) - Attachments (files that provide context for your labelers) |
Dataset | A set of data rows from a single domain or source |
Asset | A single cloud-hosted file to be labeled (e.g., an image, a video, or a text file). |
Attachment | Supplementary information you can attach to an asset that provides contextual information used as an aid during labeling. Learn more about attachments and image layers. |
Global key | A customer-specified ID for each data row. The field is optional, but using global keys is good practice: they let you map your external database or file paths to your Labelbox assets for easy retrieval. Uniqueness is enforced at the Catalog (organization) level, which helps prevent duplicate uploads. This is the preferred ID for identifying your assets. |
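Because global keys must be unique across the organization, it can help to check a batch for repeated keys before uploading it. The following is a minimal sketch using only the Python standard library; the helper name `find_duplicate_global_keys` and the sample rows are illustrative and not part of Labelbox.

```python
from collections import Counter


def find_duplicate_global_keys(rows):
    """Return the global keys that appear more than once in `rows`.

    Rows without a "global_key" field are ignored, since the field is optional.
    """
    counts = Counter(row["global_key"] for row in rows if "global_key" in row)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows for illustration only.
rows = [
    {"row_data": "https://example.com/a.jpg", "global_key": "a.jpg"},
    {"row_data": "https://example.com/b.jpg", "global_key": "b.jpg"},
    {"row_data": "https://example.com/a-copy.jpg", "global_key": "a.jpg"},
    {"row_data": "https://example.com/c.jpg"},
]

print(find_duplicate_global_keys(rows))
```

This only detects duplicates within the batch you pass in; it cannot tell whether a key is already in use elsewhere in your organization.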
Supported data types
Name | Kinds | Import specs |
---|---|---|
Images | PNG, JPEG, BMP | Image import format |
Video | MP4 | Video import format |
Text | TXT (UTF-8) | Text import format |
Geospatial imagery | Tile Map Server | Geospatial import format |
Simple tiled | Tile Map Server | Simple tiled import format |
Audio | MP3, WAV, M4A | Audio import format |
Documents | PDF | Documents import format |
HTML | HTML | HTML import format |
DICOM | DCM | DICOM import format |
Create a dataset via the app UI
Although uploading a dataset via the Python SDK is recommended, you can also upload datasets to Labelbox via the app UI.
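For comparison, the recommended SDK route might look roughly like the sketch below. It assumes the `labelbox` package is installed and that `Client.create_dataset`, `Dataset.create_data_rows`, and the returned task's `wait_till_done` and `errors` behave as in recent SDK versions; the helper name `create_dataset_with_rows` is hypothetical. Check the Python tutorials linked above for the authoritative workflow.

```python
def create_dataset_with_rows(client, name, rows):
    """Create a dataset called `name` and import `rows` into it.

    `client` is expected to be a `labelbox.Client`; `rows` is a list of
    dictionaries in the same shape as the JSON import format.
    """
    dataset = client.create_dataset(name=name)
    task = dataset.create_data_rows(rows)  # starts an asynchronous import
    task.wait_till_done()                  # block until the import finishes
    if task.errors:
        raise RuntimeError(f"Data row import failed: {task.errors}")
    return dataset


# Usage (requires `pip install labelbox` and a valid API key):
#
#   import labelbox as lb
#   client = lb.Client(api_key="YOUR_API_KEY")  # placeholder, not a real key
#   dataset = create_dataset_with_rows(
#       client,
#       "medical-device-type-1",
#       [{"row_data": "https://storage.googleapis.com/labelbox-datasets/"
#                     "image_sample_data/image-sample-1.jpg",
#         "global_key": "image-sample-1.jpg"}],
#   )
#   print(dataset.uid)
```

The helper takes the client as an argument so that it can be exercised with a stub object without network access.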
Step 1: Create your JSON file
- Create a JSON file formatted according to the import specification for your data type.
- Go to the Create a dataset page.
- Drag and drop your JSON file onto the page.
When you upload a JSON file, your data remains in your cloud bucket. See Integrations to learn how to set up IAM delegated access.
Give it a try using the examples below (image, video, document, and geospatial, in that order). Copy one example into a text editor and save it as a JSON file (.json extension).
[
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-1.jpg",
"global_key": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-1.jpg",
"media_type": "IMAGE",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg", "name": "RGB" }]
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-2.jpg",
"global_key": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-2.jpg",
"media_type": "IMAGE",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
}
]
[
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/video-sample-data/sample-video-1.mp4",
"global_key": "https://storage.googleapis.com/labelbox-datasets/video-sample-data/sample-video-1.mp4",
"media_type": "VIDEO",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4" }]
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/video-sample-data/sample-video-2.mp4",
"global_key": "https://storage.googleapis.com/labelbox-datasets/video-sample-data/sample-video-2.mp4",
"media_type": "VIDEO",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
}
]
[
{
"row_data": {
"pdf_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
"text_layer_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483-lb-textlayer.json"
},
"global_key": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
"media_type": "PDF",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "HTML", "value": "https://www.wikipedia.org/" }]
}
]
[
{
"row_data":{
"tile_layer_url": "https://s3-us-west-1.amazonaws.com/lb-tiler-layers/mexico_city/{z}/{x}/{y}.png",
"bounds": [
[
19.405662413477728,
-99.21052827588443
],
[
19.400498983095076,
-99.20534818927473
]
],
"min_zoom": 12,
"max_zoom": 20,
"epsg": "EPSG4326",
"alternative_layers": [
{
"tile_layer_url": "https://api.mapbox.com/styles/v1/mapbox/satellite-streets-v11/tiles/{z}/{x}/{y}?access_token=pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4NXVycTA2emYycXBndHRqcmZ3N3gifQ.rJcFIG214AriISLbB6B5aw",
"name": "Satellite"
},
{
"tile_layer_url": "https://api.mapbox.com/styles/v1/mapbox/navigation-guidance-night-v4/tiles/{z}/{x}/{y}?access_token=pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4NXVycTA2emYycXBndHRqcmZ3N3gifQ.rJcFIG214AriISLbB6B5aw",
"name": "Guidance"
}
]
},
"global_key": "https://s3-us-west-1.amazonaws.com/lb-tiler-layers/mexico_city/{z}/{x}/{y}.png",
"media_type": "TMS_GEO",
"metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
"attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
}
]
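If you have many assets, writing the JSON by hand becomes tedious. The sketch below generates an import file for image assets from a list of URLs, mirroring the fields used in the image example above. The helper name `build_image_rows` and the output file name are hypothetical, and the script reuses each URL as its global key, as the examples do.

```python
import json
from pathlib import Path


def build_image_rows(urls, metadata_fields=None):
    """Build one import row per image URL, using the URL as the global key."""
    rows = []
    for url in urls:
        row = {"row_data": url, "global_key": url, "media_type": "IMAGE"}
        if metadata_fields:
            # Copy the list so rows do not share one mutable object.
            row["metadata_fields"] = list(metadata_fields)
        rows.append(row)
    return rows


urls = [
    "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-1.jpg",
    "https://storage.googleapis.com/labelbox-datasets/image_sample_data/image-sample-2.jpg",
]
rows = build_image_rows(urls)

output_path = Path("image_dataset.json")  # hypothetical file name
output_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
print(f"Wrote {len(rows)} rows to {output_path}")
```

The resulting file can then be dragged onto the Create a dataset page like any hand-written one.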
Step 2: Upload your JSON file
- Log into Labelbox.
- Go to the Catalog and select New dataset.
- Upload any supported data types.
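Before uploading, a quick local sanity check can catch malformed files early. The sketch below only verifies that the top-level value is an array and that every row has a non-empty `row_data` field; it does not reproduce Labelbox's own validation, and the helper name `check_import_rows` is hypothetical.

```python
import json


def check_import_rows(rows):
    """Return a list of human-readable problems found in `rows`."""
    if not isinstance(rows, list):
        return ["top-level JSON value must be an array of data rows"]
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i}: expected an object")
        elif not row.get("row_data"):
            problems.append(f"row {i}: missing row_data")
    return problems


# Hypothetical file contents for illustration only.
sample = json.loads("""
[
  {"row_data": "https://example.com/a.jpg", "media_type": "IMAGE"},
  {"global_key": "no-asset"}
]
""")
print(check_import_rows(sample))
```

An empty list means the file passed these basic checks; to check a real file, load it with `json.load` and pass the result to the same function.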
Limits
See this page to learn about the limits for uploading data to Labelbox.
Best practices
It is best to put data from a single domain or source into a single dataset. Organizing your data this way will make it easier to set up your labeling workflows. For example, it would be easiest to organize a set of images coming from a particular type of medical device into a single dataset. You can then use metadata to better organize and filter the Data Rows within that dataset.
Choose clear names that explain the source and purpose of a dataset. For example, medical-device-type-1 would help identify the dataset as data relating to a particular version of a device. You can use the dataset description to include more context.
Append to an existing dataset
You can also append data rows to a dataset in the UI. Go to Catalog, select your dataset from the left, then click Append to dataset.

For instructions on how to append to a dataset using the Python SDK, see Dataset.
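As a rough sketch of the SDK route, appending amounts to fetching the existing dataset by its ID and importing more rows into it. This assumes `Client.get_dataset` and `Dataset.create_data_rows` from the `labelbox` package behave as in recent SDK versions; the helper name `append_rows` is hypothetical.

```python
def append_rows(client, dataset_id, rows):
    """Import `rows` into the existing dataset identified by `dataset_id`.

    `client` is expected to be a `labelbox.Client`. Returns the task's
    errors, which is empty or None when the import succeeded.
    """
    dataset = client.get_dataset(dataset_id)
    task = dataset.create_data_rows(rows)  # starts an asynchronous import
    task.wait_till_done()                  # block until the import finishes
    return task.errors


# Usage (requires `pip install labelbox` and a valid API key):
#
#   import labelbox as lb
#   client = lb.Client(api_key="YOUR_API_KEY")  # placeholder, not a real key
#   errors = append_rows(client, "YOUR_DATASET_ID", [{"row_data": "https://..."}])
```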
Copy the dataset ID
Each dataset has a unique dataset ID. You can find it in the Labelbox UI:
- Go to Catalog
- Select your dataset
- Copy the ID from the URL
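If you need to do this for many datasets, the ID can be pulled out of a copied URL programmatically. The sketch below assumes the dataset ID is the final path segment of the URL, which this page does not confirm; the example URL layout and the helper name `last_path_segment` are hypothetical, so compare the result against what you see in your own browser.

```python
from urllib.parse import urlparse


def last_path_segment(url):
    """Return the final path segment of `url`, ignoring query and fragment."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


# Hypothetical URL layout; the real Catalog URL may differ.
example_url = "https://app.labelbox.com/catalog/DATASET_ID_PLACEHOLDER"
print(last_path_segment(example_url))
```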