Developer guide for importing annotations on image data and sample import formats.
Overview
To import annotations in Labelbox, you need to create an annotations payload. This section provides example payloads for every supported annotation type.
Annotation payload types
Labelbox supports two formats for the annotations payload:
- Python annotation types (recommended)
- Provides a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.
- Allows you to build annotations locally with local file paths, numpy arrays, or URLs
- Converts easily from the Python annotation type format to NDJSON for quick imports to Labelbox (see the sketch after this list)
- Supports one-level nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
- JSON
- Skips formatting the annotation payload as Labelbox Python annotation types
- Supports any level of nested classifications (radio, checklist, or free-form text) under a tool or classification annotation.
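If your installed SDK version still ships the NDJsonConverter serializer (it has been deprecated in recent releases, where the import functions accept Label objects directly), converting Python annotation types to NDJSON is a one-liner. A minimal sketch, assuming labels is a list of lb_types.Label objects built as shown in Step 5 below:
# A minimal sketch, assuming your labelbox SDK version still provides
# labelbox.data.serialization.NDJsonConverter (newer releases deprecate it
# because the import functions accept Label objects directly).
from labelbox.data.serialization import NDJsonConverter

label_ndjson = list(NDJsonConverter.serialize(labels))  # labels: list of lb_types.Label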
Label import types
Labelbox additionally supports two types of label imports:
- Model-assisted labeling (MAL)
- This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.
- Ground truth
- This workflow allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox Annotate. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth.
Supported annotations
The following annotations are supported for an image data row:
- Radio
- Checklist
- Free-form text
- Bounding box
- Point
- Polyline
- Polygon
- Segmentation masks
- Relationships
Classifications
Radio (single choice)
radio_annotation = lb_types.ClassificationAnnotation(
    name="radio_question",
    value=lb_types.Radio(
        answer=lb_types.ClassificationAnswer(name="first_radio_answer")
    )
)
radio_annotation_ndjson = {
"name": "radio_question",
"answer": {"name": "first_radio_answer"}
}
Checklist (multiple choice)
checklist_annotation = lb_types.ClassificationAnnotation(
    name="checklist_question",
    value=lb_types.Checklist(answer=[
        lb_types.ClassificationAnswer(name="first_checklist_answer"),
        lb_types.ClassificationAnswer(name="second_checklist_answer"),
        lb_types.ClassificationAnswer(name="third_checklist_answer")
    ])
)
checklist_annotation_ndjson = {
"name": "checklist_question",
"answer": [
{"name": "first_checklist_answer"},
{"name": "second_checklist_answer"},
{"name": "third_checklist_answer"},
]
}
Free-form text
text_annotation = lb_types.ClassificationAnnotation(
    name="free_text",
    value=lb_types.Text(answer="sample text")
)
text_annotation_ndjson = {
"name": "free_text",
"answer": "sample text",
}
Nested classifications
nested_radio_annotation = lb_types.ClassificationAnnotation(
name="nested_radio_question",
value=lb_types.Radio(
answer=lb_types.ClassificationAnswer(
name="first_radio_answer",
classifications=[
lb_types.ClassificationAnnotation(
name="sub_radio_question",
value=lb_types.Radio(
answer=lb_types.ClassificationAnswer(
name="first_sub_radio_answer"
)
)
)
]
)
)
)
nested_checklist_annotation = lb_types.ClassificationAnnotation(
name="nested_checklist_question",
value=lb_types.Checklist(
answer=[lb_types.ClassificationAnswer(
name="first_checklist_answer",
classifications=[
lb_types.ClassificationAnnotation(
name="sub_checklist_question",
value=lb_types.Checklist(
answer=[lb_types.ClassificationAnswer(
name="first_sub_checklist_answer"
)]
))
]
)]
)
)
nested_radio_annotation_ndjson = {
    "name": "nested_radio_question",
    "answer": {
        "name": "first_radio_answer",
        "classifications": [
            {
                "name": "sub_radio_question",
                "answer": {"name": "first_sub_radio_answer"}
            }
        ]
    }
}
nested_checklist_annotation_ndjson = {
"name": "nested_checklist_question",
"answer": [{
"name": "first_checklist_answer",
"classifications" : [
{
"name": "sub_checklist_question",
"answer": {"name": "first_sub_checklist_answer"}
}
]
}]
}
Tools
Bounding box
bbox_annotation = lb_types.ObjectAnnotation(
name="bounding_box", # must match your ontology feature's name
value=lb_types.Rectangle(
start=lb_types.Point(x=1690, y=977), # x = left, y = top
end=lb_types.Point(x=1915, y=1307), # x= left + width , y = top + height
))
bbox_annotation_ndjson = {
"name": "bounding_box",
"bbox": {
"top": 977,
"left": 1690,
"height": 330,
"width": 225
}
}
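If your source data stores boxes as top/left/width/height (as the NDJSON format does), converting to the start and end points expected by lb_types.Rectangle is simple arithmetic. A minimal sketch, where the helper name is illustrative:
# Hypothetical helper: convert a top/left/width/height box into the
# top-left start point and bottom-right end point used by lb_types.Rectangle.
def bbox_to_rectangle(top, left, height, width):
    return lb_types.Rectangle(
        start=lb_types.Point(x=left, y=top),
        end=lb_types.Point(x=left + width, y=top + height),
    )

# Equivalent to bbox_annotation above
bbox_value = bbox_to_rectangle(top=977, left=1690, height=330, width=225)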
Segmentation mask
The following example shows how to import composite masks. It can also be adapted to import single-instance masks (see the sketch after the Python example below).
Mask imports are limited to 9,000 x 9,000 pixels. Larger masks will not be imported.
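Before building the payload, you may want to verify that your mask PNG falls within this limit. A minimal sketch using the imports from the Before you start section (the helper name is illustrative):
# Hypothetical pre-flight check: confirm a mask PNG stays within the
# 9,000 x 9,000 pixel import limit before building the payload.
def mask_within_limit(image_url, max_side=9000):
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))
    return img.width <= max_side and img.height <= max_side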
# First we need to extract all the unique colors from the composite mask
def extract_rgb_colors_from_url(image_url):
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
colors = set()
for x in range(img.width):
for y in range(img.height):
pixel = img.getpixel((x, y))
if pixel[:3] != (0,0,0):
colors.add(pixel[:3]) # Get only the RGB values
return colors
cp_mask_url = "https://storage.googleapis.com/labelbox-datasets/image_sample_data/composite_mask.png"
colors = extract_rgb_colors_from_url(cp_mask_url)
response = requests.get(cp_mask_url)
mask_data = lb_types.MaskData(im_bytes=response.content)  # You can also pass the PNG mask URL via url= instead of im_bytes
# These are the colors that will be associated with the mask_with_text_subclass tool
rgb_colors_for_mask_with_text_subclass_tool = [(73, 39, 85), (111, 87, 176), (23, 169, 254)]
cp_mask = []
for color in colors:
# We are assigning the color related to the mask_with_text_subclass tool by identifying the unique RGB colors
if color in rgb_colors_for_mask_with_text_subclass_tool:
        cp_mask.append(
            lb_types.ObjectAnnotation(
                name="mask_with_text_subclass",  # must match your ontology feature's name
                value=lb_types.Mask(
                    mask=mask_data,
                    color=color),
                classifications=[
                    lb_types.ClassificationAnnotation(
                        name="sub_free_text",
                        value=lb_types.Text(answer="free text answer sample")
                    )]
            )
        )
else:
# Create ObjectAnnotation for other masks
cp_mask.append(
lb_types.ObjectAnnotation(
name="mask",
value=lb_types.Mask(
mask=mask_data,
color=color
)
)
)
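As noted above, the same pattern adapts to a single-instance mask: point MaskData at the mask PNG and select the one color that marks the instance. A sketch, where single_mask_url is a placeholder for your own mask file:
# A sketch for a single-instance mask. MaskData can reference the PNG by URL,
# and color selects the pixels belonging to this instance.
single_mask_url = "https://example.com/single_instance_mask.png"  # placeholder URL
single_mask_annotation = lb_types.ObjectAnnotation(
    name="mask",  # must match your ontology feature's name
    value=lb_types.Mask(
        mask=lb_types.MaskData(url=single_mask_url),
        color=(255, 255, 255),  # the RGB color of the instance in the mask
    ),
)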
# NDJSON using bytes array
cp_mask_url = "https://storage.googleapis.com/labelbox-datasets/image_sample_data/composite_mask.png"
colors = extract_rgb_colors_from_url(cp_mask_url)
cp_mask_ndjson = []
# Encode the mask PNG as a base64 string for the imBytes field
response = requests.get(cp_mask_url)
im_bytes = base64.b64encode(response.content).decode('utf-8')
for color in colors:
if color in rgb_colors_for_mask_with_text_subclass_tool:
        cp_mask_ndjson.append({
            "name": "mask_with_text_subclass",
            "mask": {
                "imBytes": im_bytes,
                "colorRGB": color
            },
            "classifications": [{
                "name": "sub_free_text",
                "answer": "free text answer"
            }]
        })
else:
cp_mask_ndjson.append({
"name": "mask",
"classifications": [],
"mask": {
"imBytes": im_bytes,
"colorRGB": color
}
}
)
Point
point_annotation = lb_types.ObjectAnnotation(
name="point", # must match your ontology feature's name
value=lb_types.Point(x=1166.606, y=1441.768),
)
point_annotation_ndjson = {
"name": "point",
"classifications": [],
"point": {
"x": 1166.606,
"y": 1441.768
}
}
Polyline
polyline_annotation = lb_types.ObjectAnnotation(
name="polyline", # must match your ontology feature's name
value=lb_types.Line( # Coordinates for the keypoints in your polyline
points=[
lb_types.Point(x=2534.353, y=249.471),
lb_types.Point(x=2429.492, y=182.092),
lb_types.Point(x=2294.322, y=221.962),
lb_types.Point(x=2224.491, y=180.463),
lb_types.Point(x=2136.123, y=204.716),
lb_types.Point(x=1712.247, y=173.949),
lb_types.Point(x=1703.838, y=84.438),
lb_types.Point(x=1579.772, y=82.61),
lb_types.Point(x=1583.442, y=167.552),
lb_types.Point(x=1478.869, y=164.903),
lb_types.Point(x=1418.941, y=318.149),
lb_types.Point(x=1243.128, y=400.815),
lb_types.Point(x=1022.067, y=319.007),
lb_types.Point(x=892.367, y=379.216),
lb_types.Point(x=670.273, y=364.408),
lb_types.Point(x=613.114, y=288.16),
lb_types.Point(x=377.559, y=238.251),
lb_types.Point(x=368.087, y=185.064),
lb_types.Point(x=246.557, y=167.286),
lb_types.Point(x=236.648, y=285.61),
lb_types.Point(x=90.929, y=326.412)
]),
)
polyline_annotation_ndjson = {
"name": "polyline",
"classifications": [],
"line": [
{"x": 2534.353, "y": 249.471},
{"x": 2429.492, "y": 182.092},
{"x": 2294.322, "y": 221.962},
{"x": 2224.491, "y": 180.463},
{"x": 2136.123, "y": 204.716},
{"x": 1712.247, "y": 173.949},
{"x": 1703.838, "y": 84.438},
{"x": 1579.772, "y": 82.61},
{"x": 1583.442, "y": 167.552},
{"x": 1478.869, "y": 164.903},
{"x": 1418.941, "y": 318.149},
{"x": 1243.128, "y": 400.815},
{"x": 1022.067, "y": 319.007},
{"x": 892.367, "y": 379.216},
{"x": 670.273, "y": 364.408},
{"x": 613.114, "y": 288.16},
{"x": 377.559, "y": 238.251},
{"x": 368.087, "y": 185.064},
{"x": 246.557, "y": 167.286},
{"x": 236.648, "y": 285.61},
{"x": 90.929, "y": 326.412}
]
}
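If your keypoints come from another pipeline as plain (x, y) tuples, building the Point list is a one-liner. A minimal sketch with illustrative values:
# A sketch: convert plain (x, y) tuples, e.g. from another pipeline,
# into the Point objects expected by lb_types.Line or lb_types.Polygon.
coords = [(2534.353, 249.471), (2429.492, 182.092), (2294.322, 221.962)]
points = [lb_types.Point(x=x, y=y) for x, y in coords]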
Polygon
polygon_annotation = lb_types.ObjectAnnotation(
name="polygon", # must match your ontology feature"s name
value=lb_types.Polygon( # Coordinates for the vertices of your polygon
points=[
lb_types.Point(x=1489.581, y=183.934),
lb_types.Point(x=2278.306, y=256.885),
lb_types.Point(x=2428.197, y=200.437),
lb_types.Point(x=2560.0, y=335.419),
lb_types.Point(x=2557.386, y=503.165),
lb_types.Point(x=2320.596, y=503.103),
lb_types.Point(x=2156.083, y=628.943),
lb_types.Point(x=2161.111, y=785.519),
lb_types.Point(x=2002.115, y=894.647),
lb_types.Point(x=1838.456, y=877.874),
lb_types.Point(x=1436.53, y=874.636),
lb_types.Point(x=1411.403, y=758.579),
lb_types.Point(x=1353.853, y=751.74),
lb_types.Point(x=1345.264, y=453.461),
lb_types.Point(x=1426.011, y=421.129),
]),
)
polygon_annotation_ndjson = {
    "name": "polygon",
    "polygon": [
        {"x": 1489.581, "y": 183.934},
        {"x": 2278.306, "y": 256.885},
        {"x": 2428.197, "y": 200.437},
        {"x": 2560.0, "y": 335.419},
        {"x": 2557.386, "y": 503.165},
        {"x": 2320.596, "y": 503.103},
        {"x": 2156.083, "y": 628.943},
        {"x": 2161.111, "y": 785.519},
        {"x": 2002.115, "y": 894.647},
        {"x": 1838.456, "y": 877.874},
        {"x": 1436.53, "y": 874.636},
        {"x": 1411.403, "y": 758.579},
        {"x": 1353.853, "y": 751.74},
        {"x": 1345.264, "y": 453.461},
        {"x": 1426.011, "y": 421.129},
        {"x": 1489.581, "y": 183.934}  # repeat the first vertex to close the polygon
    ]
}
Tool with nested classification
# Template: any tool annotation can carry nested classifications
# tool_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
#     name="<tool_name>",   # must match your ontology feature's name
#     value=<tool_value>,   # the tool annotation, e.g., lb_types.Rectangle(...)
#     classifications=[
#         lb_types.ClassificationAnnotation(
#             name="sub_radio_question",
#             value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
#                 name="first_sub_radio_answer")))
#     ])
bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
name="bbox_with_radio_subclass",
value=lb_types.Rectangle(
start=lb_types.Point(x=541, y=933), # x = left, y = top
end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height
),
classifications=[
lb_types.ClassificationAnnotation(
name="sub_radio_question",
value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
name="first_sub_radio_answer")))
])
bbox_with_radio_subclass_ndjson = {
"name": "bbox_with_radio_subclass",
"classifications": [{
"name": "sub_radio_question",
"answer": {
"name": "first_sub_radio_answer"
}
}],
"bbox": {
"top": 933,
"left": 541,
"height": 191,
"width": 330
}
}
Relationships
# Template: a relationship between two object annotations
# relationship = lb_types.RelationshipAnnotation(
#     name="<relationship_name>",
#     value=lb_types.Relationship(
#         source=<source_annotation>,
#         target=<target_annotation>,
#         type=lb_types.Relationship.Type.UNIDIRECTIONAL,
#     ))
bbox_source = lb_types.ObjectAnnotation(
name="bounding_box",
value=lb_types.Rectangle(
start=lb_types.Point(x=2096, y=1264),
end=lb_types.Point(x=2240, y=1689),
),
)
bbox_target = lb_types.ObjectAnnotation(
name="bounding_box",
value=lb_types.Rectangle(
start=lb_types.Point(x=2272, y=1346),
end=lb_types.Point(x=2416, y=1704),
),
)
relationship = lb_types.RelationshipAnnotation(
name="relationship",
value=lb_types.Relationship(
        source=bbox_source,  # Python annotations do not require a UUID reference
target=bbox_target,
type=lb_types.Relationship.Type.UNIDIRECTIONAL,
))
uuid_source = str(uuid.uuid4())
uuid_target = str(uuid.uuid4())
bbox_source_ndjson = {
"uuid": uuid_source,
"name": "bounding_box",
"bbox": {
"top": 1264.0,
"left": 2096.0,
"height": 425.0,
"width": 144.0
}
}
bbox_target_ndjson = {
"uuid": uuid_target,
"name": "bounding_box",
"bbox": {
"top": 1346.0,
"left": 2272.0,
"height": 358.0,
"width": 144.0
}
}
relationship_ndjson = {
"name": "relationship",
"relationship": {
"source": uuid_source, # UUID reference to the source annotation
"target": uuid_target, # UUID reference to the target annotation
"type": "unidirectional"
}
}
Example: Import pre-labels or ground truths
The steps to import annotations as pre-labels (model-assisted labeling) are similar to those for importing annotations as ground truth labels. They vary slightly, and the differences for each scenario are described below.
Before you start
The following imports are required to run the code examples in this section.
import uuid
from PIL import Image
import requests
import base64
import labelbox as lb
import labelbox.types as lb_types
from io import BytesIO
Replace the value of API_KEY with a valid API key to connect to the Labelbox client.
API_KEY = None
client = lb.Client(API_KEY)
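To avoid hard-coding credentials, you can read the key from an environment variable instead. A minimal sketch (the variable name LABELBOX_API_KEY is a convention, not an SDK requirement):
# A sketch: read the API key from an environment variable rather than
# hard-coding it in the script.
import os

API_KEY = os.environ.get("LABELBOX_API_KEY")
client = lb.Client(api_key=API_KEY)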
Step 1: Import data rows
Data rows must first be uploaded to Catalog to attach annotations.
This example shows how to create a data row in Catalog by attaching it to a dataset.
# upload a sample image as a data row to a dataset
global_key = "2560px-Kitano_Street_Kobe01s5s4110.jpeg"
test_img_url = {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg",
    "global_key": global_key
}
dataset = client.create_dataset(name="image-demo-dataset")
task = dataset.create_data_rows([test_img_url])
task.wait_till_done()
print(f"Failed data rows: {task.failed_data_rows}")
print(f"Errors: {task.errors}")
if task.errors:
for error in task.errors:
if 'Duplicate global key' in error['message'] and dataset.row_count == 0:
# If the global key already exists in the workspace the dataset will be created empty, so we can delete it.
print(f"Deleting empty dataset: {dataset}")
dataset.delete()
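Once the task completes successfully, you can confirm that the data row is retrievable by its global key. A minimal sketch:
# A sketch: look up the data row ID behind the global key to confirm
# the upload landed in Catalog.
res = client.get_data_row_ids_for_global_keys([global_key])
print(f"Data row IDs: {res['results']}")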
Step 2: Set up ontology
Your project ontology must support the tools and classifications required by your annotations. To ensure accurate schema feature mapping, the value used for the name parameter of each annotation must match the name of the corresponding feature in your ontology.
For example, the bounding box annotation above was given the name bounding_box, so when we set up our ontology, the bounding box tool must also be named bounding_box. The same alignment must hold for every other tool and classification we create in our ontology.
This example shows how to create an ontology containing all supported annotation types.
ontology_builder = lb.OntologyBuilder(
classifications=[ # list of classification objects
lb.Classification(class_type=lb.Classification.Type.RADIO,
name="radio_question",
options=[
lb.Option(value="first_radio_answer"),
lb.Option(value="second_radio_answer")
]),
lb.Classification(class_type=lb.Classification.Type.CHECKLIST,
name="checklist_question",
options=[
lb.Option(value="first_checklist_answer"),
lb.Option(value="second_checklist_answer")
]),
lb.Classification(class_type=lb.Classification.Type.TEXT,
name="free_text"),
lb.Classification(
class_type=lb.Classification.Type.RADIO,
name="nested_radio_question",
options=[
lb.Option("first_radio_answer",
options=[
lb.Classification(
class_type=lb.Classification.Type.RADIO,
name="sub_radio_question",
options=[lb.Option("first_sub_radio_answer")])
])
]),
lb.Classification(
class_type=lb.Classification.Type.CHECKLIST,
name="nested_checklist_question",
options=[
lb.Option(
"first_checklist_answer",
options=[
lb.Classification(
class_type=lb.Classification.Type.CHECKLIST,
name="sub_checklist_question",
options=[lb.Option("first_sub_checklist_answer")])
])
]),
],
tools=[ # List of Tool objects
lb.Tool(tool=lb.Tool.Type.BBOX, name="bounding_box"),
lb.Tool(tool=lb.Tool.Type.BBOX,
name="bbox_with_radio_subclass",
classifications=[
lb.Classification(
class_type=lb.Classification.Type.RADIO,
name="sub_radio_question",
options=[lb.Option(value="first_sub_radio_answer")]),
]),
lb.Tool(tool=lb.Tool.Type.POLYGON, name="polygon"),
lb.Tool(tool=lb.Tool.Type.RASTER_SEGMENTATION, name="mask"),
lb.Tool(tool=lb.Tool.Type.RASTER_SEGMENTATION,
name="mask_with_text_subclass",
classifications=[
lb.Classification(
class_type=lb.Classification.Type.TEXT,
name="sub_free_text")
]),
lb.Tool(tool=lb.Tool.Type.POINT, name="point"),
lb.Tool(tool=lb.Tool.Type.LINE, name="polyline"),
lb.Tool(tool=lb.Tool.Type.RELATIONSHIP, name="relationship")
])
ontology = client.create_ontology("Image Annotation Import Demo Ontology",
ontology_builder.asdict(),
media_type=lb.MediaType.Image)
Step 3: Set up a labeling project
# create a project and configure the ontology
project = client.create_project(name="Image Annotation Import Demo",
media_type=lb.MediaType.Image)
project.connect_ontology(ontology)
Step 4: Send data rows to the project
batch = project.create_batch(
"image-demo-batch", # each batch in a project must have a unique name
global_keys=[global_key], # paginated collection of data row objects, list of data row ids or global keys
    priority=1  # priority between 1 (highest) and 5 (lowest)
)
print(f"Batch: {batch}")
Step 5: Create annotation payloads
For help understanding annotation payloads, see the overview above. To declare payloads, you can use Python annotation types (preferred) or NDJSON objects. For annotations that you want to import as ground truth labels, you can also designate benchmarks using the is_benchmark_reference flag.
These examples demonstrate each format and how to compose annotations into labels attached to data rows.
# create a Label
label = []
annotations = [
    radio_annotation,
    nested_radio_annotation,
    checklist_annotation,
    nested_checklist_annotation,
    text_annotation,
    bbox_annotation,
    bbox_with_radio_subclass_annotation,
    polygon_annotation,
    point_annotation,
    polyline_annotation,
    bbox_source,
    bbox_target,
    relationship,
] + cp_mask  # cp_mask already contains the "mask" and "mask_with_text_subclass" annotations
label.append(
    lb_types.Label(
        data={"global_key": global_key},
        annotations=annotations,
        # Optional: set the label as a benchmark
        # Only supported for ground truth imports
        is_benchmark_reference=True
    )
)
label_ndjson = []
annotations = [
    radio_annotation_ndjson,
    nested_radio_annotation_ndjson,
    nested_checklist_annotation_ndjson,
    checklist_annotation_ndjson,
    text_annotation_ndjson,
    bbox_annotation_ndjson,
    bbox_with_radio_subclass_ndjson,
    polygon_annotation_ndjson,
    point_annotation_ndjson,
    polyline_annotation_ndjson,
    bbox_source_ndjson,
    bbox_target_ndjson,
    relationship_ndjson,  # Only supported for MAL imports
] + cp_mask_ndjson  # cp_mask_ndjson already contains the mask payloads
for annotation in annotations:
annotation.update({
"dataRow": {
"globalKey": global_key
},
})
label_ndjson.append(annotation)
Step 6: Import annotation payload
For pre-label (model-assisted labeling) scenarios, pass your payload as the value of the predictions parameter. For ground truths, pass the payload to the labels parameter.
Warning
Relationship annotations are not supported for ground truth import jobs.
Option A: Upload as pre-labels (model-assisted labeling)
This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.
# Upload MAL label for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="mal_job" + str(uuid.uuid4()),
    predictions=label
)
upload_job.wait_until_done()
print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
Option B: Upload to a labeling project as ground truth
This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.
# Upload label for this data row in project
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=label
)
upload_job.wait_until_done()
print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")