To import annotations in Labelbox, you need to create an annotations payload. In this section, we provide this payload for every supported annotation type.
This workflow allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox Annotate. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth.
text_annotation = lb_types.ClassificationAnnotation(
    name="free_text",  # must match your ontology feature's name
    value=lb_types.Text(answer="sample text")
)
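If you declare payloads as NDJSON objects instead, the same free-text classification can be written as a plain dictionary. A minimal sketch (the variable name text_annotation_ndjson is only illustrative):

# NDJSON counterpart of the free-text classification above
text_annotation_ndjson = {
    "name": "free_text",  # must match your ontology feature's name
    "answer": "sample text"
}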
When importing a bounding box annotation, you need to specify its DocumentRectangle value, which defines its precise area on a document page. You can choose from the following RectangleUnit options, which determine the measurement unit used to define the dimensions and coordinates of the bounding box:
INCHES
PIXELS
POINTS
PERCENT
bbox_annotation = lb_types.ObjectAnnotation(
    name="bounding_box",  # must match your ontology feature's name
    value=lb_types.DocumentRectangle(
        start=lb_types.Point(x=102.771, y=135.3),    # x = left, y = top
        end=lb_types.Point(x=518.571, y=245.143),    # x = left + width, y = top + height
        page=0,
        unit=lb_types.RectangleUnit.POINTS
    )
)
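As a rough NDJSON counterpart, the same region can be described with top/left plus height/width. A minimal sketch, assuming the same page and unit as above (the height and width values are derived from the start and end points):

# NDJSON counterpart of the bounding box above (sketch)
bbox_annotation_ndjson = {
    "name": "bounding_box",
    "bbox": {
        "top": 135.3,
        "left": 102.771,
        "height": 109.843,  # end.y - start.y
        "width": 415.8      # end.x - start.x
    },
    "page": 0,
    "unit": "POINTS"
}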
textSelections is the payload required for each entity annotation. Each textSelections item in the list requires the following fields:
The groupId associated with a group of words.
A list of tokenIds for each word in the group of words.
The page of the document (1-indexed).
Both tokenIds and groupId can be extracted from the text layer URL attached to the data row. For more information on text layers, visit our import document data guide.
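For reference, a minimal sketch of a single textSelections entry on an NDJSON entity annotation (the groupId and tokenIds values are placeholders; real ids come from your text layer):

# Sketch of one textSelections entry on an NDJSON entity annotation
entity_annotation_ndjson_example = {
    "name": "named_entity",  # must match your ontology feature's name
    "textSelections": [
        {
            "groupId": "<group id from the text layer>",
            "tokenIds": ["<token id 1>", "<token id 2>"],
            "page": 1
        }
    ]
}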
For importing entity annotations, you can use your own text_layer_url or a Labelbox-generated text_layer_url.
You can get the Labelbox-generated text_layer_url by exporting the data row, as the following code snippet demonstrates.
# Export data row
task = lb.DataRow.export(client=client, global_keys=[global_key])
task.wait_till_done()

if task.has_result():
    stream = task.get_buffered_stream()
    text_layer = ""
    for output in stream:
        output_json = output.json
        text_layer = output_json['media_attributes']['text_layer_url']
    print(text_layer)
import requests
import json

# Helper method for updating text selections
def update_text_selections(annotation, group_id, list_tokens, page):
    return annotation.update({
        "textSelections": [
            {
                "groupId": group_id,
                "tokenIds": list_tokens,
                "page": page
            }
        ]
    })

# Fetch the content of the text layer
res = requests.get(text_layer)

# Phrases that we want to annotate, obtained from the text layer URL
content_phrases = [
    "Metal-insulator (MI) transitions have been one of the",
    "T. Sasaki, N. Yoneyama, and N. Kobayashi",
    "Organic charge transfer salts based on the donor",
    "the experimental investigations on this issue have not"
]

# Parse the text layer
text_selections = []
text_selections_ner = []
text_selections_source = []
text_selections_target = []

for obj in json.loads(res.text):
    for group in obj["groups"]:
        if group["content"] == content_phrases[0]:
            list_tokens = [x["id"] for x in group["tokens"]]
            # build text selections for Python annotation types
            document_text_selection = lb_types.DocumentTextSelection(groupId=group["id"], tokenIds=list_tokens, page=1)
            text_selections.append(document_text_selection)
            # build text selections for the NDJSON annotations
            update_text_selections(
                annotation=entities_annotations_ndjson,
                group_id=group["id"],     # id representing the group of words
                list_tokens=list_tokens,  # ids representing individual words from the group
                page=1
            )
        if group["content"] == content_phrases[1]:
            list_tokens_2 = [x["id"] for x in group["tokens"]]
            # build text selections for Python annotation types
            ner_text_selection = lb_types.DocumentTextSelection(groupId=group["id"], tokenIds=list_tokens_2, page=1)
            text_selections_ner.append(ner_text_selection)
            # build text selections for the NDJSON annotations
            update_text_selections(
                annotation=ner_with_checklist_subclass_annotation_ndjson,
                group_id=group["id"],       # id representing the group of words
                list_tokens=list_tokens_2,  # ids representing individual words from the group
                page=1
            )
        if group["content"] == content_phrases[2]:
            relationship_source = [x["id"] for x in group["tokens"]]
            # build text selections for Python annotation types
            text_selection_entity_source = lb_types.DocumentTextSelection(groupId=group["id"], tokenIds=relationship_source, page=1)
            text_selections_source.append(text_selection_entity_source)
            # build text selections for the NDJSON annotations
            update_text_selections(
                annotation=entity_source_ndjson,
                group_id=group["id"],             # id representing the group of words
                list_tokens=relationship_source,  # ids representing individual words from the group
                page=1
            )
        if group["content"] == content_phrases[3]:
            relationship_target = [x["id"] for x in group["tokens"]]
            # build text selections for Python annotation types
            text_selection_entity_target = lb_types.DocumentTextSelection(groupId=group["id"], tokenIds=relationship_target, page=1)
            text_selections_target.append(text_selection_entity_target)
            # build text selections for the NDJSON annotations
            update_text_selections(
                annotation=entity_target_ndjson,
                group_id=group["id"],             # id representing the group of words
                list_tokens=relationship_target,  # ids representing individual words from the group
                page=1
            )
Re-write the Python annotations to include text selections (only required for Python annotation types)
# re-write the entity annotation with text selections
entities_annotation_document_entity = lb_types.DocumentEntity(name="named_entity", textSelections=text_selections)
entities_annotation = lb_types.ObjectAnnotation(name="named_entity", value=entities_annotation_document_entity)

# re-write the entity annotation + subclassification with text selections
classifications = [
    lb_types.ClassificationAnnotation(
        name="sub_checklist_question",
        value=lb_types.Checklist(answer=[lb_types.ClassificationAnswer(name="first_sub_checklist_answer")])
    )
]
ner_annotation_with_subclass = lb_types.DocumentEntity(name="ner_with_checklist_subclass", textSelections=text_selections_ner)
ner_with_checklist_subclass_annotation = lb_types.ObjectAnnotation(
    name="ner_with_checklist_subclass",
    value=ner_annotation_with_subclass,
    classifications=classifications
)

# re-write the entity source and target annotations with text selections
entity_source_doc = lb_types.DocumentEntity(name="named_entity", textSelections=text_selections_source)
entity_source = lb_types.ObjectAnnotation(name="named_entity", value=entity_source_doc)

entity_target_doc = lb_types.DocumentEntity(name="named_entity", textSelections=text_selections_target)
entity_target = lb_types.ObjectAnnotation(name="named_entity", value=entity_target_doc)

# re-write the entity relationship with the re-created entities
entity_relationship = lb_types.RelationshipAnnotation(
    name="relationship",
    value=lb_types.Relationship(
        source=entity_source,
        target=entity_target,
        type=lb_types.Relationship.Type.UNIDIRECTIONAL,
    )
)

print(f"entities_annotations_ndjson={entities_annotations_ndjson}")
print(f"entities_annotation={entities_annotation}")
print(f"nested_entities_annotation_ndjson={ner_with_checklist_subclass_annotation_ndjson}")
print(f"nested_entities_annotation={ner_with_checklist_subclass_annotation}")
print(f"entity_source_ndjson={entity_source_ndjson}")
print(f"entity_target_ndjson={entity_target_ndjson}")
print(f"entity_source={entity_source}")
print(f"entity_target={entity_target}")
The steps to import annotations as pre-labels (model-assisted labeling) are similar to those for importing annotations as ground truth labels. However, they vary slightly, and we describe the differences for each scenario below.
Your project ontology should support the tools and classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the name parameter should match the value of the name field in your annotation.
For example, when we created an annotation above, we provided a name annotation_name. Now, when we set up our ontology, we must ensure that the name of our bounding box tool is also annotation_name. The same alignment must hold true for the other tools and classifications we create in our ontology.
This example shows how to create an ontology containing all supported annotation types.
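As a partial sketch of what that ontology can look like, assuming the feature names used in the snippets on this page (only a few of the tools and classifications are shown):

# Partial sketch of an ontology whose feature names match the annotations above
ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(tool=lb.Tool.Type.BBOX, name="bounding_box"),
        lb.Tool(tool=lb.Tool.Type.NER, name="named_entity"),
        lb.Tool(
            tool=lb.Tool.Type.NER,
            name="ner_with_checklist_subclass",
            classifications=[
                lb.Classification(
                    class_type=lb.Classification.Type.CHECKLIST,
                    name="sub_checklist_question",
                    options=[lb.Option(value="first_sub_checklist_answer")]
                )
            ]
        ),
        lb.Tool(tool=lb.Tool.Type.RELATIONSHIP, name="relationship"),
    ],
    classifications=[
        lb.Classification(class_type=lb.Classification.Type.TEXT, name="free_text"),
    ],
)

ontology = client.create_ontology(
    "PDF annotation ontology",        # illustrative ontology name
    ontology_builder.asdict(),
    media_type=lb.MediaType.Document
)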
project.create_batch(
    "PDF_annotation_batch",    # Each batch in a project must have a unique name
    global_keys=[global_key],  # a list of global keys, data rows, or data row ids
    priority=5                 # priority between 1 (highest) and 5 (lowest)
)
For help understanding annotation payloads, see the overview. To declare payloads, you can use Python annotation types (preferred) or NDJSON objects.
These examples demonstrate each format and how to compose annotations into labels attached to data rows.
# create a Label
labels = []
labels.append(
    lb_types.Label(
        data={"global_key": global_key},
        annotations=[
            entities_annotation,
            checklist_annotation,
            nested_checklist_annotation,
            text_annotation,
            radio_annotation,
            nested_radio_annotation,
            bbox_annotation,
            bbox_with_radio_subclass_annotation,
            ner_with_checklist_subclass_annotation,
            entity_source,
            entity_target,
            entity_relationship,  # Only supported for MAL imports
            bbox_source,
            bbox_target,
            bbox_relationship  # Only supported for MAL imports
        ]
    )
)
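If you're composing NDJSON payloads instead, a minimal sketch of the equivalent step is to attach each annotation dictionary to the data row and collect them into a single list (only the NDJSON objects built earlier on this page are shown; append any others the same way):

# Sketch: attach each NDJSON annotation to the data row via its global key,
# then gather the annotations into one list for upload
label_ndjson = []
for annotation in [
    entities_annotations_ndjson,
    ner_with_checklist_subclass_annotation_ndjson,
    entity_source_ndjson,
    entity_target_ndjson,
]:
    annotation.update({"dataRow": {"globalKey": global_key}})
    label_ndjson.append(annotation)

The resulting label_ndjson list can then be passed to the same import methods shown below in place of labels.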
For pre-label (model-assisted labeling) imports, pass your payload as the value of the predictions parameter. For ground truth imports, pass the payload to the labels parameter.
Option A: Upload to a labeling project as pre-labels (model-assisted labeling)
This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.
# Upload MAL label for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="mal_job" + str(uuid.uuid4()),
    predictions=labels
)

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
Option B: Upload to a labeling project as ground truth
This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.
# Upload label for this data row in project
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=labels
)

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")