To import annotations into Labelbox, you need to create an annotation payload. This section provides example payloads for every supported annotation type.
Model-assisted labeling (MAL) allows you to import computer-generated predictions and simple annotations created outside of Labelbox as pre-labels on an asset.
Ground truth allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox Annotate. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth.
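The code snippets in this section assume a standard Labelbox SDK setup. A minimal sketch of that setup, assuming you replace the placeholder with your own API key, might look like this:

import uuid
import time

import labelbox as lb
import labelbox.types as lb_types

# Assumption: replace the placeholder with your own Labelbox API key
API_KEY = "<YOUR_API_KEY>"
client = lb.Client(api_key=API_KEY)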
# Prompt annotation payload
prompt_annotation = lb_types.PromptClassificationAnnotation(
    name="Follow the prompt and select answers",
    value=lb_types.PromptText(answer="This is an example of a prompt"),
)

# Response text annotation payload
response_text_annotation = lb_types.ClassificationAnnotation(
    name="Provide a reason for your choice",
    value=lb_types.Text(answer="This is an example of a response text"),
)
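If your ontology also includes radio or checklist response features (such as the response radio feature and response checklist feature defined in the ontology below), the payloads follow the same pattern. A minimal sketch, assuming those feature names, might look like this:

# Response radio payload (assumes a feature named "response radio feature")
response_radio_annotation = lb_types.ClassificationAnnotation(
    name="response radio feature",
    value=lb_types.Radio(
        answer=lb_types.ClassificationAnswer(name="first_radio_answer")
    ),
)

# Response checklist payload (assumes a feature named "response checklist feature")
response_checklist_annotation = lb_types.ClassificationAnnotation(
    name="response checklist feature",
    value=lb_types.Checklist(
        answer=[lb_types.ClassificationAnswer(name="option_1")]
    ),
)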
The steps to import annotations as pre-labels (model-assisted labeling) are similar to the steps to import annotations as ground truth labels; the slight differences for each scenario are described below.
A prompts and responses creation project automatically generates empty data rows upon creation. You will then need to obtain the global_keys or data_row_ids attached to the generated data rows, either by exporting them from the created project or directly from the data row tab in the UI.
For response creation projects, text data rows are used and are not generated upon project creation. The following steps create a dataset with a text data row attached, create a response creation project, and batch the created data row to the project.
# Create dataset with text data row
global_key = "lorem-ipsum.txt"
text_asset = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "global_key": global_key,
    "media_type": "TEXT",
}
dataset = client.create_dataset(name="text_annotation_import_demo_dataset")
task = dataset.create_data_rows([text_asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)

# Create response creation project
project = client.create_response_creation_project(
    name="Demo response project",
)

# Create a batch of data rows for the newly created project
batch = project.create_batch(
    "Demo response batch",  # each batch in a project must have a unique name
    global_keys=[global_key],  # paginated collection of data row objects, list of data row ids or global keys
    priority=1,  # priority between 1 (highest) and 5 (lowest)
)
Your project ontology needs to support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the name parameter needs to match the value of the name field in your annotation.
For example, if you provide the name annotation_name for your created annotation, you need to name the bounding box tool annotation_name when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.
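As a minimal illustration, assuming a response text feature like the one defined in the ontology below, the name values on the ontology classification and on the annotation must be identical:

# Ontology side: a response text classification named "response text"
response_text_feature = lb.PromptResponseClassification(
    class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,
    name="response text",
)

# Annotation side: the name must match the ontology feature exactly
matching_annotation = lb_types.ClassificationAnnotation(
    name="response text",
    value=lb_types.Text(answer="An answer that maps to the feature above"),
)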
This example shows how to create an ontology containing all annotation types supported by prompt and response project types.
ontology_builder = lb.OntologyBuilder(
    tools=[],
    classifications=[
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.PROMPT,
            name="prompt text",
            character_min=1,  # Minimum character count of prompt field (optional)
            character_max=20,  # Maximum character count of prompt field (optional)
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
            name="response checklist feature",
            options=[
                lb.ResponseOption(value="option_1", label="option_1"),
                lb.ResponseOption(value="option_2", label="option_2"),
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
            name="response radio feature",
            options=[
                lb.ResponseOption(value="first_radio_answer"),
                lb.ResponseOption(value="second_radio_answer"),
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,
            name="response text",
            character_min=1,  # Minimum character count of response text field (optional)
            character_max=20,  # Maximum character count of response text field (optional)
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
            name="nested_response_radio_question",
            options=[
                lb.ResponseOption(
                    "first_radio_answer",
                    options=[
                        lb.PromptResponseClassification(
                            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
                            name="sub_radio_question",
                            options=[lb.ResponseOption("first_sub_radio_answer")],
                        )
                    ],
                )
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
            name="nested_response_checklist_question",
            options=[
                lb.ResponseOption(
                    "first_checklist_answer",
                    options=[
                        lb.PromptResponseClassification(
                            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
                            name="sub_checklist_question",
                            options=[lb.ResponseOption("first_sub_checklist_answer")],
                        )
                    ],
                )
            ],
        ),
    ],
)

# Create ontology
ontology = client.create_ontology(
    "Prompt and response ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.LLMPromptResponseCreation,
)

# Connect ontology
prompt_response_project.connect_ontology(ontology)
For prompt response creation and prompt creation projects, you will need to obtain either the global_keys or data_row_ids attached to the generated data rows by exporting them from the created project. Since data row generation is an asynchronous process, wait for the project's data rows to finish generating before exporting.
# Wait briefly for data row generation to complete before exporting
time.sleep(20)
export_task = prompt_response_project.export()
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
    print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(
    stream_handler=json_stream_handler
)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

# Obtain global keys to be used later on
global_keys = [dr["data_row"]["global_key"] for dr in export_json]
For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the predictions parameter. For ground truths, pass the payload to the labels parameter.
Depending on the type of prompt and response project you are using, your payload might look different. For the response payload, you can also use is_benchmark_reference to specify benchmarks.
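For example, a minimal sketch of a label payload, assuming the global_key obtained in the earlier steps and annotation name values that match the features in your connected ontology, might look like this:

# Assemble annotations into a label payload for one data row
label = [
    lb_types.Label(
        data={"global_key": global_key},  # e.g. global_keys[0] from the export above, or the text data row's global key
        annotations=[prompt_annotation, response_text_annotation],
        # is_benchmark_reference=True,  # optional: mark this label as a benchmark
    )
]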
Option A: Upload to a labeling project as pre-labels (MAL)
This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.
# Upload MAL labels (pre-labels) for this data row in the project
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=prompt_response_project.uid,  # Replace with the project of your prompt and response project type
    name="mal_job" + str(uuid.uuid4()),
    predictions=label,
)
# Wait for the upload job to finish before checking results
upload_job.wait_until_done()

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
Option B: Upload to a labeling project as ground truth
This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.
# Upload ground truth labels for this data row in the project
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=prompt_response_project.uid,  # Replace with the project of your prompt and response project type
    name="label_import_job" + str(uuid.uuid4()),
    labels=label,
)
# Wait for the upload job to finish before checking results
upload_job.wait_until_done()

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
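To confirm that the import succeeded, you can export labels from the project once the job finishes. A minimal sketch, reusing the export pattern from earlier, might look like this:

# Optional check: export labels from the project to verify the import
# (assumes the upload job above finished without errors)
export_task = prompt_response_project.export(params={"label_details": True})
export_task.wait_till_done()

for data_row in export_task.get_buffered_stream():
    labels = data_row.json["projects"][prompt_response_project.uid]["labels"]
    print(labels)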