Import prompt and response data

Developer guide for importing annotations on LLM prompt and response data.

Overview

To import annotations in Labelbox, you need to create an annotations payload. In this section, we provide this payload for every supported annotation type.

Annotation payload types

Labelbox supports two formats for the annotations payload:

  • Python annotation types (recommended)
    • Provides a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.
    • Allows you to build annotations locally with local file paths, numpy arrays, or URLs.
    • Supports easy conversion to NDJSON format to quickly import annotations to Labelbox.
    • Supports one-level nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
  • JSON
    • Skips formatting annotation payload in the Labelbox Python annotation type.
    • Supports any level of nested classifications (radio, checklist, or free-form text) under a tool or classification annotation.
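To make the difference concrete, here is a sketch of how a nested classification looks in the JSON (NDJSON-style) format, where nesting is expressed with a "classifications" list under an answer. All feature names here ("quality", "good", "explanation") are hypothetical placeholders, not part of the ontology used later in this guide.

```python
# A radio classification with one nested free-form text classification,
# expressed as an NDJSON-style dict. The "classifications" key under the
# answer carries the nested feature; names are illustrative only.
nested_radio_ndjson = {
    "name": "quality",
    "answer": {
        "name": "good",
        "classifications": [
            {"name": "explanation", "answer": "The response follows the prompt"}
        ],
    },
}
```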

Label Import Types

Labelbox supports two types of label imports:

  • Model-assisted labeling (MAL)
    • This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.
  • Ground truth
    • This workflow allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox Annotate. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth.

Supported Annotations

The following annotations are supported for a prompt and response generated data row:

  • Radio
  • Checklist
  • Free-form text

📘

Annotations are prompt and response based

Prompt- and response-specific classifications are supported only on prompt and response generated assets.

Prompt

Free-form text

📘

Information

Only one prompt annotation is allowed per label.

prompt_annotation = lb_types.PromptClassificationAnnotation(
    name = "Follow the prompt and select answers",
    value = lb_types.PromptText(answer = "This is an example of a prompt")
)
prompt_annotation_ndjson = {
  "name": "Follow the prompt and select answers",
  "answer": "This is an example of a prompt"
}

Responses

Radio (single choice)

response_radio_annotation = lb_types.ClassificationAnnotation(
    name="response_radio",
    value=lb_types.Radio(answer = 
        lb_types.ClassificationAnswer(name = "response_a")
    )
)
response_radio_annotation_ndjson = {
  "name": "response_radio",
  "answer": {
      "name": "response_a"
    }
}

Checklist (multiple choice)

response_checklist_annotation = lb_types.ClassificationAnnotation(
    name="response_checklist",
    value=lb_types.Checklist(answer = [
        lb_types.ClassificationAnswer(name = "response_a"),
        lb_types.ClassificationAnswer(name = "response_c"),
    ])
  )
response_checklist_annotation_ndjson = {
  "name": "response_checklist",
  "answer": [
    {
      "name": "response_a"
    },
    {
      "name": "response_c"
    }
  ]
}

Free-form text

response_text_annotation = lb_types.ClassificationAnnotation(
    name = "Provide a reason for your choice", 
    value = lb_types.Text(answer = "This is an example of a response text")
)
response_text_annotation_ndjson = {
  "name": "Provide a reason for your choice",
  "answer": "This is an example of a response text"
}

Example: Import pre-labels or ground truths

The steps to import annotations as pre-labels (model-assisted labeling) are similar to those to import annotations as ground truth labels. However, they vary slightly, and we will describe the differences for each scenario.

Before you start

The following imports are required to run the code examples in this section.

import labelbox as lb
import labelbox.types as lb_types
import uuid

Replace the value of API_KEY with a valid API key to connect to the Labelbox client.

API_KEY = None
client = lb.Client(API_KEY)
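As an alternative to hard-coding the key, you can read it from an environment variable. `LABELBOX_API_KEY` is an assumed variable name here, not one required by the SDK:

```python
import os

# Read the API key from the environment instead of committing it to source.
API_KEY = os.environ.get("LABELBOX_API_KEY")
# client = lb.Client(API_KEY)  # create the client once API_KEY is set
```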

Step 1: Specify Project ID and Global Keys

  1. Go to Annotate and select New project.
  2. Select LLM data generation and then select Humans generate prompts and responses.
  3. Name your project, select Create a new dataset, and name your dataset (data rows will be generated automatically in this step).

📘

No SDK support

Creating a project and an ontology for LLM data generation is not yet supported through the SDK. Follow the steps below to create them via the UI.

# Enter the project id
project_id = ""

# Select one of the global keys from the data rows generated
global_key = ""

Step 2: Create or Select an Ontology in the UI

  1. In your project, navigate to Settings > Label editor.
  2. Select Edit.
  3. Create a new ontology and add the features used in this demo.

# For this demo, the following ontology was generated in the UI.
ontology_json = """
{
 "tools": [],
 "relationships": [],
 "classifications": [
  {
   "schemaNodeId": "clpvq9d0002yt07zy0khq42rp",
   "featureSchemaId": "clpvq9d0002ys07zyf2eo9p14",
   "type": "prompt",
   "name": "Follow the prompt and select answers",
   "archived": false,
   "required": true,
   "options": [],
   "instructions": "Follow the prompt and select answers",
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0002yz07zy0fjg28z7",
   "featureSchemaId": "clpvq9d0002yu07zy28ik5w3i",
   "type": "response-radio",
   "name": "response_radio",
   "instructions": "response_radio",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0002yw07zyci2q5adq",
     "featureSchemaId": "clpvq9d0002yv07zyevmz1yoj",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0002yy07zy8pe48zdj",
     "featureSchemaId": "clpvq9d0002yx07zy0jvmdxk8",
     "value": "response_b",
     "label": "response_b",
     "position": 1,
     "options": []
    }
   ]
  },
  {
   "schemaNodeId": "clpvq9d0002z107zygf8l62ys",
   "featureSchemaId": "clpvq9d0002z007zyg26115f9",
   "type": "response-text",
   "name": "provide_a_reason_for_your_choice",
   "instructions": "Provide a reason for your choice",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [],
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0102z907zy8b10hjcj",
   "featureSchemaId": "clpvq9d0002z207zy6xla7f82",
   "type": "response-checklist",
   "name": "response_checklist",
   "instructions": "response_checklist",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0102z407zy0adq0rfr",
     "featureSchemaId": "clpvq9d0002z307zy6dqb8xsw",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z607zych8b2z5d",
     "featureSchemaId": "clpvq9d0102z507zyfwfgacrn",
     "value": "response_c",
     "label": "response_c",
     "position": 1,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z807zy03y7gysp",
     "featureSchemaId": "clpvq9d0102z707zyh61y5o3u",
     "value": "response_d",
     "label": "response_d",
     "position": 2,
     "options": []
    }
   ]
  }
 ],
 "realTime": false
}

"""
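Before building the annotation payloads, it can help to parse the ontology JSON and collect the classification names, since each payload's "name" field must match a classification in the ontology. The `ontology_json` string below is a trimmed stand-in for the full ontology shown above:

```python
import json

# Parse the ontology and gather the classification names used for matching
# annotation payloads against the ontology.
ontology_json = """
{
 "classifications": [
  {"type": "prompt", "name": "Follow the prompt and select answers"},
  {"type": "response-radio", "name": "response_radio"},
  {"type": "response-text", "name": "provide_a_reason_for_your_choice"},
  {"type": "response-checklist", "name": "response_checklist"}
 ]
}
"""
ontology = json.loads(ontology_json)
classification_names = {c["name"] for c in ontology["classifications"]}
```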

Step 3: Create the Annotations Payload

# Python annotation types payload
label = []
label.append(
  lb_types.Label(
    data = {"global_key": global_key},
    annotations = [
      prompt_annotation,
      response_radio_annotation,
      response_checklist_annotation,
      response_text_annotation,
    ]
  )
)

# NDJSON payload (build one payload format or the other, not both)
label_ndjson = []
for annotation in [
    prompt_annotation_ndjson,
    response_radio_annotation_ndjson,
    response_text_annotation_ndjson,
    response_checklist_annotation_ndjson
    ]:
  annotation.update({
      "dataRow": {
          "globalKey": global_key
      }
  })
  label_ndjson.append(annotation)
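A quick client-side check can confirm that every NDJSON annotation carries the data row reference added by the loop above. The entries and the "my-global-key" value below are illustrative placeholders:

```python
# Flag any annotation that is missing its dataRow.globalKey reference;
# such entries would fail at import time.
label_ndjson = [
    {"name": "response_radio", "answer": {"name": "response_a"},
     "dataRow": {"globalKey": "my-global-key"}},
    {"name": "Provide a reason for your choice",
     "answer": "This is an example of a response text",
     "dataRow": {"globalKey": "my-global-key"}},
]
missing = [a["name"] for a in label_ndjson
           if "globalKey" not in a.get("dataRow", {})]
```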

Step 4: Get your Project through the SDK

project = client.get_project(project_id = project_id)

Step 5: Import Annotation Payload

For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the predictions parameter. For ground truths, pass the payload to the labels parameter.

Option A: Upload to a labeling project as pre-labels (model-assisted labeling)

This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.

# Upload MAL label for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name = "mal_job"+str(uuid.uuid4()), 
    predictions = label
)

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")

Option B: Upload to a labeling project as ground truth

This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.

# Upload label for this data row in project
upload_job = lb.LabelImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name = "label_import_job" + str(uuid.uuid4()),  
    labels = label
)

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
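After either import completes, `upload_job.statuses` returns one record per annotation. A sketch of summarizing outcomes, assuming each record carries a "status" field — the sample records below are illustrative, not real API output:

```python
from collections import Counter

# Tally per-annotation import outcomes from a statuses list.
sample_statuses = [
    {"uuid": "a1", "status": "SUCCESS"},
    {"uuid": "a2", "status": "SUCCESS"},
    {"uuid": "a3", "status": "FAILURE"},
]
summary = dict(Counter(s["status"] for s in sample_statuses))
```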