Import LLM response evaluations annotations

Developer guide for importing LLM response evaluations annotations and sample import formats.

Overview

To import annotations in Labelbox, you need to create an annotations payload. In this section, we provide this payload for every supported annotation type.

Annotation payload types

Labelbox supports two formats for the annotations payload:

  • Python annotation types (recommended)
    • Provide a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.
    • Allow you to build annotations locally with local file paths, NumPy arrays, or URLs.
    • Can be converted to NDJSON format to quickly import annotations into Labelbox.
    • Support one level of nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
  • JSON
    • Skips the step of formatting the annotation payload as Labelbox Python annotation types.
    • Supports any level of nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
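
To make the NDJSON format mentioned above concrete: each annotation is one JSON object, and a payload is a sequence of such objects, one per line. The following is a minimal sketch using only the Python standard library. The to_ndjson helper is a hypothetical name for illustration and is not part of the Labelbox SDK; the import calls later in this guide take Python objects directly, so this serialization is illustrative only.

```python
import json


def to_ndjson(annotations):
    """Serialize a list of annotation dicts as newline-delimited JSON."""
    return "\n".join(json.dumps(annotation) for annotation in annotations)


# Two annotation dicts in the shape used later in this guide
annotations = [
    {"name": "checklist_convo", "answers": [{"name": "first_checklist_answer"}]},
    {"name": "ner", "location": {"start": 0, "end": 8}, "messageId": "message-1"},
]

ndjson_payload = to_ndjson(annotations)
print(ndjson_payload)
```

Each line of the printed payload can be parsed back independently with json.loads, which is what makes the format convenient for streaming large imports.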

Label import types

Labelbox additionally supports two types of label imports:

  • Model-assisted labeling (MAL)
    • This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.
  • Ground truth
    • This workflow allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox Annotate. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth.

Supported annotations

The following annotations are supported for an LLM human preference data row:

  • Radio
  • Checklist
  • Free-form text
  • Entity

📘

Annotations can be message-based or global

Classification annotations can be either message-based or global, while tool annotations are always message-specific. Removing the message_id key from a conversational text classification annotation makes the annotation global.
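
As a rough sketch of the rule above applied to an NDJSON payload, the helper below returns a copy of a classification dict without its message key. It assumes the NDJSON spelling of the key is messageId, as in the NDJSON examples in this guide; the to_global name is hypothetical and not part of the Labelbox SDK.

```python
def to_global(annotation):
    """Return a copy of an NDJSON classification dict without its messageId key."""
    return {key: value for key, value in annotation.items() if key != "messageId"}


# A message-specific checklist annotation
message_scoped = {
    "name": "checklist_convo",
    "answers": [{"name": "first_checklist_answer"}],
    "messageId": "message-1",
}

# The same annotation with the message reference removed
global_scoped = to_global(message_scoped)
print(global_scoped)
```

The original dict is left unchanged, so both variants can be kept side by side while you decide which scope your ontology expects.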

Classifications

Radio (single-choice)

radio_annotation = lb_types.ClassificationAnnotation(
    name="Choose the best response",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="Response B")))
# NDJSON
radio_annotation_ndjson = {
    "name": "Choose the best response",
    "answer": {
        "name": "Response B"
    }
}

Checklist (multi-choice)

checklist_annotation = lb_types.ClassificationAnnotation(
  name="checklist_convo", # must match your ontology feature's name
  value=lb_types.Checklist(
      answer = [
        lb_types.ClassificationAnswer(
            name = "first_checklist_answer"
        ),
        lb_types.ClassificationAnswer(
            name = "second_checklist_answer"
        )
      ]
    ),
  message_id="message-1" # Message specific annotation
 )
# NDJSON
checklist_annotation_ndjson = {
    "name": "checklist_convo",
    "answers": [
        {"name": "first_checklist_answer"},
        {"name": "second_checklist_answer"}
    ],
    "messageId": "message-1"  # Message-specific annotation, matching the Python example
}

Free-form text

text_annotation = lb_types.ClassificationAnnotation(
    name="Provide a reason for your choice",
    value=lb_types.Text(answer="the answer to the text questions right here")
)
# NDJSON
text_annotation_ndjson = {
    "name": "Provide a reason for your choice",
    "answer": "the answer to the text questions right here",
}

Nested classification

nested_radio_annotation = lb_types.ClassificationAnnotation(
  name="nested_radio_question",
  value=lb_types.Radio(
    answer=lb_types.ClassificationAnswer(
      name="first_radio_answer",
      classifications=[
        lb_types.ClassificationAnnotation(
          name="sub_radio_question",
          value=lb_types.Radio(
            answer=lb_types.ClassificationAnswer(
              name="first_sub_radio_answer"
            )
          )
        )
      ]
    )
  )
)

nested_checklist_annotation = lb_types.ClassificationAnnotation(
  name="nested_checklist_question",
  message_id="message-1",
  value=lb_types.Checklist(
    answer=[lb_types.ClassificationAnswer(
      name="first_checklist_answer",
      classifications=[
        lb_types.ClassificationAnnotation(
          name="sub_checklist_question",
          value=lb_types.Checklist(
            answer=[lb_types.ClassificationAnswer(
            name="first_sub_checklist_answer"
          )]
        ))
      ]
    )]
  )
)
# NDJSON
nested_radio_annotation_ndjson = {
  "name": "nested_radio_question",
  "answer": {
      "name": "first_radio_answer",
      "classifications": [{
          "name":"sub_radio_question",
          "answer": { "name" : "first_sub_radio_answer"}
        }]
    }
}

nested_checklist_annotation_ndjson = {
  "name": "nested_checklist_question",
  "messageId": "message-1",
  "answer": [{
      "name": "first_checklist_answer",
      "classifications" : [
        {
          "name": "sub_checklist_question",
          "answer": {
            "name": "first_sub_checklist_answer",
          }
        }
      ]
  }]
}
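
Because Python annotation types support one level of nesting while JSON supports any depth (see Annotation payload types), it can be useful to check how deeply a payload is nested before choosing a format. The following is a minimal sketch; nesting_depth is a hypothetical helper that assumes the NDJSON shapes shown in this section.

```python
def nesting_depth(annotation):
    """Count the levels of sub-classifications in an NDJSON annotation dict."""
    answers = annotation.get("answer", annotation.get("answers", []))
    if isinstance(answers, dict):        # radio: a single answer object
        answers = [answers]
    elif not isinstance(answers, list):  # free-form text: a plain string
        answers = []
    # Sub-classifications hang off each answer, or off a tool annotation itself
    subs = list(annotation.get("classifications", []))
    for answer in answers:
        subs.extend(answer.get("classifications", []))
    return max((1 + nesting_depth(sub) for sub in subs), default=0)


nested_radio = {
    "name": "nested_radio_question",
    "answer": {
        "name": "first_radio_answer",
        "classifications": [{
            "name": "sub_radio_question",
            "answer": {"name": "first_sub_radio_answer"},
        }],
    },
}
flat_text = {"name": "text_question", "answer": "free-form text"}

print(nesting_depth(nested_radio), nesting_depth(flat_text))
```

A depth greater than one suggests the payload needs the JSON format rather than Python annotation types.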

Tools

Entity

ner_annotation = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=0,
        end=8,
        message_id="message-1"
    )
)
# NDJSON
ner_annotation_ndjson = {
    "name": "ner",
    "location": {
        "start": 0,
        "end": 8
    },
    "messageId": "message-1"
}

Tool with nested classification

tool_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
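    name="feature_name",  # placeholder: replace with the tool name from your ontology
    value=tool_value,  # placeholder: any tool annotation value, e.g. lb_types.ConversationEntity(...)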
    classifications=[
        lb_types.ClassificationAnnotation(
            name="sub_radio_question",
            value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                name="first_sub_radio_answer")))
    ])
entity_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=0,
        end=8,
        message_id="message-1"
    ),
    classifications=[
        lb_types.ClassificationAnnotation(
            name="sub_radio_question",
            value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                name="first_sub_radio_answer")))
    ])
# NDJSON
ner_with_radio_subclass_ndjson = {
    "name": "ner",
    "classifications": [{
        "name": "sub_radio_question",
        "answer": {
            "name": "first_sub_radio_answer"
        }
    }],
    "location": {
        "start": 0,
        "end": 8
    },
    "messageId": "message-1"
}

Example: Import pre-labels or ground truths

The steps to import annotations as pre-labels (model-assisted labeling) are similar to those for importing annotations as ground truth labels. They differ slightly, and we describe the differences for each scenario.

Before you start

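The following imports and client are needed to use the code examples in this section.

import uuid
import labelbox as lb
import labelbox.types as lb_types

# The examples below assume an authenticated client; replace the placeholder with your API key
client = lb.Client(api_key="<YOUR_API_KEY>")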

Step 1: Import data rows

Data rows must be uploaded to Catalog before you can attach annotations to them.

This example shows how to create a data row in Catalog by attaching it to a dataset.

global_key = "pairwise_shopping_asset"

# Upload data rows
convo_data = {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
    "global_key": global_key
}

# Create a dataset
dataset = client.create_dataset(name="pairwise_annotation_demo")
# Create a data row
task = dataset.create_data_rows([convo_data])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)

# For reference, the contents of the pairwise_shopping_2.json file referenced above
pairwise_shopping_2 = """
 {
  "type": "application/vnd.labelbox.conversational",
  "version": 1,
  "messages": [
    {
      "messageId": "message-0",
      "timestampUsec": 1530718491,
      "content": "Hi! How can I help?",
      "user": {
        "userId": "Bot 002",
        "name": "Bot"
      },
      "align": "left",
      "canLabel": false
    },
    {
      "messageId": "message-1",
      "timestampUsec": 1530718503,
      "content": "I just bought a vacuum cleaner from you guys like a week ago and it's already broken!!",
      "user": {
        "userId": "User 00686",
        "name": "User"
      },
      "align": "right",
      "canLabel": true
    }

  ],
  "modelOutputs": [
    {
      "title": "Response A",
      "content": "I'm really sorry to hear that your vacuum cleaner is not working as expected. We certainly understand how frustrating this can be, especially with a recent purchase. I assure you, we're here to help!\n\n To proceed with resolving this issue, could you please provide some details about the problem you're experiencing with the vacuum cleaner? Additionally, if you could share your order number or receipt details, that would be very helpful. We will definitely look into the available options for you, which may include a replacement or a refund, depending on the specific circumstances and our return policy.\n\n Your satisfaction is our priority, and we'll do our best to resolve this matter promptly and to your satisfaction.",
      "modelConfigName": "GPT-3.5 with temperature 0"
    },
    {
      "title": "Response B",
      "content": "I'm sorry about the issue with your vacuum cleaner. Please send us the order number or receipt details so we can quickly assist you with a replacement. Your satisfaction is important to us!",
      "modelConfigName": "Fine Tuned GPT-3.5 with demo data"
    }
  ]
}
"""

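The message_id values used in the annotations refer to the messageId values in this file, so it can help to list them before building payloads. The sketch below parses a trimmed copy of the conversation (content shortened for brevity) with the standard library. It assumes that canLabel marks the messages that accept annotations, which is an interpretation of the field name rather than something this guide states.

```python
import json

# Trimmed copy of the conversation above (message and response content shortened)
conversation = json.loads("""
{
  "type": "application/vnd.labelbox.conversational",
  "version": 1,
  "messages": [
    {"messageId": "message-0", "content": "Hi! How can I help?", "canLabel": false},
    {"messageId": "message-1", "content": "I just bought a vacuum cleaner...", "canLabel": true}
  ],
  "modelOutputs": [
    {"title": "Response A", "content": "..."},
    {"title": "Response B", "content": "..."}
  ]
}
""")

# Message IDs that can be referenced by message-specific annotations (assumption: canLabel)
labelable_ids = [m["messageId"] for m in conversation["messages"] if m["canLabel"]]
# Titles of the model responses being compared
response_titles = [output["title"] for output in conversation["modelOutputs"]]

print("Labelable messages:", labelable_ids)
print("Model outputs:", response_titles)
```

The response titles are also the option values used for the "Choose the best response" radio classification in the ontology in Step 2.
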
Step 2: Set up ontology

Your project ontology should support the tools and classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the name parameter should match the value of the name field in your annotation.

For example, the entity annotation created above uses the name ner. When we set up our ontology, we must ensure that the name of the NER tool is also ner. The same alignment must hold true for the other tools and classifications we create in our ontology.
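
A quick way to catch naming mismatches before importing is to compare the names in your payload against the names in your ontology. The sketch below does this with plain Python sets. The ontology names are the ones used in this step; the annotation name "free_text_reason" is a hypothetical mismatch included to show what the check reports.

```python
# Feature names defined in the ontology built in this step
ontology_names = {
    "ner",
    "Choose the best response",
    "Provide a reason for your choice",
    "checklist_convo",
    "nested_checklist_question",
    "nested_radio_question",
}

# Names used by the annotations to import ("free_text_reason" is deliberately wrong)
annotation_names = ["ner", "checklist_convo", "free_text_reason"]

# Any name listed here has no matching tool or classification in the ontology
unmatched = [name for name in annotation_names if name not in ontology_names]
print("Names missing from the ontology:", unmatched)
```

The comparison is case-sensitive and whitespace-sensitive, matching the requirement that the names be identical.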

This example shows how to create an ontology containing all supported annotation types.

ontology_builder = lb.OntologyBuilder(
  tools=[
    lb.Tool(tool=lb.Tool.Type.NER,name="ner"),
  ],
  classifications=[
    lb.Classification(
      class_type=lb.Classification.Type.RADIO,
      scope=lb.Classification.Scope.GLOBAL,
      name="Choose the best response",
      options=[lb.Option(value="Response A"), lb.Option(value="Response B"), lb.Option(value="Tie")]
    ),
    lb.Classification(
      class_type=lb.Classification.Type.TEXT,
      name="Provide a reason for your choice"
    ),
    lb.Classification(
      class_type=lb.Classification.Type.CHECKLIST,
      scope=lb.Classification.Scope.INDEX,
      name="checklist_convo",
      options=[
        lb.Option(value="first_checklist_answer"),
        lb.Option(value="second_checklist_answer")
      ]
    ),
    lb.Classification(
      class_type=lb.Classification.Type.CHECKLIST,
      name="nested_checklist_question",
      scope = lb.Classification.Scope.INDEX,
      options=[
          lb.Option("first_checklist_answer",
            options=[
              lb.Classification(
                  class_type=lb.Classification.Type.CHECKLIST,
                  name="sub_checklist_question",
                  options=[lb.Option("first_sub_checklist_answer")]
              )
          ])
      ]
    ),
    lb.Classification(
        class_type=lb.Classification.Type.RADIO,
        name="nested_radio_question",
        scope = lb.Classification.Scope.GLOBAL,
        options=[
            lb.Option("first_radio_answer",
                options=[
                    lb.Classification(
                        class_type=lb.Classification.Type.RADIO,
                        name="sub_radio_question",
                        options=[lb.Option("first_sub_radio_answer")]
                    )
                ])
          ]
    )
  ]
)

ontology = client.create_ontology("Pairwise comparison ontology", ontology_builder.asdict(), media_type=lb.MediaType.Conversational)


Step 3: Set up a labeling project

# Create a Labelbox project
project = client.create_project(
    name="Conversational Text Annotation Import Demo (Pairwise comparison)",
    media_type=lb.MediaType.Conversational
)

# Connect the ontology to your project
project.connect_ontology(ontology)

Step 4: Send data rows to project

# Create a batch to send to your project
batch = project.create_batch(
  "first-batch-convo-demo", # Each batch in a project must have a unique name
  global_keys=[global_key], # A list of global keys identifying the data rows to send
  priority=5 # Priority from 1 (highest) to 5 (lowest)
)

print("Batch: ", batch)

Step 5: Create annotation payloads

For help understanding annotation payloads, see overview. To declare payloads, you can use Python annotation types (preferred) or NDJSON objects.

These examples demonstrate each format and how to compose annotations into labels attached to data rows.

label = []
label.append(
  lb_types.Label(
    data={"global_key" : global_key },
    annotations=[
      ner_annotation,
      text_annotation,
      checklist_annotation,
      radio_annotation,
      nested_radio_annotation,
      nested_checklist_annotation
    ]
  )
)
# NDJSON
label_ndjson = []
for annotations in [
    ner_annotation_ndjson,
    text_annotation_ndjson,
    checklist_annotation_ndjson,
    radio_annotation_ndjson,
    nested_checklist_annotation_ndjson,
    nested_radio_annotation_ndjson
    ]:
  annotations.update({
      "dataRow": {
          "globalKey": global_key
      }
  })
  label_ndjson.append(annotations)
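
Note that the loop above calls update on each annotation dict, which modifies the original dicts in place. If you want to reuse the same annotation dicts for several data rows, a copying variant may be safer. The following is a minimal sketch; attach_data_row is a hypothetical helper and the global key is a placeholder value.

```python
def attach_data_row(annotation, global_key):
    """Return a new dict with a dataRow reference, leaving the input untouched."""
    return {**annotation, "dataRow": {"globalKey": global_key}}


# Placeholder global key for illustration
global_key = "my-global-key"

ner_annotation_ndjson = {
    "name": "ner",
    "location": {"start": 0, "end": 8},
    "messageId": "message-1",
}

# Build the payload without mutating the source annotation
label_ndjson = [attach_data_row(a, global_key) for a in [ner_annotation_ndjson]]
print(label_ndjson)
```

The copy is shallow, so nested values such as location are still shared with the original dict; that is fine as long as they are not modified afterwards.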

Step 6: Import annotation payload

For pre-labels (model-assisted labeling), pass your payload as the value of the predictions parameter. For ground truths, pass the payload to the labels parameter.

Option A: Upload to a labeling project as pre-labels (MAL)

This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.

# Upload MAL label for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="mal_job"+str(uuid.uuid4()), 
    predictions=label
)

upload_job.wait_until_done()  # Wait for the import to finish before reading errors and statuses
print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")

Option B: Upload to a labeling project as ground truth

This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.

# Upload label for this data row in project
upload_job = lb.LabelImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="label_import_job"+str(uuid.uuid4()),  
    labels=label
)

upload_job.wait_until_done()  # Wait for the import to finish before reading errors and statuses
print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")