Import conversation & model response data

How to import conversation text data & model response data.

Specifications

File format: JSON in our conversation format
Encoding: UTF-8 (Note: The Editor does not process any special character sequences like HTML Entities, Unicode Escape Sequence, or colon emoji aliases.)

Import methods:

  • IAM Delegated Access
  • Signed URLs (https URLs only)

When importing conversation or thread data to Labelbox, your JSON file should include the following information for each data row.

| Parameter | Required | Description |
|---|---|---|
| row_data | Yes | HTTPS path to a cloud-hosted conversational text JSON file. See the section below for details on our conversation format. For IAM Delegated Access, this URL must be in virtual-hosted-style format: https://<bucket-name>.s3.<region>.amazonaws.com/<key>. If your object URLs are formatted differently (as can happen with buckets in older regions), convert them to the virtual-hosted-style format before importing. |
| global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row. |
| media_type | No | "CONVERSATIONAL". Optional media type that enables better validation and error messaging. |
| metadata_fields | No | See Metadata. |
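
The virtual-hosted-style requirement for row_data can be checked before import. The following is a minimal sketch, not part of the Labelbox SDK: the helper name to_virtual_hosted_style is hypothetical, and it assumes the only other shape to handle is the AWS path-style form https://s3.<region>.amazonaws.com/<bucket-name>/<key>. Anything else, including URLs with a query string, is returned unchanged.

```python
from urllib.parse import urlparse


def to_virtual_hosted_style(url: str) -> str:
    """Rewrite a path-style S3 object URL as a virtual-hosted-style URL.

    Hypothetical helper. Assumes path-style input of the form
    https://s3.<region>.amazonaws.com/<bucket-name>/<key>.
    """
    parsed = urlparse(url)
    host = parsed.netloc
    # Signed URLs are tied to their original host, so leave them alone.
    if parsed.query:
        return url
    # Path-style hosts start with "s3." and carry no bucket name.
    if not (host.startswith("s3.") and host.endswith(".amazonaws.com")):
        return url
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        return url
    return f"https://{bucket}.{host}/{key}"


converted = to_virtual_hosted_style(
    "https://s3.us-west-2.amazonaws.com/my-bucket/chats/chat_1.json"
)
print(converted)
```

URLs that are already virtual-hosted-style pass through untouched, so the helper can be applied to a whole list of row_data values.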

Import format

[
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_1.json",
    "global_key": "global_key_1"
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
    "global_key": "global_key_2"
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_3.json",
    "global_key": "global_key_3"
  }
]
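
An import list in this format can also be generated from a list of file URLs. The sketch below is illustrative only: the function name build_import_rows is hypothetical, and using a UUID as the global key is one possible choice for guaranteeing uniqueness, not a Labelbox requirement.

```python
import json
import uuid


def build_import_rows(urls):
    """Build one import entry per URL, each with a fresh global key.

    Hypothetical helper; rejects non-https URLs because the import
    methods listed above accept https URLs only.
    """
    rows = []
    for url in urls:
        if not url.startswith("https://"):
            raise ValueError(f"row_data must be an https URL: {url}")
        rows.append({"row_data": url, "global_key": str(uuid.uuid4())})
    return rows


base = "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data"
rows = build_import_rows(
    [f"{base}/pairwise_shopping_{n}.json" for n in (1, 2, 3)]
)
print(json.dumps(rows, indent=2))
```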

Python example

import uuid

import labelbox as lb

# Replace the placeholder with your own Labelbox API key
client = lb.Client(api_key="<YOUR_API_KEY>")

# Generate dummy global keys
global_key_1 = str(uuid.uuid4())
global_key_2 = str(uuid.uuid4())
global_key_3 = str(uuid.uuid4())

# Create a dataset
dataset = client.create_dataset(
    name="pairwise_demo_" + str(uuid.uuid4()),
    iam_integration=None
)

# Upload data rows
task = dataset.create_data_rows([
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_1.json",
        "global_key": global_key_1
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
        "global_key": global_key_2
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_3.json",
        "global_key": global_key_3
    }
])

# Wait for the upload to finish, then report any problems
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)

Model Response Conversation JSON format

Each data row should have its own cloud-hosted JSON file. Each JSON file should include the following.

| Parameter | Required | Description |
|---|---|---|
| type | Yes | Always set to application/vnd.labelbox.conversational. |
| version | Yes | Set to 1. |
| messages | Yes | Array of message objects that make up the conversation. See below for the fields that each message must include. A conversation can currently contain at most 250 messages. |
| modelOutputs | Yes | Array of model output objects holding the model responses to the conversation. See the section below for the fields that must be included. |

Message object

| Parameter | Required | Description |
|---|---|---|
| messageId | Yes | External ID for the message. It must be unique within the conversation file and is returned in exports to help match annotations to messages. |
| timestampUsec | No | Unix epoch timestamp of the message. |
| content | Yes | Text string of the message. Must be under 10000 characters. |
| user | Yes | Describes the person who is speaking. The user is defined by two fields, userId and name. The name field is used to show the user name in the editor. Example: { "userId": "ID", "name": "Name" } |
| align | No | Controls the alignment and indentation of the text string in the editor. Defaults to left if left blank. Options: left, 0-left-indent, 1-left-indent, 2-left-indent, 3-left-indent, 4-left-indent, 5-left-indent, right, 0-right-indent, 1-right-indent, 2-right-indent, 3-right-indent, 4-right-indent, 5-right-indent. |
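
The message constraints above (required fields, unique messageId, content under 10000 characters, the listed align options, and the 250-message limit) can be checked locally before uploading a file. This is a rough sketch rather than Labelbox's own validation: the function check_messages and its constant names are hypothetical, and the server may apply additional rules.

```python
MAX_MESSAGES = 250
MAX_CONTENT_CHARS = 10000
# "left", "right", and 0..5 indent levels for each side
ALIGN_OPTIONS = {"left", "right"} | {
    f"{n}-{side}-indent" for n in range(6) for side in ("left", "right")
}


def check_messages(messages):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"{len(messages)} messages exceeds the limit of {MAX_MESSAGES}")
    seen_ids = set()
    for index, message in enumerate(messages):
        for field in ("messageId", "content", "user"):
            if field not in message:
                problems.append(f"message {index}: missing required field '{field}'")
        message_id = message.get("messageId")
        if message_id is not None:
            if message_id in seen_ids:
                problems.append(f"message {index}: duplicate messageId '{message_id}'")
            seen_ids.add(message_id)
        # Content must be strictly under the character limit
        if len(message.get("content", "")) >= MAX_CONTENT_CHARS:
            problems.append(f"message {index}: content is not under {MAX_CONTENT_CHARS} characters")
        user = message.get("user")
        if user is not None:
            for field in ("userId", "name"):
                if field not in user:
                    problems.append(f"message {index}: user is missing '{field}'")
        align = message.get("align")
        if align is not None and align not in ALIGN_OPTIONS:
            problems.append(f"message {index}: unknown align value '{align}'")
    return problems


sample = [
    {
        "messageId": "message-0",
        "content": "Hi! How can I help?",
        "user": {"userId": "Bot 002", "name": "Bot"},
        "align": "left",
    }
]
print(check_messages(sample))
```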

Model outputs object

| Parameter | Required | Description |
|---|---|---|
| title | Yes | Use this field to name the model response. |
| content | Yes | This field displays the content of the model output. |
| modelConfigName | Yes | Use this field to specify the corresponding model config name. |
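
Putting the pieces together, a conversation file combines the top-level fields with the messages and model outputs. The sketch below assembles such a document in Python. The function make_conversation is a hypothetical helper, and the model config names used are placeholders.

```python
import json

REQUIRED_OUTPUT_FIELDS = ("title", "content", "modelConfigName")


def make_conversation(messages, model_outputs):
    """Assemble a conversation document, checking each model output."""
    for index, output in enumerate(model_outputs):
        missing = [f for f in REQUIRED_OUTPUT_FIELDS if f not in output]
        if missing:
            raise ValueError(f"modelOutputs[{index}] is missing: {', '.join(missing)}")
    return {
        "type": "application/vnd.labelbox.conversational",
        "version": 1,
        "messages": messages,
        "modelOutputs": model_outputs,
    }


conversation = make_conversation(
    messages=[
        {
            "messageId": "message-0",
            "content": "Hi! How can I help?",
            "user": {"userId": "Bot 002", "name": "Bot"},
        }
    ],
    model_outputs=[
        # Placeholder titles and config names, for illustration only
        {"title": "Response A", "content": "First reply.", "modelConfigName": "config-a"},
        {"title": "Response B", "content": "Second reply.", "modelConfigName": "config-b"},
    ],
)
print(json.dumps(conversation, indent=2))
```

The printed JSON would then be saved to a cloud-hosted file and referenced by a data row's row_data URL.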

Sample Conversation JSON

Sample note

This sample contains the modelOutputs field, which is unique to importing data rows for LLM human preference.

{
  "type": "application/vnd.labelbox.conversational",
  "version": 1,
  "messages": [
    {
      "messageId": "message-0",
      "timestampUsec": 1530718491,
      "content": "Hi! How can I help?",
      "user": {
        "userId": "Bot 002",
        "name": "Bot"
      },
      "align": "left",
      "canLabel": false
    },
    {
      "messageId": "message-1",
      "timestampUsec": 1530718503,
      "content": "I just bought a vacuum cleaner from you guys like a week ago and it's already broken!!",
      "user": {
        "userId": "User 00686",
        "name": "User"
      },
      "align": "right",
      "canLabel": true
    }

  ],
  "modelOutputs": [
    {
      "title": "Response A",
      "content": "I'm really sorry to hear that your vacuum cleaner is not working as expected. We certainly understand how frustrating this can be, especially with a recent purchase. I assure you, we're here to help!\n\n To proceed with resolving this issue, could you please provide some details about the problem you're experiencing with the vacuum cleaner? Additionally, if you could share your order number or receipt details, that would be very helpful. We will definitely look into the available options for you, which may include a replacement or a refund, depending on the specific circumstances and our return policy.\n\n Your satisfaction is our priority, and we'll do our best to resolve this matter promptly and to your satisfaction.",
      "modelConfigName": "GPT-3.5 with temperature 0"
    },
    {
      "title": "Response B",
      "content": "I'm sorry about the issue with your vacuum cleaner. Please send us the order number or receipt details so we can quickly assist you with a replacement. Your satisfaction is important to us!",
      "modelConfigName": "Fine Tuned GPT-3.5 with demo data"
    }
  ]
}