How to import multimodal chat data and sample import formats.
You need to set up a multimodal chat evaluation project before importing data. The two types of multimodal chat evaluation projects have different project creation methods and data row setups:
- For offline multimodal chat evaluation projects, use client.create_offline_model_evaluation_project and import data rows of existing conversations.
- For live multimodal chat evaluation projects, use client.create_model_evaluation_project and either:
  - (Recommended) Create data rows and send them to the project, as with other project types.
  - Generate empty data rows upon project creation. This option can't create data rows with attachments or metadata.
For a full walk-through of setting up a multimodal chat evaluation project, see Multimodal chat evaluation.
Set up live multimodal chat evaluation projects
Use client.create_model_evaluation_project to create a live multimodal chat evaluation project. This method takes the same parameters as the traditional client.create_project, plus a few additional parameters specific to multimodal chat evaluation projects.
The client.create_model_evaluation_project method accepts the following parameters:
- name: The name of your new project.
- description (optional): A description of your project.
- dataset_name (optional): The name of the dataset where the generated data rows will be located. Include this parameter only if you want to create a new dataset.
- dataset_id (optional): The ID of an existing Labelbox dataset. Include this parameter if you want to append the generated data rows to an existing dataset.
- data_row_count (optional): The number of data rows to generate for your project. Defaults to 100 if dataset_name or dataset_id is included.
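The parameter rules above can be summarized in a small helper that assembles keyword arguments for client.create_model_evaluation_project. This is a minimal sketch: the helper name build_live_project_kwargs is hypothetical, and treating dataset_name and dataset_id as mutually exclusive is an assumption drawn from the descriptions above, not documented SDK behavior.

```python
from typing import Any, Optional


def build_live_project_kwargs(
    name: str,
    description: Optional[str] = None,
    dataset_name: Optional[str] = None,
    dataset_id: Optional[str] = None,
    data_row_count: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for create_model_evaluation_project."""
    # Assumption: dataset_name (new dataset) and dataset_id (existing dataset)
    # are alternatives, so passing both is treated as an error here.
    if dataset_name and dataset_id:
        raise ValueError("Pass either dataset_name or dataset_id, not both")

    kwargs: dict[str, Any] = {"name": name}
    if description is not None:
        kwargs["description"] = description

    if dataset_name or dataset_id:
        if dataset_name:
            kwargs["dataset_name"] = dataset_name
        else:
            kwargs["dataset_id"] = dataset_id
        # Mirror the documented default of 100 generated data rows.
        kwargs["data_row_count"] = 100 if data_row_count is None else data_row_count
    return kwargs
```

You would then pass the result with client.create_model_evaluation_project(**kwargs). Making the default explicit keeps the number of generated data rows visible in your own code.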
Option A: Create and send data rows to projects
import labelbox as lb

# Authenticate with your Labelbox API key
client = lb.Client(api_key="<YOUR_API_KEY>")

# Create the project
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
)


def make_data_rows(dataset_id=None):
    # If a dataset ID is provided, fetch the dataset using that ID.
    # Otherwise, create a new dataset with the specified name.
    if dataset_id:
        dataset = client.get_dataset(dataset_id)
    else:
        dataset = client.create_dataset(name="example live mmc dataset")

    # Helper function to generate a single data row
    def generate_data(ind):
        return {
            "row_data": {  # The chat data format (an empty draft conversation)
                "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
                "draft": True,
                "rootMessageIds": [],
                "actors": {},
                "version": 2,
                "messages": {},
            },
            "global_key": f"global_key_{dataset.uid}_{ind}",
            "metadata_fields": [{"name": "tag", "value": "val_tag"}],
            "attachments": [
                {
                    "type": "IMAGE_OVERLAY",
                    "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg",
                }
            ],
        }

    # Generate a list of 100 data rows
    data_list = [generate_data(ind) for ind in range(100)]

    # Upload the generated data rows to the dataset
    task = dataset.create_data_rows(data_list)
    print("Processing task ", task.uid)  # Print the unique ID of the task
    task.wait_till_done()

    # Ensure that the task status is 'COMPLETE' to confirm success
    assert task.status == "COMPLETE"

    # Return the dataset object
    return dataset


# Create a new dataset. Alternatively, pass an existing dataset ID.
dataset = make_data_rows()

# Retrieve the data row IDs from the dataset
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]

# Send data rows to the project
batch = project.create_batch(
    name="mmc-batch",  # each batch in a project must have a unique name
    data_rows=data_row_ids,  # data row IDs to include in the batch
    priority=1,  # priority between 1 (highest) and 5 (lowest)
)
print(f"Batch: {batch}")
Option B: Generate empty data rows
No metadata support
Only use this option if your project doesn't require metadata, attachments, or embeddings for data rows.
# Create the project and generate data rows
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_name="Example live multimodal chat dataset",
    data_row_count=100,
)
# Connect the project to your ontology.
# This assumes an ontology object named ontology was created earlier, as described in Multimodal chat evaluation.
project.connect_ontology(ontology)
Set up offline multimodal chat evaluation projects
Use client.create_offline_model_evaluation_project to create offline multimodal chat evaluation projects. This method uses the same parameters as client.create_project and adds validation to ensure the project is set up correctly.
project = client.create_offline_model_evaluation_project(
name="<project_name>",
description="<project_description>", # optional
)
After creating the project, you can import conversational version 2 data rows to the project. To learn how to import annotations, see Import multimodal chat annotations.
Specifications
File format: chat data JSON in conversation v2 format
Import methods:
- Local upload (maximum character count: 2,621,440)
- IAM Delegated Access
- Signed URLs (https URLs only)
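For local uploads, a quick pre-flight check can serialize the import file and compare its length against the 2,621,440-character limit listed above. This is a minimal sketch: the function name is hypothetical, and it assumes the limit applies to the serialized JSON text and counts characters rather than bytes.

```python
import json
from typing import Any

# Maximum character count for local uploads, as listed above.
LOCAL_UPLOAD_MAX_CHARS = 2_621_440


def fits_local_upload(data_rows: list[dict[str, Any]]) -> bool:
    """Hypothetical pre-flight check: does the serialized import file fit the limit?"""
    # ensure_ascii=False keeps emoji and other non-ASCII text as single characters,
    # matching the assumption that the limit counts characters.
    serialized = json.dumps(data_rows, ensure_ascii=False)
    return len(serialized) <= LOCAL_UPLOAD_MAX_CHARS
```

If the check fails, hosting the conversation files and referencing them by https URL keeps the import file small.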
When importing conversation or thread data to Labelbox, include the following information for each data row in your JSON file.
Parameter | Required | Description |
---|---|---|
row_data | Yes | Either an https path to a cloud-hosted conversational JSON file or the conversation JSON object itself. See the Conversation v2 JSON section below for details on the conversation format. |
global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization; data rows whose global keys duplicate those of existing data rows are not imported. |
media_type | No | "CONVERSATIONAL" (optional media type to provide better validation and error messaging) |
metadata_fields | No | See metadata |
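The fields in this table map onto one dictionary per data row. The sketch below builds those entries and catches duplicate global keys within a single import file before upload. The helper names are hypothetical; the check cannot detect collisions with data rows that already exist in your organization, which Labelbox rejects at import time.

```python
from typing import Any, Optional, Union


def make_data_row(
    row_data: Union[str, dict[str, Any]],
    global_key: Optional[str] = None,
    media_type: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build one entry of the import file."""
    # row_data is either an https URL or an embedded conversation object.
    if isinstance(row_data, str) and not row_data.startswith("https://"):
        raise ValueError("row_data URLs must use https")
    entry: dict[str, Any] = {"row_data": row_data}
    if global_key is not None:
        entry["global_key"] = global_key
    if media_type is not None:
        entry["media_type"] = media_type
    return entry


def check_unique_global_keys(entries: list[dict[str, Any]]) -> None:
    """Raise ValueError if a global key repeats within this import file."""
    seen: set[str] = set()
    for entry in entries:
        key = entry.get("global_key")
        if key is None:
            continue  # global_key is optional
        if key in seen:
            raise ValueError(f"Duplicate global key: {key}")
        seen.add(key)
```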
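Import format
You can embed the conversation JSON directly in row_data, or point row_data to cloud-hosted conversation JSON files. The following examples show both approaches.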
{
"row_data": {
"type": "application/vnd.labelbox.conversational.model-chat-evaluation",
"version": 2,
"actors": {
"cm1qu8krf00063b72cutnbn5l": {
"role": "human",
"metadata": { "name": "User" }
},
"cm1vjleif00023b6y4fw4ew94": {
"role": "model",
"metadata": {
"modelConfigName": "Gem Pro-Copy"
}
},
"cm1vjleif00033b6yifzroser": {
"role": "model",
"metadata": {
"modelConfigName": "gpt 4-Copy"
}
}
},
"messages": {
"cm1qu8krf00073b72fyar00vh": {
"actorId": "cm1qu8krf00063b72cutnbn5l",
"content": [{ "type": "text", "content": "Hello " }],
"childMessageIds": [
"cm1vjlitg00043b6y1tgssq1r",
"cm1vjlitg00053b6y19ve1qra"
]
},
"cm1vjlitg00043b6y1tgssq1r": {
"actorId": "cm1vjleif00023b6y4fw4ew94",
"content": [
{
"type": "text",
"content": "Hello! 👋 How can I assist you today? 😊 \\n"
}
],
"childMessageIds": []
},
"cm1vjlitg00053b6y19ve1qra": {
"actorId": "cm1vjleif00033b6yifzroser",
"content": [
{ "type": "text", "content": "Hi! How can I assist you today?" }
],
"childMessageIds": []
}
},
"rootMessageIds": ["cm1qu8krf00073b72fyar00vh"]
},
"global_key": "global_key"
}
[
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_1.json",
"global_key": "global_key_1"
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
"global_key": "global_key_2"
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_3.json",
"global_key": "global_key_3"
}
]
Python example
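import labelbox as lb

# Authenticate with your Labelbox API key
client = lb.Client(api_key="<YOUR_API_KEY>")

# Embed the chat conversation data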
row_data = {
"type": "application/vnd.labelbox.conversational.model-chat-evaluation",
"version": 2,
"actors": {
"cm1qu8krf00063b72cutnbn5l": {
"role": "human",
"metadata": { "name": "User" }
},
"cm1vjleif00023b6y4fw4ew94": {
"role": "model",
"metadata": {
"modelConfigName": "Gem Pro-Copy"
}
},
"cm1vjleif00033b6yifzroser": {
"role": "model",
"metadata": {
"modelConfigName": "gpt 4-Copy"
}
}
},
"messages": {
"cm1qu8krf00073b72fyar00vh": {
"actorId": "cm1qu8krf00063b72cutnbn5l",
"content": [{ "type": "text", "content": "Hello " }],
"childMessageIds": [
"cm1vjlitg00043b6y1tgssq1r",
"cm1vjlitg00053b6y19ve1qra"
]
},
"cm1vjlitg00043b6y1tgssq1r": {
"actorId": "cm1vjleif00023b6y4fw4ew94",
"content": [
{
"type": "text",
"content": "Hello! 👋 How can I assist you today? 😊 \\n"
}
],
"childMessageIds": []
},
"cm1vjlitg00053b6y19ve1qra": {
"actorId": "cm1vjleif00033b6yifzroser",
"content": [
{ "type": "text", "content": "Hi! How can I assist you today?" }
],
"childMessageIds": []
}
},
"rootMessageIds": ["cm1qu8krf00073b72fyar00vh"]
}
# Create a dataset
dataset = client.create_dataset(
name="mmc_dataset",
)
# Upload the conversation data to the dataset as a data row.
# The conversation object goes in the data row's row_data field.
task = dataset.create_data_rows([{"row_data": row_data}])
task.wait_till_done()
# Output any errors that occurred during the import.
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)
import uuid

# Assumes client is an authenticated lb.Client, as in the previous example.
# Generate dummy global keys
global_key_1 = str(uuid.uuid4())
global_key_2 = str(uuid.uuid4())
global_key_3 = str(uuid.uuid4())
# Create a dataset
dataset = client.create_dataset(
name="pairwise_demo_"+str(uuid.uuid4()),
iam_integration=None
)
# Upload data rows
task = dataset.create_data_rows([
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_1.json",
"global_key": global_key_1
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
"global_key": global_key_2
},
{
"row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_3.json",
"global_key": global_key_3
}
])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)
Conversation v2 JSON
Parameter | Required | Description |
---|---|---|
type | Yes | Populate with application/vnd.labelbox.conversational.model-chat-evaluation |
version | Yes | Populate with 2 |
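actors | Yes | An object that maps each actor ID to an actor object describing a participant in the chat conversation. |
messages | Yes | An object that maps each message ID to a message object from one of the actors. |
rootMessageIds | Yes | An array of the message IDs where the conversation starts. Typically, this contains the ID of the first message from a human actor. |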
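The required top-level fields can be captured in a skeleton builder plus a small check. This is a minimal sketch; new_conversation and top_level_problems are hypothetical helper names, and the check only covers the fields listed in the table above.

```python
from typing import Any

# Values taken from the table above.
CONVERSATION_TYPE = "application/vnd.labelbox.conversational.model-chat-evaluation"
REQUIRED_KEYS = ("type", "version", "actors", "messages", "rootMessageIds")


def new_conversation() -> dict[str, Any]:
    """Return an empty conversation v2 object containing every required key."""
    return {
        "type": CONVERSATION_TYPE,
        "version": 2,
        "actors": {},
        "messages": {},
        "rootMessageIds": [],
    }


def top_level_problems(conversation: dict[str, Any]) -> list[str]:
    """List missing required keys and unexpected type or version values."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in conversation]
    if "type" in conversation and conversation["type"] != CONVERSATION_TYPE:
        problems.append("unexpected type")
    if "version" in conversation and conversation["version"] != 2:
        problems.append("version must be 2")
    return problems
```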
Actor object
Each actor object is keyed by a unique, user-provided ID.
Each actor object has a role key and a metadata key. The metadata contains the specifics of the actor and varies depending on the actor's role.
Parameter | Required | Description |
---|---|---|
role | Yes | The role the actor receives. Either human or model . |
name | Human actors only | The name of the actor. Required for actors with the human role. Placed inside the actor's metadata object. |
modelConfigName | Model actors only | The model config name of the actor. Required for actors with the model role. Placed inside the actor's metadata object. |
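Because the metadata keys depend on the role, a builder can enforce the pairing shown in the table. This is a minimal sketch with a hypothetical make_actor helper; the actor IDs in the example are arbitrary user-chosen keys, as in the sample JSON later on this page.

```python
from typing import Any, Optional


def make_actor(
    role: str,
    name: Optional[str] = None,
    model_config_name: Optional[str] = None,
) -> dict[str, Any]:
    """Build an actor object, placing role-specific fields inside metadata."""
    if role == "human":
        if not name:
            raise ValueError("human actors need a name")
        metadata = {"name": name}
    elif role == "model":
        if not model_config_name:
            raise ValueError("model actors need a modelConfigName")
        metadata = {"modelConfigName": model_config_name}
    else:
        raise ValueError("role must be 'human' or 'model'")
    return {"role": role, "metadata": metadata}


# Actors are keyed by unique, user-provided IDs.
actors = {
    "actor1": make_actor("human", name="User"),
    "actor2": make_actor("model", model_config_name="Model 1"),
}
```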
Message object
Each message object is keyed by a unique, user-provided ID.
Parameter | Required | Description |
---|---|---|
actorId | Yes | The id of the actor who produced the message. |
content | Yes | An array of content for the message. See message content. |
childMessageIds | No | An array of IDs of the messages that follow this message, typically the next turn in the conversation. To compare more than one model response, include multiple message IDs. |
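Together, rootMessageIds and childMessageIds form a tree (more precisely, a graph where a message can have several parents, as in the sample later on this page where a follow-up question continues both model responses). The sketch below walks it depth-first and visits each message once. The iter_messages name is hypothetical.

```python
from typing import Any, Iterator


def iter_messages(conversation: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (message_id, message) pairs depth-first, starting at rootMessageIds.

    A message can be listed as a child of several parents, so each ID is
    yielded only the first time it is reached.
    """
    messages = conversation["messages"]
    seen: set[str] = set()
    # Reverse so the first root is popped first.
    stack = list(reversed(conversation["rootMessageIds"]))
    while stack:
        message_id = stack.pop()
        if message_id in seen:
            continue
        seen.add(message_id)
        message = messages[message_id]
        yield message_id, message
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(message.get("childMessageIds", [])))
```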
Message content
Parameter | Required | Description |
---|---|---|
type | Yes | The type of message content: text for raw text, fileData for files referenced by URL, or dataRowAttachment for attachments on the data row. |
content | No | The raw text content of your message. This field supports Markdown and is used for text content. |
fileUri | No | An https path to a publicly hosted attachment file. This field is used for fileData content. To store conversation files with IAM delegated access, first add them as data row attachments (see attachments for how to add an attachment to a data row), then reference them with the type and attachmentName keys inside your conversational data. |
attachmentName | No | The name of the attachment on the data row. |
mimeType | No | The MIME type of the file at fileUri. Supported types: video/mp4, image/png, application/pdf. |
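The three content types can be produced by small builders. This is a minimal sketch: the helper names are hypothetical, the fileData check uses the supported MIME types from the table, and the dataRowAttachment shape (type plus attachmentName) is assumed from the fileUri description rather than taken from a full specification.

```python
from typing import Any

# MIME types listed as supported for fileData content.
SUPPORTED_MIME_TYPES = {"video/mp4", "image/png", "application/pdf"}


def text_part(text: str) -> dict[str, Any]:
    """Raw text content part; the text may contain Markdown."""
    return {"type": "text", "content": text}


def file_part(file_uri: str, mime_type: str) -> dict[str, Any]:
    """Content part referencing a publicly hosted file over https."""
    if not file_uri.startswith("https://"):
        raise ValueError("fileUri must be an https URL")
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported mimeType: {mime_type}")
    return {"type": "fileData", "fileUri": file_uri, "mimeType": mime_type}


def attachment_part(attachment_name: str) -> dict[str, Any]:
    """Content part pointing at an attachment already added to the data row (assumed shape)."""
    return {"type": "dataRowAttachment", "attachmentName": attachment_name}
```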
Embed images
You can either embed images directly in the message text with an HTML <img> tag, or add them as file attachments using fileData content.
// Message
{
"actorId": "",
"childMessageIds": [],
"content": [
{
"type": "text",
"content": "<img title=\"Model 2\" alt=\"Model 2\" src=\"https://link-to-my-image\">"
}
]
}
// Message
{
"actorId": "",
"childMessageIds": [],
"content": [
{
"type": "text",
"content": "What do you see in this image?"
},
{
"type": "fileData",
"fileUri": "https://link-to-my-image",
"mimeType": "image/png"
}
]
}
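When the image title, alt text, or URL come from your own data, escaping the attribute values keeps the embedded <img> tag well-formed. This is a minimal sketch using the standard library's html.escape; img_text_content is a hypothetical helper name.

```python
import html
from typing import Any


def img_text_content(src: str, title: str, alt: str) -> dict[str, Any]:
    """Build a text content part that embeds an image with an <img> tag."""

    def attr(value: str) -> str:
        # quote=True also escapes double and single quotes for attribute values.
        return html.escape(value, quote=True)

    tag = f'<img title="{attr(title)}" alt="{attr(alt)}" src="{attr(src)}">'
    return {"type": "text", "content": tag}


message = {
    "actorId": "",  # fill in the ID of the actor who produced the message
    "childMessageIds": [],
    "content": [img_text_content("https://link-to-my-image", "Model 2", "Model 2")],
}
```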
Sample conversation v2 JSON
SDK import only
You can't upload the following file directly from the web interface. You must use an import file as described in Import format.
{
  "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
  "version": 2,
  "actors": {
    "actor1": {
      "role": "human",
      "metadata": {
        "name": "User"
      }
    },
    "actor2": {
      "role": "model",
      "metadata": {
        "modelConfigName": "Model 1"
      }
    },
    "actor3": {
      "role": "model",
      "metadata": {
        "modelConfigName": "Model 2"
      }
    }
  },
  "messages": {
    "clxcboi1e00053p6n0ya733nn": {
      "actorId": "actor1",
      "content": [
        {
          "type": "text",
          "content": "What's in the images?"
        },
        {
          "type": "fileData",
          "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/giraffe-combunetes.png",
          "mimeType": "image/png"
        },
        {
          "type": "fileData",
          "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/label-blocks.png",
          "mimeType": "image/png"
        },
        {
          "type": "fileData",
          "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/tim-test.png",
          "mimeType": "image/png"
        }
      ],
      "childMessageIds": [
        "clxcboue900083p6no6emql83",
        "clxcboue900093p6nrepe8jjd"
      ]
    },
    "clxcboue900083p6no6emql83": {
      "actorId": "actor2",
      "content": [
        {
          "type": "text",
          "content": "The images show:\n 1. **A giraffe in an office:** The giraffe is standing in front of a desk with a computer showing code on the screen. There's also a whiteboard with diagrams about Kubernetes."
        }
      ],
      "childMessageIds": [
        "clxmrt0hh00023p6qykkdaqtk"
      ]
    },
    "clxcboue900093p6nrepe8jjd": {
      "actorId": "actor3",
      "content": [
        {
          "type": "text",
          "content": "1. The first image shows a giraffe standing in an office environment, interacting with a computer. The giraffe is looking at a large monitor displaying code, and there is a whiteboard in the background with various diagrams and notes.\n2."
        }
      ],
      "childMessageIds": [
        "clxmrt0hh00023p6qykkdaqtk"
      ]
    },
    "clxmrt0hh00023p6qykkdaqtk": {
      "actorId": "actor1",
      "content": [
        {
          "type": "text",
          "content": "What's in this PDF file?"
        },
        {
          "type": "fileData",
          "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/pdf/data_files/south-platte-tree-map.pdf",
          "mimeType": "application/pdf"
        }
      ],
      "childMessageIds": [
        "clxmrtgxg00033p6qqzl2596o",
        "clxmrtgxg00043p6qiehsvww4"
      ]
    },
    "clxmrtgxg00033p6qqzl2596o": {
      "actorId": "actor2",
      "content": [
        {
          "type": "text",
          "content": "This PDF file contains directions and maps for Christmas tree cutting areas managed by the South Platte Ranger District. Here's a breakdown:\n**Page 1: Directions**\n* Provides written directions to three Christmas tree cutting areas:\n **Buffalo Creek:** Located southeast of Pine Junction.\n **Camp Fickes:** Located west of Buffalo Creek.\n **Sugar Creek:** Located along County Road\n"
        }
      ],
      "childMessageIds": [
        "clxmru9j600053p6q0qh89zm4"
      ]
    },
    "clxmrtgxg00043p6qiehsvww4": {
      "actorId": "actor3",
      "content": [
        {
          "type": "text",
          "content": "Via fugit referre [duasque longumque](http://luminavale.com/) fateri sumite\ncalidumque arma spatiis fuerit genialiter errore iacent; cuncta hausit memori.\nAestus a omnia nomenque inlimis captantur ipsumque fuga. Aeneadae dona tenero\nclipei tamen, sed de amor flagellari quas; corpore, grande.\n[Pectore inclinatcadunt](http://tardoset.com/uni-et.html), Hectoreis defensatque virga altera\nsecum caliturasque militia pennas."
        }
      ],
      "childMessageIds": [
        "clxmru9j600053p6q0qh89zm4"
      ]
    },
    "clxmru9j600053p6q0qh89zm4": {
      "actorId": "actor1",
      "content": [
        {
          "type": "text",
          "content": "What have astronauts brought back from the moon?"
        }
      ],
      "childMessageIds": [
        "clxmrupyh00063p6q4wxj97sz",
        "clxmrupyh00073p6qeszn06l7"
      ]
    },
    "clxmrupyh00063p6q4wxj97sz": {
      "actorId": "actor2",
      "content": [
        {
          "type": "text",
          "content": "## Petebat semine\nDiurnis parsque, tanti nuper novi, extremae caede *Psophidaque spiro* dum visa.\nUsu dicebat obstet meritos."
        }
      ],
      "childMessageIds": []
    },
    "clxmrupyh00073p6qeszn06l7": {
      "actorId": "actor3",
      "content": [
        {
          "type": "text",
          "content": "## Ossa custos captabat insanis humus Cipe temptatum\nLorem markdownum adflatuque est Tydides medios. Notatas te Pandrose **solent**\npartes saucius animal certamen, plures opem corpora. Est magni duce, illiarcus: Iuno atque aderat amplexo genusque."
        }
      ],
      "childMessageIds": []
    }
  },
  "rootMessageIds": [
    "clxcboi1e00053p6n0ya733nn"
  ]
}
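Before uploading a hand-written conversation like this sample, a referential check can catch typos in IDs: every rootMessageIds entry and childMessageIds entry should name a message, and every actorId should name an actor. This is a minimal sketch with a hypothetical dangling_references helper; it does not claim to reproduce Labelbox's own import validation.

```python
from typing import Any


def dangling_references(conversation: dict[str, Any]) -> list[str]:
    """Report actor and message IDs that are referenced but never defined."""
    actors = conversation.get("actors", {})
    messages = conversation.get("messages", {})
    problems = []
    for root_id in conversation.get("rootMessageIds", []):
        if root_id not in messages:
            problems.append(f"root {root_id} is not a message")
    for message_id, message in messages.items():
        actor_id = message.get("actorId")
        if actor_id not in actors:
            problems.append(f"{message_id}: unknown actorId {actor_id}")
        for child_id in message.get("childMessageIds", []):
            if child_id not in messages:
                problems.append(f"{message_id}: unknown child {child_id}")
    return problems
```

For example, load the sample with json.load and confirm that dangling_references returns an empty list.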
LaTeX support
To add LaTeX formatting, wrap your math expressions using backticks and dollar signs. The editor supports both inline and block LaTeX formatting. For example, to add LaTeX formatting for x = 2, put $$x = 2$$.