Import multimodal chat data

Set up a multimodal chat evaluation project before importing data. The two types of multimodal chat evaluation projects use different project creation methods and data row setups:

For offline multimodal chat evaluation projects, use client.create_offline_model_evaluation_project and import data rows of existing conversations.
For live multimodal chat evaluation projects, use client.create_model_evaluation_project and either:
- (Recommended) Create data rows and send them to the project, as you would for other project types.
- Generate empty data rows when you create the project. Data rows generated this way can’t include attachments or metadata.

For a full walk-through of setting up a multimodal chat evaluation project, see Multimodal chat evaluation.

Set up live multimodal chat evaluation projects

Use client.create_model_evaluation_project to create a live multimodal chat evaluation project. This method takes the same parameters as client.create_project, plus a few parameters specific to multimodal chat evaluation projects. It accepts the following parameters:

name: The name of your new project.
description (optional): A description of your project.
dataset_name (optional): The name of the dataset where the generated data rows will be located. Include this parameter only if you want to create a new dataset.
dataset_id (optional): The dataset ID of an existing Labelbox dataset. Include this parameter if you want to append the generated data rows to an existing dataset.
data_row_count (optional): The number of data rows to generate and use with your project. Defaults to 100 if dataset_name or dataset_id is included.

Option A: Create and send data rows to projects

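import labelbox as lb

# Replace the placeholder with your own API key
client = lb.Client(api_key="<YOUR_API_KEY>")

# Create the project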
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
)

def make_data_rows(dataset_id=None):
    # If a dataset ID is provided, fetch the dataset using that ID.
    # Otherwise, create a new dataset with the specified name.
    if dataset_id:
        dataset = client.get_dataset(dataset_id)
    else:
        dataset = client.create_dataset(name="example live mmc dataset")

    # Helper function to generate a single data row
    def generate_data(ind):
        return {
            "row_data": {  # The chat data format
                'type': 'application/vnd.labelbox.conversational.model-chat-evaluation',
                'draft': True,
                'rootMessageIds': [],
                'actors': {},
                'version': 2,
                'messages': {}
            },
            "global_key": f"global_key_{dataset.uid}_{ind}",
            "metadata_fields": [{"name": "tag", "value": "val_tag"}],
            "attachments": [
                {
                    "type": "IMAGE_OVERLAY",
                    "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg"
                }
            ]
        }

    # Generate a list of 100 data rows
    data_list = [generate_data(ind) for ind in range(100)]

    # Upload the generated data rows to the dataset
    task = dataset.create_data_rows(data_list)
    print("Processing task ", task.uid)  # Print the unique ID of the task
    task.wait_till_done()

    # Ensure that the task status is 'COMPLETE' to confirm success
    assert task.status == "COMPLETE"

    # Return the dataset object
    return dataset

# Create a new dataset. Alternatively, pass an existing dataset ID
# with make_data_rows(dataset_id="<dataset_id>")
dataset = make_data_rows()

# Retrieve the data row IDs from the dataset
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]

# Send data rows to the project
batch = project.create_batch(
    name="mmc-batch",  # each batch in a project must have a unique name
    data_rows=data_row_ids,  # data row IDs to include in the batch
    priority=1,  # priority from 1 (highest) to 5 (lowest)
)

print(f"Batch: {batch}")

Option B: Generate empty data rows

No metadata support

Only use this option if your project doesn’t require metadata, attachments, or embeddings for data rows.

# Create the project and generate data rows
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_name="Example live multimodal chat dataset",
    data_row_count=100,
)

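# Connect the project to an ontology.
# `ontology` is assumed to be an existing ontology object, for example
# one returned by client.get_ontology("<ontology_id>")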
project.connect_ontology(ontology)

Set up offline multimodal chat evaluation projects

Use client.create_offline_model_evaluation_project to create offline multimodal chat evaluation projects. This method uses the same parameters as client.create_project and adds validation to ensure the project is set up correctly.

project = client.create_offline_model_evaluation_project(
    name="<project_name>",
    description="<project_description>",  # optional
)

After creating the project, you can import conversational version 2 data rows to the project. To learn how to import annotations, see Import multimodal chat annotations.

Specifications

File format: chat data JSON in conversation v2 format

Import methods:

Local upload (maximum character count: 2,621,440)
IAM Delegated Access
Signed URLs (https URLs only)
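
For local uploads, it can help to check the size of a conversation before sending it. The following is a minimal sketch, assuming the 2,621,440-character limit applies to the JSON-serialized conversation; how the limit is measured is not stated here, and `fits_local_upload` is a hypothetical helper name, not part of the Labelbox SDK.

```python
import json

# Limit quoted above for local uploads. Assumption: it is measured against
# the JSON-serialized conversation string.
MAX_LOCAL_UPLOAD_CHARS = 2_621_440


def fits_local_upload(row_data, limit=MAX_LOCAL_UPLOAD_CHARS):
    """Return True if the serialized conversation stays within the limit."""
    serialized = json.dumps(row_data, ensure_ascii=False)
    return len(serialized) <= limit


# An empty conversation shell, used only to exercise the helper
conversation = {
    "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
    "version": 2,
    "actors": {},
    "messages": {},
    "rootMessageIds": [],
}
print(fits_local_upload(conversation))
```

Conversations that exceed the limit would need one of the other import methods listed above.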

When importing conversation or thread data to Labelbox, include the following information for each data row in your JSON file.

Parameter	Required	Description
`row_data`	Yes	`https` path to a cloud-hosted conversational text JSON file, or the conversation JSON embedded directly as in the examples below. See the section below for details on the conversation format.
`global_key`	No	Unique user-generated file name or ID for the file. Global keys must be unique within your organization. Data rows are not imported if their global keys duplicate those of existing data rows.
`media_type`	No	`"CONVERSATIONAL"` (optional media type to provide better validation and error messaging)
`metadata_fields`	No	See metadata
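
As an illustration of the fields in the table above, the sketch below assembles data row dictionaries and flags global keys that repeat within one payload. `make_data_row` and `duplicate_global_keys` are hypothetical helper names and the `example.com` URLs are placeholders. The check only covers the payload itself; it cannot detect keys that already exist in your organization.

```python
from collections import Counter


def make_data_row(row_data, global_key=None, media_type=None, metadata_fields=None):
    """Build one data row dict, leaving out optional fields that are not set."""
    data_row = {"row_data": row_data}
    if global_key is not None:
        data_row["global_key"] = global_key
    if media_type is not None:
        data_row["media_type"] = media_type
    if metadata_fields is not None:
        data_row["metadata_fields"] = metadata_fields
    return data_row


def duplicate_global_keys(data_rows):
    """Return global keys that occur more than once in this payload."""
    counts = Counter(row["global_key"] for row in data_rows if "global_key" in row)
    return sorted(key for key, count in counts.items() if count > 1)


# Placeholder URLs; the second row reuses the first row's global key
rows = [
    make_data_row("https://example.com/chat-1.json", global_key="chat-1"),
    make_data_row("https://example.com/chat-2.json", global_key="chat-1"),
    make_data_row("https://example.com/chat-3.json"),
]
print(duplicate_global_keys(rows))
```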

Import format

{
  "row_data": {
    "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
    "version": 2,
    "actors": {
      "cm1qu8krf00063b72cutnbn5l": {
        "role": "human",
        "metadata": { "name": "User" }
      },
      "cm1vjleif00023b6y4fw4ew94": {
        "role": "model",
        "metadata": {
          "modelConfigName": "Gem Pro-Copy"
        }
      },
      "cm1vjleif00033b6yifzroser": {
        "role": "model",
        "metadata": {
          "modelConfigName": "gpt 4-Copy"
        }
      }
    },
    "messages": {
      "cm1qu8krf00073b72fyar00vh": {
        "actorId": "cm1qu8krf00063b72cutnbn5l",
        "content": [{ "type": "text", "content": "Hello " }],
        "childMessageIds": [
          "cm1vjlitg00043b6y1tgssq1r",
          "cm1vjlitg00053b6y19ve1qra"
        ]
      },
      "cm1vjlitg00043b6y1tgssq1r": {
        "actorId": "cm1vjleif00023b6y4fw4ew94",
        "content": [
          {
            "type": "text",
            "content": "Hello! 👋 How can I assist you today? 😊 \\n"
          }
        ],
        "childMessageIds": []
      },
      "cm1vjlitg00053b6y19ve1qra": {
        "actorId": "cm1vjleif00033b6yifzroser",
        "content": [
          { "type": "text", "content": "Hi! How can I assist you today?" }
        ],
        "childMessageIds": []
      }
    },
    "rootMessageIds": ["cm1qu8krf00073b72fyar00vh"]
  },
  "global_key": "global_key"
}

Python example

# Embed the chat conversation data
row_data = {
    "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
    "version": 2,
    "actors": {
        "cm1qu8krf00063b72cutnbn5l": {
            "role": "human",
            "metadata": { "name": "User" }
        },
        "cm1vjleif00023b6y4fw4ew94": {
            "role": "model",
            "metadata": {
                "modelConfigName": "Gem Pro-Copy"
            }
        },
        "cm1vjleif00033b6yifzroser": {
            "role": "model",
            "metadata": {
                "modelConfigName": "gpt 4-Copy"
            }
        }
    },
    "messages": {
        "cm1qu8krf00073b72fyar00vh": {
            "actorId": "cm1qu8krf00063b72cutnbn5l",
            "content": [{ "type": "text", "content": "Hello " }],
            "childMessageIds": [
                "cm1vjlitg00043b6y1tgssq1r",
                "cm1vjlitg00053b6y19ve1qra"
            ]
        },
        "cm1vjlitg00043b6y1tgssq1r": {
            "actorId": "cm1vjleif00023b6y4fw4ew94",
            "content": [
                {
                    "type": "text",
                    "content": "Hello! 👋 How can I assist you today? 😊 \\n"
                }
            ],
            "childMessageIds": []
        },
        "cm1vjlitg00053b6y19ve1qra": {
            "actorId": "cm1vjleif00033b6yifzroser",
            "content": [
                { "type": "text", "content": "Hi! How can I assist you today?" }
            ],
            "childMessageIds": []
        }
    },
    "rootMessageIds": ["cm1qu8krf00073b72fyar00vh"]
}

# Create a dataset
dataset = client.create_dataset(
    name="mmc_dataset",
)

# Upload the conversation data to the dataset as a data row.
task = dataset.create_data_rows([{"row_data": row_data}])
task.wait_till_done()

# Output any errors that occurred during the import.
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)

Conversation v2 JSON

Parameter	Required	Description
`type`	Yes	Populate with `application/vnd.labelbox.conversational.model-chat-evaluation`
`version`	Yes	Populate with `2`
`actors`	Yes	An object of actors of the chat conversation.
`messages`	Yes	An object of messages from each actor.
`rootMessageIds`	Yes	An array of message IDs that start the conversation. Include the ID of the first message sent by a human actor.
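
The required top-level fields in the table above can be checked locally before upload. This is a minimal sketch, not Labelbox's own validation; `check_top_level` is a hypothetical helper name, and it only inspects the five fields listed in the table.

```python
EXPECTED_TYPE = "application/vnd.labelbox.conversational.model-chat-evaluation"
EXPECTED_VERSION = 2
REQUIRED_KEYS = ("type", "version", "actors", "messages", "rootMessageIds")


def check_top_level(conversation):
    """Return a list of problems found in the top-level conversation fields."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in conversation
    ]
    if "type" in conversation and conversation["type"] != EXPECTED_TYPE:
        problems.append(f"unexpected type: {conversation['type']}")
    if "version" in conversation and conversation["version"] != EXPECTED_VERSION:
        problems.append(f"unexpected version: {conversation['version']}")
    return problems


# A draft with the wrong version and no rootMessageIds
draft = {"type": EXPECTED_TYPE, "version": 1, "actors": {}, "messages": {}}
print(check_top_level(draft))
```

An empty list means the top-level fields match the table. The nested actor and message objects still need their own checks.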

Actor object

Each actor object is keyed by a unique, user-provided ID. It has a role key and a metadata key. The contents of metadata depend on the actor’s role.

Parameter	Required	Description
`role`	Yes	The role the actor receives. Either `human` or `model`.
`name`	Conditional	The name of the actor. Required for actors with the `human` role. Placed inside the actor’s `metadata` key.
`modelConfigName`	Conditional	The model config name of the actor. Required for actors with the `model` role. Placed inside the actor’s `metadata` key.
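
As a small illustration of the actor structure, the sketch below builds the `actors` object with two helper functions. `human_actor` and `model_actor` are hypothetical names, and the actor IDs and model config names are example values.

```python
def human_actor(name):
    """Actor entry for a human participant; the name goes inside metadata."""
    return {"role": "human", "metadata": {"name": name}}


def model_actor(model_config_name):
    """Actor entry for a model; the config name goes inside metadata."""
    return {"role": "model", "metadata": {"modelConfigName": model_config_name}}


# Keys are unique IDs that you choose; messages refer to them through actorId
actors = {
    "actor1": human_actor("User"),
    "actor2": model_actor("Model 1"),
    "actor3": model_actor("Model 2"),
}
print(actors["actor2"])
```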

Message object

Each message object is keyed by a unique, user-provided ID.

Parameter	Required	Description
`actorId`	Yes	The id of the actor who produced the message.
`content`	Yes	An array of content for the message. See message content.
`childMessageIds`	No	An array of IDs of the messages that are children of this message. Typically, these are the next messages in the conversation. To compare more than one model response, include multiple message IDs.
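
Because messages refer to actors and to each other by ID, a broken reference is easy to introduce when assembling a conversation by hand. The sketch below looks for IDs that are referenced but never defined. It assumes every referenced ID must exist in the same conversation, and `check_references` is a hypothetical helper name.

```python
def check_references(conversation):
    """Return a list of IDs that are referenced but not defined."""
    actors = conversation.get("actors", {})
    messages = conversation.get("messages", {})
    problems = []
    for root_id in conversation.get("rootMessageIds", []):
        if root_id not in messages:
            problems.append(f"unknown root message: {root_id}")
    for message_id, message in messages.items():
        if message.get("actorId") not in actors:
            problems.append(f"{message_id}: unknown actor {message.get('actorId')}")
        for child_id in message.get("childMessageIds", []):
            if child_id not in messages:
                problems.append(f"{message_id}: unknown child message {child_id}")
    return problems


# msg1 lists msg3 as a child, but msg3 is never defined
conversation = {
    "actors": {
        "actor1": {"role": "human", "metadata": {"name": "User"}},
        "actor2": {"role": "model", "metadata": {"modelConfigName": "Model 1"}},
    },
    "messages": {
        "msg1": {
            "actorId": "actor1",
            "content": [{"type": "text", "content": "Hello"}],
            "childMessageIds": ["msg2", "msg3"],
        },
        "msg2": {
            "actorId": "actor2",
            "content": [{"type": "text", "content": "Hi!"}],
            "childMessageIds": [],
        },
    },
    "rootMessageIds": ["msg1"],
}
print(check_references(conversation))
```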

Message content

Parameter	Required	Description
`type`	Yes	The type of message content: `text` for raw text, `fileData` for files referenced by URL, and `dataRowAttachment` for attachments on the data row.
`content`	No	The raw text content of your message. This field supports markdown. Used for `text` type messages.
`fileUri`	No	`https` path to a public cloud-hosted attachment file. Used for `fileData` type messages. To use IAM delegated access for conversation files, first add the files as data row attachments (see attachments), then reference them in your conversational data with the `type` and `attachmentName` keys.
`attachmentName`	No	The name of the attachment on the data row. Used for `dataRowAttachment` type messages.
`mimeType`	No	The MIME type of your attachment `fileUri` data. Supported types: `video/mp4`, `image/png`, and `application/pdf`.
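
The three content types can be built with small helpers that apply the constraints from the table. This is a sketch under the assumption that `fileUri` values must start with `https://` and that `mimeType` must be one of the listed types. The function names are hypothetical.

```python
SUPPORTED_MIME_TYPES = {"video/mp4", "image/png", "application/pdf"}


def text_content(text):
    """Content item for raw text (markdown is supported)."""
    return {"type": "text", "content": text}


def file_content(file_uri, mime_type):
    """Content item for a publicly hosted file referenced by URL."""
    if not file_uri.startswith("https://"):
        raise ValueError("fileUri must be an https URL")
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mimeType: {mime_type}")
    return {"type": "fileData", "fileUri": file_uri, "mimeType": mime_type}


def attachment_content(attachment_name):
    """Content item that points to an attachment already on the data row."""
    return {"type": "dataRowAttachment", "attachmentName": attachment_name}


# One message's content array: a question plus an image
content = [
    text_content("What's in the image?"),
    file_content(
        "https://storage.googleapis.com/labelbox-developer-testing-assets/image/label-blocks.png",
        "image/png",
    ),
]
print(content[1]["type"])
```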

Embed images

You can either embed images directly in the message content or add them as attachments.

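# Message (Python dict). model_name2 and model_url2 are placeholder variables
# used as the image title/alt text and the image URL.
{
  "actorId": "",
  "childMessageIds": [],
  "content": [
     {
        "type": "text",
        "content": f'<img title="{model_name2}" alt="{model_name2}" src="{model_url2}">'
     }
  ]
}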
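
When the image title or URL comes from a variable, escaping the values keeps the tag well formed. The sketch below is one way to do that with the standard library. `image_tag` is a hypothetical helper and the URL is a placeholder.

```python
import html


def image_tag(url, title):
    """Build an <img> tag with quoted, HTML-escaped attribute values."""
    safe_url = html.escape(url, quote=True)
    safe_title = html.escape(title, quote=True)
    return f'<img title="{safe_title}" alt="{safe_title}" src="{safe_url}">'


# Text content item that embeds the image in the message
content_item = {
    "type": "text",
    "content": image_tag("https://example.com/model-a.png", 'Model "A"'),
}
print(content_item["content"])
```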

Sample conversation v2 JSON

SDK import only

You can’t upload the following file from the web interface directly. You must use an import file as described in Import format.

    "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
    "version": 2,
    "actors": {
        "actor1": {
            "role": "human",
            "metadata": {
                "name": "User"
            }
        },
        "actor2": {
            "role": "model",
            "metadata": {
                "modelConfigName": "Model 1"
            }
        },
        "actor3": {
            "role": "model",
            "metadata": {
                "modelConfigName": "Model 2"
            }
        }
    },
    "messages": {
        "clxcboi1e00053p6n0ya733nn": {
            "actorId": "actor1",
            "content": [
                {
                    "type": "text",
                    "content": "What's in the images?"
                },
                {
                    "type": "fileData",
                    "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/giraffe-combunetes.png",
                    "mimeType": "image/png"
                },
                {
                    "type": "fileData",
                    "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/label-blocks.png",
                    "mimeType": "image/png"
                },
                {
                    "type": "fileData",
                    "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/tim-test.png",
                    "mimeType": "image/png"
                }
            ],
            "childMessageIds": [
                "clxcboue900083p6no6emql83",
                "clxcboue900093p6nrepe8jjd"
            ]
        },
        "clxcboue900083p6no6emql83": {
            "actorId": "actor2",
            "content": [
                {
                    "type": "text",
                    "content": "The images show:\n 1. **A giraffe in an office:**  The giraffe is standing in front of a desk with a computer showing code on the screen. There's also a whiteboard with diagrams about Kubernetes."
                }
            ],
            "childMessageIds": [
                "clxmrt0hh00023p6qykkdaqtk"
            ]
        },
        "clxcboue900093p6nrepe8jjd": {
            "actorId": "actor3",
            "content": [
                {
                    "type": "text",
                    "content": "1. The first image shows a giraffe standing in an office environment, interacting with a computer. The giraffe is looking at a large monitor displaying code, and there is a whiteboard in the background with various diagrams and notes.\n2."
                }
            ],
            "childMessageIds": [
                "clxmrt0hh00023p6qykkdaqtk"
            ]
        },
        "clxmrt0hh00023p6qykkdaqtk": {
            "actorId": "actor1",
            "content": [
                {
                    "type": "text",
                    "content": "What's in this PDF file?"
                },
                {
                    "type": "fileData",
                    "fileUri": "https://storage.googleapis.com/labelbox-developer-testing-assets/pdf/data_files/south-platte-tree-map.pdf",
                    "mimeType": "application/pdf"
                }
            ],
            "childMessageIds": [
                "clxmrtgxg00033p6qqzl2596o",
                "clxmrtgxg00043p6qiehsvww4"
            ]
        },
        "clxmrtgxg00033p6qqzl2596o": {
            "actorId": "actor2",
            "content": [
                {
                    "type": "text",
                    "content": "This PDF file contains directions and maps for Christmas tree cutting areas managed by the South Platte Ranger District.  Here's a breakdown:\n**Page 1: Directions**\n* Provides written directions to three Christmas tree cutting areas:\n    **Buffalo Creek:** Located southeast of Pine Junction.\n    **Camp Fickes:** Located west of Buffalo Creek.\n    **Sugar Creek:** Located along County Road\n"
                }
            ],
            "childMessageIds": [
                "clxmru9j600053p6q0qh89zm4"
            ]
        },
        "clxmrtgxg00043p6qiehsvww4": {
            "actorId": "actor3",
            "content": [
                {
                    "type": "text",
                    "content": "Via fugit referre [duasque longumque](http://luminavale.com/) fateri sumite\ncalidumque arma spatiis fuerit genialiter errore iacent; cuncta hausit memori.\nAestus a omnia nomenque inlimis captantur ipsumque fuga. Aeneadae dona tenero\nclipei tamen, sed de amor flagellari quas; corpore, grande.\n[Pectore inclinatcadunt](http://tardoset.com/uni-et.html), Hectoreis defensatque virga altera\nsecum caliturasque militia pennas."
                }
            ],
            "childMessageIds": [
                "clxmru9j600053p6q0qh89zm4"
            ]
        },
        "clxmru9j600053p6q0qh89zm4": {
            "actorId": "actor1",
            "content": [
                {
                    "type": "text",
                    "content": "What have astronauts brought back from the moon?"
                }
            ],
            "childMessageIds": [
                "clxmrupyh00063p6q4wxj97sz",
                "clxmrupyh00073p6qeszn06l7"
            ]
        },
        "clxmrupyh00063p6q4wxj97sz": {
            "actorId": "actor2",
            "content": [
                {
                    "type": "text",
                    "content": "## Petebat semine\nDiurnis parsque, tanti nuper novi, extremae caede *Psophidaque spiro* dum visa.\nUsu dicebat obstet meritos."
                }
            ],
            "childMessageIds": []
        },
        "clxmrupyh00073p6qeszn06l7": {
            "actorId": "actor3",
            "content": [
                {
                    "type": "text",
                    "content": "## Ossa custos captabat insanis humus Cipe temptatum\nLorem markdownum adflatuque est Tydides medios. Notatas te Pandrose **solent**\npartes saucius animal certamen, plures opem corpora. Est magni duce, illiarcus: Iuno atque aderat amplexo genusque."
                }
            ],
            "childMessageIds": []
        }
    },
    "rootMessageIds": [
        "clxcboi1e00053p6n0ya733nn"
    ]

LaTeX support

To add LaTeX formatting, wrap your math expressions using backticks and dollar signs. The editor supports both inline and block LaTeX formatting. For example, to add LaTeX formatting for x=2, put $$x = 2$$.
