Offline multimodal chat evaluation

The offline multimodal chat evaluation editor allows you to evaluate generative models by importing existing conversations and adding annotations to model responses. The editor supports various data types, including text, images, videos, audio, and PDFs.

Set up offline multimodal chat evaluation projects

The following steps show how to set up an offline multimodal chat evaluation project on the Labelbox platform. To set up a project using the SDK instead, see Multimodal chat evaluation projects.

Step 1: Create a project

On the Annotate projects page, click the + New project button.
Select Multimodal chat, and then select Offline multimodal chat.
Provide a name and an optional description for your project.

Step 2: Add data

Click the Add data button to select a conversation v2 JSON dataset or create a new dataset. Alternatively, you can import data using the SDK.
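
As a rough illustration of what one imported conversation might contain, the following Python sketch builds a single record with one human prompt and one response per model. The field names (`actors`, `messages`, `rootMessageIds`, `childMessageIds`, and the `type` string) are assumptions modeled on a conversation v2 payload, and the helper `build_conversation` and all IDs are hypothetical. Verify the structure against the import specification before uploading.

```python
import json


def build_conversation(prompt: str, responses: dict[str, str]) -> dict:
    """Build one conversation: a human prompt followed by one response per model.

    Field names are assumptions modeled on a conversation v2 payload;
    check them against the import specification before use.
    """
    actors = {"user": {"role": "human", "metadata": {"name": "User"}}}
    messages = {}
    child_ids = []

    # One actor and one message per model response (IDs are hypothetical).
    for index, (model_name, text) in enumerate(responses.items()):
        actor_id = f"model-{index}"
        message_id = f"response-{index}"
        actors[actor_id] = {
            "role": "model",
            "metadata": {"modelConfigName": model_name},
        }
        messages[message_id] = {
            "actorId": actor_id,
            "content": [{"type": "text", "content": text}],
            "childMessageIds": [],
        }
        child_ids.append(message_id)

    # The prompt is the root message; the responses are its children.
    messages["prompt-0"] = {
        "actorId": "user",
        "content": [{"type": "text", "content": prompt}],
        "childMessageIds": child_ids,
    }

    return {
        "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
        "version": 2,
        "actors": actors,
        "messages": messages,
        "rootMessageIds": ["prompt-0"],
    }


conversation = build_conversation(
    "What is shown in this image?",
    {"model-a": "A cat on a sofa.", "model-b": "A dog on a couch."},
)
print(json.dumps(conversation, indent=2))
```

Keeping the responses as children of the prompt message is what lets annotators compare them side by side in this sketch.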

Step 3: Set up an ontology

Create an ontology for evaluating model responses.

The editor supports the following options:

Feature	Description	Export format
Message ranking	Rank multiple model-generated responses to determine their relative quality or relevance.	Payload
Message selection	Select single or multiple responses that meet specific criteria.	Payload
Message step reasoning	Break responses into steps and rate each step as correct, neutral, or incorrect. For incorrect steps, add a rewrite with a justification.	Payload
Classification - Radio	Select one option from a predefined set.	Payload
Classification - Checklist	Choose multiple options from a list.	Payload
Classification - Free text	Add free text annotations.	Payload
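
To make the message ranking task concrete, here is a minimal Python sketch that checks whether a ranking is complete. It assumes each response receives a unique rank from 1 to n with no ties, which the editor may or may not enforce. The function names and the dictionary shape are hypothetical and are not the Labelbox export format.

```python
def validate_ranking(response_ids: list[str], ranking: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the ranking is complete."""
    problems = []

    # Every response must be ranked.
    missing = [rid for rid in response_ids if rid not in ranking]
    if missing:
        problems.append(f"unranked responses: {missing}")

    # The ranking must not mention responses that do not exist.
    unknown = [rid for rid in ranking if rid not in response_ids]
    if unknown:
        problems.append(f"unknown responses: {unknown}")

    # Assumption: ranks are exactly 1..n, each used once (no ties).
    expected = list(range(1, len(response_ids) + 1))
    if sorted(ranking.values()) != expected:
        problems.append(f"ranks must be exactly {expected}")

    return problems


def ordered_by_rank(ranking: dict[str, int]) -> list[str]:
    """Return response IDs from best (rank 1) to worst."""
    return sorted(ranking, key=ranking.get)


ranking = {"response-a": 2, "response-b": 1, "response-c": 3}
print(validate_ranking(["response-a", "response-b", "response-c"], ranking))
print(ordered_by_rank(ranking))
```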

Classification tasks can apply globally to the entire conversation or individually to a message. They can also nest subclassification tasks.
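
The scope and nesting described above can be pictured as a small tree. The following Python sketch is a hypothetical in-memory model, not the Labelbox SDK: the `Classification` class, its fields, and the `outline` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical model of a classification task (not an SDK class)."""
    name: str
    kind: str  # "radio", "checklist", or "text"
    scope: str = "global"  # "global" (whole conversation) or "message"
    # Maps each option label to the subclassifications nested under it.
    options: dict[str, list["Classification"]] = field(default_factory=dict)


def outline(task: Classification, depth: int = 0) -> list[str]:
    """Render a task and its nested subclassifications as indented lines."""
    scope = f", {task.scope}" if depth == 0 else ""
    lines = [f"{'  ' * depth}{task.name} ({task.kind}{scope})"]
    for option, children in task.options.items():
        lines.append(f"{'  ' * (depth + 1)}- {option}")
        for child in children:
            lines.extend(outline(child, depth + 2))
    return lines


# A message-level radio task with a free-text follow-up under "No".
helpfulness = Classification(
    name="Is the response helpful?",
    kind="radio",
    scope="message",
    options={
        "Yes": [],
        "No": [Classification(name="Why not?", kind="text")],
    },
)

# A global checklist that applies to the entire conversation.
topics = Classification(
    name="Conversation topics",
    kind="checklist",
    options={"Coding": [], "Math": []},
)

for line in outline(helpfulness) + outline(topics):
    print(line)
```

In this sketch, only top-level tasks carry a scope; nested subclassifications inherit the context of the option they sit under.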

Experimental feature

Message step reasoning is an experimental feature. Currently, you can’t import step reasoning labels using the SDK.

Step 4: Complete annotation tasks

Click the Start labeling button and add annotations that evaluate the model responses. Complete all tasks in your workflow.
