Learn how to set up a multimodal chat evaluation project using the SDK.
The multimodal chat evaluation editor allows you to evaluate generative model responses across multiple data types. This guide walks through setting up a multimodal chat evaluation project using the SDK.
Initialize the SDK
Import the Labelbox Python SDK:
import labelbox as lb
API key and client
Provide a valid API key to authenticate the Labelbox client. See Labelbox API keys to learn how to generate your key.
API_KEY = "" # Your API key
client = lb.Client(api_key=API_KEY)
Create a multimodal chat ontology
You can create ontologies for multimodal chat projects using either client.create_ontology() or client.create_ontology_from_feature_schemas(), with the ontology_kind parameter set to lb.OntologyKind.ModelEvaluation. See Supported annotation types for the annotation types you can include in a multimodal chat evaluation ontology.
Option A: create_ontology
Use create_ontology to create an ontology and define its schema:
ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_SINGLE_SELECTION,
            name="single select feature",
        ),
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_MULTI_SELECTION,
            name="multi select feature",
        ),
        lb.Tool(tool=lb.Tool.Type.MESSAGE_RANKING, name="ranking feature"),
        # StepReasoningTool, FactCheckingTool, and PromptIssueTool are assumed to be
        # imported from the labelbox package; the exact import path may depend on your SDK version.
        StepReasoningTool(name="step reasoning"),
        FactCheckingTool(name="fact checking"),
        PromptIssueTool(name="prompt rating"),
    ],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist feature",
            options=[
                lb.Option(value="option 1", label="option 1"),
                lb.Option(value="option 2", label="option 2"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_question",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ],
)
# Create ontology
ontology = client.create_ontology(
    "Example ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)
Option B: create_ontology_from_feature_schemas
Use create_ontology_from_feature_schemas with feature schema IDs to create ontologies that reuse existing feature schemas instead of defining new ones. To obtain these IDs, go to the Schema tab, or look them up with the SDK as sketched after the following example.
ontology = client.create_ontology_from_feature_schemas(
    "Example ontology",
    feature_schema_ids=["<list of feature schema ids>"],
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)
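If you prefer to look up feature schema IDs programmatically rather than through the Schema tab, you can read them off an existing ontology. The following is a minimal sketch, assuming you already have the ID of an ontology whose features you want to reuse (the ontology ID shown is a placeholder):
# Fetch an existing ontology and collect the feature schema IDs of its
# tools and classifications so they can be reused in a new ontology
existing_ontology = client.get_ontology("<existing_ontology_id>")
feature_schema_ids = [
    tool.feature_schema_id for tool in existing_ontology.tools()
] + [
    classification.feature_schema_id
    for classification in existing_ontology.classifications()
]
print(feature_schema_ids)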
Create multimodal chat evaluation projects
The two types of multimodal chat evaluation projects have different project creation methods and data row setups:
- For offline multimodal chat evaluation projects, use client.create_offline_model_evaluation_project and import data rows of existing conversations.
- For live multimodal chat evaluation projects, use client.create_model_evaluation_project and either:
  - (Recommended) Create data rows and send them to projects, like other types of projects.
  - Generate empty data rows upon project creation; this option can't create data rows with attachments or metadata.
Set up offline multimodal chat evaluation projects
Use client.create_offline_model_evaluation_project to create offline multimodal chat evaluation projects. This method takes the same parameters as client.create_project: a name and an optional description.
project = client.create_offline_model_evaluation_project(
    name="<project_name>",
    description="<project_description>",  # optional
)
After creating the project, import conversational version 2 data rows for further processing. For instructions, see Import multimodal chat evaluation data. To learn how to import annotations, see Import multimodal chat annotations.
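As with other project types, the project also needs an ontology attached before labeling can start. A minimal sketch, reusing the ontology created earlier in this guide:
# Attach the multimodal chat evaluation ontology created above to the offline project
project.connect_ontology(ontology)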
Set up live multimodal chat evaluation projects
Use client.create_model_evaluation_project to create a live multimodal chat evaluation project. This method takes the same name and optional description parameters as client.create_project, with a few additional parameters specific to multimodal chat evaluation projects:
- data_row_count (optional): The number of data rows to generate for your project. Defaults to 100 if a dataset_name or dataset_id is included.
- dataset_name (optional): The name of a new dataset. Include this parameter only if you want to create a new dataset for the generated data rows.
- dataset_id (optional): The dataset ID of an existing Labelbox dataset. Include this parameter only if you want to append generated data rows to an existing dataset (see the sketch after this list).
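For example, a minimal sketch of passing dataset_id so that the generated data rows are appended to an existing dataset (the dataset ID is a placeholder, and this variant has the same metadata and attachment limitations as Option B below):
# Generate 50 empty data rows and append them to an existing dataset
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    dataset_id="<existing_dataset_id>",
    data_row_count=50,
)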
Option A: Create and send data rows to projects
# Create the project
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
)
def make_data_rows(dataset_id=None):
    # If a dataset ID is provided, fetch the dataset using that ID.
    # Otherwise, create a new dataset with the specified name.
    if dataset_id:
        dataset = client.get_dataset(dataset_id)
    else:
        dataset = client.create_dataset(name="example live mmc dataset")

    # Helper function to generate a single data row
    def generate_data(ind):
        return {
            "row_data": {  # The chat data format
                'type': 'application/vnd.labelbox.conversational.model-chat-evaluation',
                'draft': True,
                'rootMessageIds': [],
                'actors': {},
                'version': 2,
                'messages': {}
            },
            "global_key": f"global_key_{dataset.uid}_{ind}",
            "metadata_fields": [{"name": "tag", "value": "val_tag"}],
            "attachments": [
                {
                    "type": "IMAGE_OVERLAY",
                    "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg"
                }
            ]
        }

    # Generate a list of 100 data rows
    data_list = [generate_data(ind) for ind in range(100)]

    # Upload the generated data rows to the dataset
    task = dataset.create_data_rows(data_list)
    print("Processing task ", task.uid)  # Print the unique ID of the task
    task.wait_till_done()
    # Ensure that the task status is 'COMPLETE' to confirm success
    assert task.status == "COMPLETE"

    # Return the dataset object
    return dataset
# Create a new dataset. Alternatively, pass an existing dataset ID
dataset = make_data_rows()
# Retrieve the data row IDs from the dataset
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]
# Send data rows to the project
batch = project.create_batch(
    name="mmc-batch",  # each batch in a project must have a unique name
    data_rows=data_row_ids,  # data row IDs to include in the batch
    priority=1,  # priority between 1 (highest) and 5 (lowest)
)
print(f"Batch: {batch}")
Option B: Generate empty data rows
No metadata support
Only use this option if your project doesn't require metadata, attachments, or embeddings.
# Create the project and generate data rows
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_name="Example live multimodal chat dataset",
    data_row_count=100,
)
# Connect the project to the created ontology
project.connect_ontology(ontology)
Set up model configurations
You can create, delete, attach, and remove model configurations that evaluate your live multimodal chat responses.
Create model configs
Use client.create_model_config to create a model configuration. This method takes the following parameters:
- name: The name of the model configuration.
- model_id: The ID of the model to configure. To get this value, go to the Model tab, select your model, and copy the ID from the URL.
- inference_params: Model configuration parameters in JSON format. Each model has unique parameters.
First-time setup
If you're setting up model configurations for the first time, it's recommended to set one up using the web platform instead of the SDK to understand all the associated parameters.
The following example creates a Google Gemini 1.5 Pro model configuration:
MODEL_ID = "270a24ba-b983-40d6-9a1f-98a1bbc2fb65"
inference_params = {"max_new_tokens": 1024, "use_attachments": True}
model_config = client.create_model_config(
    name="Example Model Config",
    model_id=MODEL_ID,
    inference_params=inference_params,
)
Attach model configurations to projects
Use project.add_model_config to attach or project.remove_model_config to remove model configurations. Both methods take just a model_config ID.
project.add_model_config(model_config.uid)
Delete model configurations
Use project.delete_project_model_config() or client.delete_model_config to delete model configurations. Both methods require the model_config ID as a parameter. You can obtain this ID in one of the following ways:
- From the response when you create a model configuration
- By accessing project.project_model_configs and iterating through the list of model configurations attached to your project
# Remove the model config associations from the project
model_configs = project.project_model_configs()
for model_config in model_configs:
    project.delete_project_model_config(model_config.uid)

# Delete the model configs themselves
model_configs = project.project_model_configs()
for model_config in model_configs:
    client.delete_model_config(model_config.uid)
Mark project setup as complete
Once you've completed your project setup and model configuration, use project.set_project_model_setup_complete() to mark the setup as complete. After that, you can add more data rows to the project, but you can no longer add, modify, or delete project model configurations.
project.set_project_model_setup_complete()
Export multimodal chat evaluation projects
The export() method provides a unified way to export data from all project types, including multimodal chat evaluation projects. It retrieves model responses, annotations, and related metadata in a structured format. For detailed instructions on using export() to export multimodal chat evaluation projects and sample export formats, see Export multimodal chat annotations.
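As a quick orientation, here is a minimal sketch of calling export() on the project; the parameters shown are illustrative, and the streaming interface may vary slightly by SDK version:
# Export labels, attachments, and metadata from the project
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
}

export_task = project.export(params=export_params)
export_task.wait_till_done()

# Stream the exported rows and print each JSON record
for buffered_output in export_task.get_buffered_stream():
    print(buffered_output.json)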