Multimodal chat evaluation

Learn how to set up a multimodal chat evaluation project using the SDK.

The multimodal chat evaluation editor allows you to evaluate generative model responses across multiple data types. This guide walks through setting up a multimodal chat evaluation project using the SDK.

Initialize the SDK

Import the Labelbox Python SDK:

import labelbox as lb

API key and client

Provide a valid API key to authenticate the Labelbox client. See Labelbox API keys to learn how to generate your key.

API_KEY = "" # Your API key client = lb.Client(api_key=API_KEY)

Create a multimodal chat ontology

You can create ontologies for multimodal chat projects using either client.create_ontology() or client.create_ontology_from_feature_schemas(), with the ontology_kind parameter set to lb.OntologyKind.ModelEvaluation. See Supported annotation types for the annotation types you can include in a multimodal chat evaluation ontology.

Option A: create_ontology

Use create_ontology to create an ontology and define its schema:

# Tool classes for step reasoning, fact checking, and prompt issue features (available in recent SDK versions)
from labelbox import StepReasoningTool, FactCheckingTool, PromptIssueTool

ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_SINGLE_SELECTION,
            name="single select feature",
        ),
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_MULTI_SELECTION,
            name="multi select feature",
        ),
        lb.Tool(tool=lb.Tool.Type.MESSAGE_RANKING, name="ranking feature"),
        StepReasoningTool(name="step reasoning"),
        FactCheckingTool(name="fact checking"),
        PromptIssueTool(name="prompt rating"),
    ],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist feature",
            options=[
                lb.Option(value="option 1", label="option 1"),
                lb.Option(value="option 2", label="option 2"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_question",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ],
)

# Create the ontology
ontology = client.create_ontology(
    "Example ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Option B: create_ontology_from_feature_schemas

Use create_ontology_from_feature_schemas with feature schema IDs to create ontologies that reuse existing feature schemas instead of defining new ones. To obtain these IDs, go to the Schema tab.

ontology = client.create_ontology_from_feature_schemas(
    "Example ontology",
    feature_schema_ids=["<list of feature schema ids>"],
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Create multimodal chat evaluation projects

The two types of multimodal chat evaluation projects have different project creation methods and data row setups:

  • For offline multimodal chat evaluation projects, use client.create_offline_model_evaluation_project and import data rows of existing conversations.

  • For live multimodal chat evaluation projects, use client.create_model_evaluation_project and either:

    • (Recommended) Create data rows and send them to the project, as you would for other project types.
    • Generate empty data rows at project creation. This option can't create data rows with attachments or metadata.

Set up offline multimodal chat evaluation projects

Use client.create_offline_model_evaluation_project to create offline multimodal chat evaluation projects. This method takes the same parameters as client.create_project: a name and an optional description.

project = client.create_offline_model_evaluation_project(
    name="<project_name>",
    description="<project_description>",  # optional
)

After creating the project, import conversational version 2 data rows for further processing. For instructions, see Import multimodal chat evaluation data. To learn how to import annotations, see Import multimodal chat annotations.
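
For orientation, the following minimal sketch imports a single conversational version 2 data row and batches it to the offline project. The dataset name, global key, and hosted JSON URL are placeholders, and the full conversational version 2 schema is covered in the import guide linked above.

# Minimal sketch: import one conversational v2 data row and send it to the offline project.
# The dataset name, global key, and JSON URL below are placeholders.
dataset = client.create_dataset(name="offline mmc dataset")

task = dataset.create_data_rows([
    {
        # Cloud-hosted JSON file that follows the conversational version 2 format
        "row_data": "https://example.com/conversations/conversation_v2.json",
        "global_key": "offline-conversation-0001",
    }
])
task.wait_till_done()

# Batch the imported data row to the offline project
project.create_batch(
    name="offline-mmc-batch",
    global_keys=["offline-conversation-0001"],
    priority=1,
)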

Set up live multimodal chat evaluation projects

Use client.create_model_evaluation_project to create a live multimodal chat evaluation project. This method takes the same name and optional description parameters as client.create_project, with a few additional parameters specific to multimodal chat evaluation projects:

  • data_row_count (optional): The number of data rows to generate for your project. Defaults to 100 if a dataset_name or dataset_id is included.

  • dataset_name (optional): The name of a new dataset. Include this parameter only if you want to create a new dataset for the generated data rows.

  • dataset_id (optional): The dataset ID of an existing Labelbox dataset. Include this parameter only if you want to append generated data rows to an existing dataset.

Option A: Create and send data rows to projects

# Create the project
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
)

def make_data_rows(dataset_id=None):
    # If a dataset ID is provided, fetch the dataset using that ID.
    # Otherwise, create a new dataset with the specified name.
    if dataset_id:
        dataset = client.get_dataset(dataset_id)
    else:
        dataset = client.create_dataset(name="example live mmc dataset")

    # Helper function to generate a single data row
    def generate_data(ind):
        return {
            "row_data": {
                # The chat data format
                "type": "application/vnd.labelbox.conversational.model-chat-evaluation",
                "draft": True,
                "rootMessageIds": [],
                "actors": {},
                "version": 2,
                "messages": {},
            },
            "global_key": f"global_key_{dataset.uid}_{ind}",
            "metadata_fields": [{"name": "tag", "value": "val_tag"}],
            "attachments": [
                {
                    "type": "IMAGE_OVERLAY",
                    "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg",
                }
            ],
        }

    # Generate a list of 100 data rows
    data_list = [generate_data(ind) for ind in range(100)]

    # Upload the generated data rows to the dataset
    task = dataset.create_data_rows(data_list)
    print("Processing task ", task.uid)  # Print the unique ID of the task
    task.wait_till_done()

    # Ensure that the task status is 'COMPLETE' to confirm success
    assert task.status == "COMPLETE"

    # Return the dataset object
    return dataset

# Create a new dataset. Alternatively, pass an existing dataset ID
dataset = make_data_rows()

# Retrieve the data row IDs from the dataset
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]

# Send data rows to the project
batch = project.create_batch(
    name="mmc-batch",  # each batch in a project must have a unique name
    data_rows=data_row_ids,  # data row IDs to include in the batch
    priority=1,  # priority between 1 (highest) and 5 (lowest)
)
print(f"Batch: {batch}")
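
As with Option B below, the project still needs an ontology connected before labeling can begin. Assuming the ontology object created earlier in this guide is still in scope, you can connect it the same way:

# Connect the project to the ontology created earlier
project.connect_ontology(ontology)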

Option B: Generate empty data rows

📘

No metadata support

Only use this option if your project doesn't require metadata, attachments, or embeddings.

# Create the project and generate data rows
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_name="Example live multimodal chat dataset",
    data_row_count=100,
)

# Connect the project to the created ontology
project.connect_ontology(ontology)
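
If you'd rather append the generated data rows to an existing dataset instead of creating a new one, pass dataset_id rather than dataset_name. A sketch of that variant follows; the dataset ID is a placeholder.

# Sketch: generate data rows into an existing dataset (dataset ID is a placeholder)
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_id="<existing_dataset_id>",
    data_row_count=100,
)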

Set up model configurations

You can create, delete, attach, and remove the model configurations that generate the live multimodal chat responses you evaluate.

Create model configs

Use client.create_model_config to create a model configuration. This method takes the following parameters:

  • name: The name of the model configuration.

  • model_id: The ID of the model to configure. To get this value, go to the Model tab, select your model, and copy the ID from the URL.

  • inference_params: Model configuration parameters in JSON format. Each model has its own set of parameters.

📘

First-time setup

If you're setting up model configurations for the first time, we recommend creating one in the web platform instead of the SDK first, so you can get familiar with all the associated parameters.

The following example creates a Google Gemini 1.5 Pro model configuration:

MODEL_ID = "270a24ba-b983-40d6-9a1f-98a1bbc2fb65" inference_params = {"max_new_tokens": 1024, "use_attachments": True} model_config = client.create_model_config( name="Example Model Config", model_id=MODEL_ID, inference_params=inference_params, )

Attach model configurations to projects

Use project.add_model_config to attach or project.remove_model_config to remove model configurations. Both methods take just a model_config ID.

project.add_model_config(model_config.uid)
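
If you need to detach the configuration later, a minimal sketch follows. Depending on your SDK version, remove_model_config may expect the project model config association ID returned by add_model_config rather than the model configuration ID itself, so verify the parameter against the SDK reference for your version.

# Hypothetical detach flow: capture the ID returned by add_model_config
# and pass it to remove_model_config (assumption; verify against your SDK version)
association_id = project.add_model_config(model_config.uid)
project.remove_model_config(association_id)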

Delete model configurations

Use project.delete_project_model_config() or client.delete_model_config to delete model configurations. Both methods require the model_config ID as a parameter. You can obtain this ID in one of the following ways:

  • From the response when you create a model configuration
  • By calling project.project_model_configs() and iterating through the model configurations attached to your project

# Delete the model configurations attached to the project
model_configs = project.project_model_configs()
for model_config in model_configs:
    project.delete_project_model_config(model_config.uid)

# Delete the model configurations using the client
model_configs = project.project_model_configs()
for model_config in model_configs:
    client.delete_model_config(model_config.uid)

Mark project setup as complete

Once you've completed your project setup and model configuration, use project.set_project_model_setup_complete() to mark the setup as complete. After that, you can still add data rows to the project, but you can't add, modify, or delete project model configurations.

project.set_project_model_setup_complete()

Export multimodal chat evaluation projects

The export() method provides a unified way to export data from all project types, including multimodal chat evaluation projects. It retrieves model responses, annotations, and related metadata in a structured format. For detailed instructions on using export() to export multimodal chat evaluation projects and sample export formats, see Export multimodal chat annotations.
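
For orientation, a minimal export sketch follows. The parameter set and the streaming call are assumptions based on the standard export task workflow, so refer to the linked guide for the exact options and output format.

# Minimal export sketch (parameters shown are common export options; adjust as needed)
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
}

export_task = project.export(params=export_params)
export_task.wait_till_done()

# Iterate over the exported rows (assumes a recent SDK version with buffered streaming)
for output in export_task.get_buffered_stream():
    print(output.json)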