Multimodal chat evaluation project

How to set up a multimodal chat evaluation project

Multimodal chat evaluation projects are created differently from projects for other media types. This guide explains the differences and provides example workflows.

Before you start

Import the Labelbox Python SDK:

import labelbox as lb

API key and client

Provide a valid API key below to connect to the Labelbox client. For more information, see the Create API key guide.

API_KEY = None
client = lb.Client(api_key=API_KEY)
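Rather than pasting the key into the script, you may prefer to read it from the environment. The following is a minimal sketch, assuming the key is stored in a variable named LABELBOX_API_KEY. That variable name and the load_api_key helper are hypothetical and not part of the SDK.

```python
import os


def load_api_key(env=None, var_name="LABELBOX_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `LABELBOX_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (requires the variable to be set in your shell):
# client = lb.Client(api_key=load_api_key())
```

Failing early with a clear message avoids passing None to the client and debugging an authentication error later.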

Create a multimodal chat ontology

You can create ontologies for multimodal chat projects in the same way as other project ontologies using two methods: client.create_ontology and client.create_ontology_from_feature_schemas. The only additional requirement is to pass an ontology_kind parameter set to lb.OntologyKind.ModelEvaluation.

Option A: create_ontology

Typically, you create ontologies and generate the associated features simultaneously. Below is an example of creating an ontology for your multimodal chat project using supported tools and classifications; for information on supported annotation types, visit our multimodal chat evaluation guide.

ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_SINGLE_SELECTION,
            name="single select feature",
        ),
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_MULTI_SELECTION,
            name="multi select feature",
        ),
        lb.Tool(tool=lb.Tool.Type.MESSAGE_RANKING, name="ranking feature"),
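        # These tool classes are assumed to be exported at the top level of
        # the labelbox package; check the SDK reference for your version.
        lb.StepReasoningTool(name="step reasoning"),
        lb.FactCheckingTool(name="fact checking"),
        lb.PromptIssueTool(name="prompt rating"),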
    ],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist feature",
            options=[
                lb.Option(value="option 1", label="option 1"),
                lb.Option(value="option 2", label="option 2"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_question",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ],
)

# Create ontology
ontology = client.create_ontology(
    "MMC ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Option B: create_ontology_from_feature_schemas

You can also create ontologies from feature schema IDs. The ontology then reuses existing features instead of generating new ones. You can find the feature schema IDs in the Schema tab inside Labelbox.

ontology = client.create_ontology_from_feature_schemas(
    "MCE ontology",
    feature_schema_ids=["<list of feature schema ids>"],
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Creating multimodal chat evaluation projects

The two types of multimodal chat evaluation projects have different project creation methods and data row setups:

  • For offline multimodal chat evaluation projects, use client.create_offline_model_evaluation_project and import data rows of existing conversations.

  • For live multimodal chat evaluation projects, use client.create_model_evaluation_project and either:

    • (Recommended) Create data rows and send them to projects, like other types of projects.
    • Generate empty data rows when you create the project. Data rows generated this way can't include attachments or metadata.

Set up offline multimodal chat evaluation projects

Use client.create_offline_model_evaluation_project to create offline multimodal chat evaluation projects. This method uses the same parameters as client.create_project and adds validation to ensure the project is set up correctly.

project = client.create_offline_model_evaluation_project(
    name="<project_name>",
    description="<project_description>",  # optional
)

After creating the project, you need to import conversational version 2 data rows. For instructions, see import multimodal chat evaluation data. To learn how to import annotations, see Import multimodal chat annotations.

Set up live multimodal chat evaluation projects

Use client.create_model_evaluation_project to create a live multimodal chat evaluation project. This method takes the same parameters as the traditional client.create_project, with a few additional parameters specific to multimodal chat evaluation projects.

The client.create_model_evaluation_project method accepts the following parameters:

  • name: The name of your new project.

  • description: An optional description of your project.

  • dataset_name (optional): The name of the dataset where the generated data rows will be located. Include this parameter only if you want to create a new dataset.

  • dataset_id (optional): The dataset ID of an existing Labelbox dataset. Include this parameter if you want to append the generated data rows to an existing dataset.

  • data_row_count (optional): The number of data row assets that will be generated and used with your project. Defaults to 100 if a dataset_name or dataset_id is included.
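The parameter rules above can be checked before calling the SDK. This is a minimal sketch, not part of the SDK: build_project_kwargs is a hypothetical helper, and it assumes that dataset_name and dataset_id are alternatives (one creates a dataset, the other appends to one) and that data_row_count should be a positive integer.

```python
def build_project_kwargs(name, description=None, dataset_name=None,
                         dataset_id=None, data_row_count=None):
    """Assemble keyword arguments for client.create_model_evaluation_project.

    Hypothetical helper: the checks below are assumptions of this sketch,
    not validation performed by the SDK itself.
    """
    if dataset_name and dataset_id:
        raise ValueError("Pass dataset_name or dataset_id, not both.")
    if data_row_count is not None and data_row_count <= 0:
        raise ValueError("data_row_count must be a positive integer.")

    kwargs = {"name": name}
    if description is not None:
        kwargs["description"] = description
    if dataset_name:
        kwargs["dataset_name"] = dataset_name  # create a new dataset
    if dataset_id:
        kwargs["dataset_id"] = dataset_id  # append to an existing dataset
    if data_row_count is not None:
        kwargs["data_row_count"] = data_row_count
    return kwargs


# Usage:
# project = client.create_model_evaluation_project(
#     **build_project_kwargs("Example project", dataset_name="Example dataset"))
```

Omitted optional parameters are left out of the dictionary so the SDK defaults, such as the data row count of 100, still apply.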

Option A: Create and send data rows to projects

# Create the project
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
)

def make_data_rows(dataset_id=None):
    # If a dataset ID is provided, fetch the dataset using that ID.
    # Otherwise, create a new dataset with the specified name.
    if dataset_id:
        dataset = client.get_dataset(dataset_id)
    else:
        dataset = client.create_dataset(name="example live mmc dataset")

    # Helper function to generate a single data row
    def generate_data(ind):
        return {
            "row_data": {  # The chat data format
                'type': 'application/vnd.labelbox.conversational.model-chat-evaluation',
                'draft': True,
                'rootMessageIds': [],
                'actors': {},
                'version': 2,
                'messages': {}
            },
            "global_key": f"global_key_{dataset.uid}_{ind}",
            "metadata_fields": [{"name": "tag", "value": "val_tag"}],
            "attachments": [
                {
                    "type": "IMAGE_OVERLAY",
                    "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg"
                }
            ]
        }

    # Generate a list of 100 data rows
    data_list = [generate_data(ind) for ind in range(100)]

    # Upload the generated data rows to the dataset
    task = dataset.create_data_rows(data_list)
    print("Processing task ", task.uid)  # Print the unique ID of the task
    task.wait_till_done()
    
    # Ensure that the task status is 'COMPLETE' to confirm success
    assert task.status == "COMPLETE"

    # Return the dataset object
    return dataset

# Create a new dataset. Alternatively, pass an existing dataset ID
dataset = make_data_rows()

# Retrieve the data row IDs from the dataset
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]

# Send data rows to the project
batch = project.create_batch(
    name="mmc-batch",  # each batch in a project must have a unique name
    data_rows=data_row_ids, # data row IDs to include in the batch
    priority=1  # priority between 1(highest) - 5(lowest)
)

print(f"Batch: {batch}")
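Because global keys must be unique, it can help to build the payloads locally and check the keys before uploading. The sketch below mirrors the empty chat payload used above without calling the API. build_empty_chat_row, find_duplicate_keys, and the global_key_demo_ prefix are hypothetical names introduced for illustration.

```python
CHAT_TYPE = "application/vnd.labelbox.conversational.model-chat-evaluation"


def build_empty_chat_row(global_key):
    """Return an empty live-chat data row payload, as in the example above."""
    return {
        "row_data": {
            "type": CHAT_TYPE,
            "draft": True,
            "rootMessageIds": [],
            "actors": {},
            "version": 2,
            "messages": {},
        },
        "global_key": global_key,
    }


def find_duplicate_keys(rows):
    """Return the sorted global keys that appear more than once in `rows`."""
    seen, dupes = set(), set()
    for row in rows:
        key = row["global_key"]
        if key in seen:
            dupes.add(key)
        else:
            seen.add(key)
    return sorted(dupes)


rows = [build_empty_chat_row(f"global_key_demo_{i}") for i in range(100)]
duplicates = find_duplicate_keys(rows)
if duplicates:
    raise ValueError(f"Duplicate global keys: {duplicates}")
```

This check only covers keys within one upload. Keys already used elsewhere in your workspace would still be reported by the upload task.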

Option B: Generate empty data rows

📘

No metadata support

Only use this option if your project doesn't require metadata, attachments, or embeddings for data rows.

# Create the project and generate data rows
project = client.create_model_evaluation_project(
    name="Example live multimodal chat project",
    description="<project_description>",  # optional
    dataset_name="Example live multimodal chat dataset",
    data_row_count=100,
)

# Connect the project to the created ontology
project.connect_ontology(ontology)

Setting up model configs

You can create, delete, attach, and remove model configs for your live multimodal chat project through the SDK. Each model config defines a model whose responses you will evaluate.

Creating model config

The primary method for creating a model config is client.create_model_config. This method takes the following parameters:

  • name: Name of the model config.

  • model_id: The ID of the model to configure. You'll need to get this through the UI by navigating to the Model tab, selecting the model you are trying to use, and copying the ID inside the URL. For supported models, visit the multimodal chat evaluation page.

  • inference_params: JSON of model configuration parameters. This will vary depending on the model you are trying to set up. It is recommended to first set up a model config inside the UI to learn all the associated parameters.

For the example below, we will set up a Google Gemini 1.5 Pro model config.

MODEL_ID = "270a24ba-b983-40d6-9a1f-98a1bbc2fb65"

inference_params = {"max_new_tokens": 1024, "use_attachments": True}

model_config = client.create_model_config(
    name="Example Model Config",
    model_id=MODEL_ID,
    inference_params=inference_params,
)
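Since inference_params is sent as JSON, one option is to confirm that the dictionary serializes cleanly before creating the model config. This is a minimal sketch. check_inference_params is a hypothetical helper, and it does not verify that a given model accepts these particular keys.

```python
import json


def check_inference_params(params):
    """Return a JSON round-tripped copy of `params`, or raise if it can't be serialized."""
    if not isinstance(params, dict):
        raise TypeError("inference_params should be a dict.")
    try:
        return json.loads(json.dumps(params))
    except (TypeError, ValueError) as exc:
        raise ValueError(f"inference_params is not JSON-serializable: {exc}") from exc


inference_params = check_inference_params(
    {"max_new_tokens": 1024, "use_attachments": True}
)
```

The returned dictionary can be passed to client.create_model_config in the same way as the hand-written one above.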

Attaching model config to project

You can attach a model config to your project using project.add_model_config and remove it using project.remove_model_config. Both methods take just a model_config ID.

project.add_model_config(model_config.uid)

Delete model config

Use project.delete_project_model_config() or client.delete_model_config to delete model configs. Both methods require the model_config ID as a parameter. You can obtain this ID from your created model config, or retrieve the model configs directly from your project using project.project_model_configs, and then iterate through the list of model configs attached to your project.

# Detach every model config from the project.
# The loop variable has its own name so it doesn't shadow `model_config` above.
project_model_configs = project.project_model_configs()

for project_model_config in project_model_configs:
    project.delete_project_model_config(project_model_config.uid)

# Delete the model config created earlier
client.delete_model_config(model_config.uid)

Mark project setup as completed

Once you have finalized your project and set up your model configs, you must mark the project setup as completed.

📘

Information

Once the project is marked as "setup complete", you cannot add, modify, or delete its project model configs.

project.set_project_model_setup_complete()

Exporting multimodal chat evaluation project

Exporting from a multimodal chat project works the same as exporting from other projects. In this example, your export will be empty unless you create labels inside the Labelbox platform. Please review our multimodal chat evaluation export guide for a sample export.

# The return type of this method is an `ExportTask`, which is a wrapper of a `Task`.
# Most of `Task` features are also present in `ExportTask`.

export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "interpolated_frames": True,
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "workflow_status": "InReview",
    "batch_ids": ["batch_id_1", "batch_id_2"],
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
    "global_keys": ["global_key_1", "global_key_2"],
}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream results and errors
if export_task.has_errors():
    export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(
        stream_handler=lambda error: print(error))

if export_task.has_result():
    # Start export stream
    stream = export_task.get_buffered_stream()

    # Iterate through data rows
    for data_row in stream:
        print(data_row.json)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))
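The date-range filters in the export example are pairs of timestamp strings. If you compute the range in code, a small helper can produce the same string format. This is a sketch under the assumption that the "YYYY-MM-DD HH:MM:SS" format shown above is what you want to send. date_range is a hypothetical helper, not an SDK function.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_range(start, end):
    """Format a [start, end] pair of datetimes like the filter values above."""
    if start > end:
        raise ValueError("start must not be later than end")
    return [start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)]


# A single filter is usually enough, since filters combine with AND logic.
filters = {
    "last_activity_at": date_range(datetime(2000, 1, 1), datetime(2050, 1, 1)),
}
```

The resulting filters dictionary can be passed to project.export in place of the hand-written one.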