Multimodal chat evaluation project

How to set up a multimodal chat evaluation project

Multimodal chat projects are set up differently from other Labelbox projects: they use unique methods and modified versions of existing ones. This guide showcases the differences and provides an example workflow.

Before you start

The following import is needed to run the code examples in this section.

import labelbox as lb

API key and client

Please provide a valid API key below to connect to the Labelbox client properly. For more information, please review the Create API key guide.

API_KEY = None
client = lb.Client(api_key=API_KEY)

Create a multimodal chat ontology

You can create ontologies for multimodal chat projects in the same way as other project ontologies using two methods: client.create_ontology and client.create_ontology_from_feature_schemas. The only additional requirement is to pass an ontology_kind parameter, which needs to be set to lb.OntologyKind.ModelEvaluation.

Option A: create_ontology

Typically, you create ontologies and generate the associated features simultaneously. Below is an example of creating an ontology for your multimodal chat project using supported tools and classifications; for information on supported annotation types, visit our multimodal chat evaluation guide.

ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_SINGLE_SELECTION,
            name="single select feature",
        ),
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_MULTI_SELECTION,
            name="multi select feature",
        ),
        lb.Tool(tool=lb.Tool.Type.MESSAGE_RANKING, name="ranking feature"),
    ],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist feature",
            options=[
                lb.Option(value="option 1", label="option 1"),
                lb.Option(value="option 2", label="option 2"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_question",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ],
)

# Create ontology
ontology = client.create_ontology(
    "MCE ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Option B: create_ontology_from_feature_schemas

You can also create ontologies using feature schema IDs. This builds the ontology from existing features instead of generating new ones. You can find these feature schema IDs on the Schema tab inside Labelbox.

ontology = client.create_ontology_from_feature_schemas(
    "MCE ontology",
    feature_schema_ids=["<list of feature schema ids>"],
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)
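If the features you want to reuse already live in another ontology, one way to collect their schema IDs programmatically (a sketch, assuming a hypothetical existing ontology ID) is:

```python
# Hypothetical ID of an ontology whose features you want to reuse.
existing_ontology_id = "<existing_ontology_id>"

# Uncomment once `client` is connected:
# existing_ontology = client.get_ontology(existing_ontology_id)
# feature_schema_ids = [
#     feature.feature_schema_id
#     for feature in existing_ontology.tools() + existing_ontology.classifications()
# ]
```

You can then pass the collected list to create_ontology_from_feature_schemas as shown above.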

Creating multimodal chat evaluation projects

There are two types of multimodal chat evaluation projects:

  1. Offline multimodal chat evaluation projects: Data rows must be imported manually; there is no live model invocation.

  2. Live multimodal chat evaluation projects: Empty data rows are generated on project creation and are filled in through live model invocation.

We will discuss creating both types of projects with the Labelbox SDK.

Set up offline multimodal chat evaluation project

For an offline multimodal chat evaluation project, you must import conversational version 2 data rows. For more information, please visit our import multimodal chat evaluation data guide. Offline multimodal chat evaluation projects are created through the SDK with client.create_offline_model_evaluation_project. This method uses the same parameters as client.create_project but provides better validation to ensure the project is set up correctly.

project = client.create_offline_model_evaluation_project(
    name="<project_name>",
    description="<project_description>",  # optional
)
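Because offline projects do not generate data rows, you attach your imported conversational v2 data rows to the project with a batch. A minimal sketch, assuming hypothetical global keys for data rows you have already uploaded (the batch name is an arbitrary example):

```python
# Hypothetical global keys of conversational v2 data rows imported earlier;
# swap in your own.
global_keys = ["<global_key_1>", "<global_key_2>"]

# Uncomment once `project` exists and the data rows above are uploaded:
# batch = project.create_batch(
#     name="mce-offline-batch",  # batch names must be unique within a project
#     global_keys=global_keys,
#     priority=5,
# )
```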

Set up live multimodal chat evaluation project

You do not have to create data rows with a live multimodal chat project; instead, they are generated when you create the project. The method you use to create your project is client.create_model_evaluation_project, which takes the same parameters as the traditional client.create_project but with a few specific additional parameters.

Parameters

client.create_model_evaluation_project takes the following parameters:

  • create_model_evaluation_project parameters:

    • name: The name of your new project.

    • description: An optional description of your project.

    • media_type: The type of assets that this project accepts. This should be set to lb.MediaType.Conversational.

    • dataset_name: The name of the dataset where the generated data rows will be located. Include this parameter only if you want to create a new dataset.

    • dataset_id: An optional dataset ID of an existing Labelbox dataset. Include this parameter if you want to append the generated data rows to an existing dataset.

    • data_row_count: The number of data row assets that will be generated and used with your project.

project = client.create_model_evaluation_project(
    name="Demo MCE Project",
    media_type=lb.MediaType.Conversational,
    dataset_name="Demo MCE dataset",
    data_row_count=100,
)

# Setup project with ontology created above
project.connect_ontology(ontology)
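If you would rather append the generated data rows to an existing dataset, pass dataset_id instead of dataset_name, as described in the parameter list above. A sketch with a hypothetical dataset ID:

```python
# Hypothetical ID of an existing dataset that will receive the generated rows.
existing_dataset_id = "<existing_dataset_id>"

# Uncomment once `client` is connected:
# project = client.create_model_evaluation_project(
#     name="Demo MCE Project",
#     media_type=lb.MediaType.Conversational,
#     dataset_id=existing_dataset_id,
#     data_row_count=100,
# )
```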

Setting up model configs

You can create, delete, attach, and remove model configs from your live multimodal chat project through the SDK. These are the model configs whose responses you will evaluate.

Creating model config

The primary method for creating a model config is client.create_model_config. This method takes the following parameters:

  • name: Name of the model config.

  • model_id: The ID of the model to configure. You'll need to get this through the UI by navigating to the Model tab, selecting the model you are trying to use, and copying the ID inside the URL. For supported models, visit the multimodal chat evaluation page.

  • inference_params: JSON of model configuration parameters. This will vary depending on the model you are trying to set up. It is recommended to first set up a model config inside the UI to learn all the associated parameters.

For the example below, we will set up a Google Gemini 1.5 Pro model config.

MODEL_ID = "270a24ba-b983-40d6-9a1f-98a1bbc2fb65"

inference_params = {"max_new_tokens": 1024, "use_attachments": True}

model_config = client.create_model_config(
    name="Example Model Config",
    model_id=MODEL_ID,
    inference_params=inference_params,
)

Attaching model config to project

You can attach model configs to and remove them from your project using project.add_model_config or project.remove_model_config. Both methods take just a model_config ID.

project.add_model_config(model_config.uid)

Delete model config

You can also delete model configs using client.delete_model_config, which takes the ID of the model config to delete. You can obtain this ID from the model config created above, or retrieve the configs attached to your project with project.project_model_configs and iterate through the list. Uncomment the code below to delete your model configs.

# model_configs = project.project_model_configs()

# for model_config in model_configs:
#     client.delete_model_config(model_config.uid)

Mark project setup as completed

Once you have finalized your project and set up your model configs, you must mark the project setup as completed.

πŸ“˜

Information

Once the project is marked as "setup complete", a user cannot add, modify, or delete existing project model configs.

project.set_project_model_setup_complete()

Exporting multimodal chat evaluation project

Exporting from a multimodal chat project works the same as exporting from other projects. In this example, your export will be empty unless you create labels inside the Labelbox platform. Please review our multimodal chat evaluation export guide for a sample export.

# The return type of this method is an `ExportTask`, which is a wrapper of a `Task`.
# Most `Task` features are also present in `ExportTask`.

export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "interpolated_frames": True,
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "workflow_status": "InReview",
    "batch_ids": ["batch_id_1", "batch_id_2"],
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
    "global_keys": ["global_key_1", "global_key_2"],
}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
    print(output.json)


if export_task.has_errors():
    export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(
        stream_handler=lambda error: print(error)
    )

if export_task.has_result():
    export_task.get_buffered_stream(
        stream_type=lb.StreamType.RESULT
    ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))
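If you want the exported rows in memory rather than printed, the stream handler can accumulate them into a list instead. A minimal sketch (the handler plays the same role as json_stream_handler above, where each `output.json` is one parsed export row):

```python
# Collect exported rows into a list instead of printing them.
collected_rows = []

def collecting_handler(output):
    # `output.json` holds one parsed export row.
    collected_rows.append(output.json)

# Usage, given a finished export task:
# export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(
#     stream_handler=collecting_handler
# )
```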
# Alternatively, you can export with the legacy `export_v2` method, which
# returns the complete result once the task finishes.
# Set the export params to include/exclude certain fields.
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
}

# You can set the range for last_activity_at and label_created_at. You can also set a list of data
# row ids to export.
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments and reviews.

# Note: Combinations of filters apply AND logic.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "workflow_status": "InReview",
    "batch_ids": ["batch_id_1", "batch_id_2"],
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
    "global_keys": ["global_key_1", "global_key_2"],
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
    print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)