Live multimodal chat evaluation project

How to set up a Live multimodal chat evaluation project

Live multimodal chat projects are set up differently from other Labelbox projects: they have their own unique methods and modified versions of existing methods. This guide showcases the differences and provides an example workflow.

Before you start

The following import is needed to run the code examples in this section.

import labelbox as lb

API key and client

Provide a valid API key below to connect to the Labelbox client. See the Create API key guide for more information.

API_KEY = None
client = lb.Client(api_key=API_KEY)
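
If you prefer not to hard-code the key, the client can also read it from your environment. Below is a minimal sketch that assumes you have exported a LABELBOX_API_KEY environment variable:

import os

# Read the key from the environment; falls back to None if unset.
# lb.Client also checks LABELBOX_API_KEY itself when api_key is None.
API_KEY = os.environ.get("LABELBOX_API_KEY")
client = lb.Client(api_key=API_KEY)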

Create a Live multimodal chat ontology

You can create ontologies for Live multimodal chat projects the same way you create ontologies for other projects; the only additional requirement is an ontology_kind parameter, which must be set to lb.OntologyKind.ModelEvaluation. You can create ontologies with two methods: client.create_ontology and client.create_ontology_from_feature_schemas.

Option A: create_ontology

Typically, you create ontologies and generate the associated features simultaneously. Below is an example of creating an ontology for your Live multimodal chat project using supported tools and classifications. For information on supported annotation types, visit our Live multimodal chat evaluation guide.

ontology_builder = lb.OntologyBuilder(
    tools=[
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_SINGLE_SELECTION,
            name="single select feature",
        ),
        lb.Tool(
            tool=lb.Tool.Type.MESSAGE_MULTI_SELECTION,
            name="multi select feature",
        ),
        lb.Tool(tool=lb.Tool.Type.MESSAGE_RANKING, name="ranking feature"),
    ],
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist feature",
            options=[
                lb.Option(value="option 1", label="option 1"),
                lb.Option(value="option 2", label="option 2"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_question",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ],
)

# Create ontology
ontology = client.create_ontology(
    "MCE ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)

Option B: create_ontology_from_feature_schemas

Ontologies can also be created from feature schema IDs, which builds your ontology from existing features instead of generating new ones. You can find these features in the Schema tab inside Labelbox.

ontology = client.create_ontology_from_feature_schemas(
    "MCE ontology",
    feature_schema_ids=["<list of feature schema ids>"],
    media_type=lb.MediaType.Conversational,
    ontology_kind=lb.OntologyKind.ModelEvaluation,
)
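
If you prefer to look up feature schema IDs programmatically instead of copying them from the Schema tab, you can search the catalog by name. Below is a minimal sketch; the name filter is an illustrative assumption:

# Search existing feature schemas by name and collect their IDs
feature_schema_ids = [
    schema.uid
    for schema in client.get_feature_schemas(name_contains="checklist feature")
]
print(feature_schema_ids)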

Set up Live multimodal chat evaluation project

You do not have to create data rows for a Live multimodal chat project; instead, they are generated for you when you create the project. The method you use to create your project is client.create_model_evaluation_project, which takes the same parameters as the traditional client.create_project plus a few additional parameters.

Parameters

When using client.create_model_evaluation_project, the following parameters are supported:

  • name: The name of your new project.

  • description: An optional description of your project.

  • media_type: The type of assets that this project accepts. This should be set to lb.MediaType.Conversational.

  • dataset_name: The name of the dataset where the generated data rows will be located. Include this parameter only if you want to create a new dataset.

  • dataset_id: An optional dataset ID of an existing Labelbox dataset. Include this parameter if you want to append the generated data rows to an existing dataset.

  • data_row_count: The number of data row assets that will be generated and used with your project.

project = client.create_model_evaluation_project(
    name="Demo MCE Project",
    media_type=lb.MediaType.Conversational,
    dataset_name="Demo MCE dataset",
    data_row_count=100,
)

# Set up the project with the ontology created above
project.setup_editor(ontology)
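
Since the data rows are generated for you, you can sanity-check them by fetching the auto-created dataset and counting its rows. Below is a minimal sketch that assumes the dataset name used above:

# Fetch the generated dataset by name and count its data rows
dataset = next(client.get_datasets(where=(lb.Dataset.name == "Demo MCE dataset")))
print("generated data rows:", len(list(dataset.data_rows())))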

Setting up model configs

You can create, delete, attach, and remove model configs from your Live multimodal chat project through the SDK. These model configs define the models whose responses you will evaluate.

Creating model config

The main method associated with creating a model config is client.create_model_config. This method takes the following parameters:

  • name: Name of the model config.

  • model_id: The ID of the model to configure. You must obtain this through the UI by navigating to the Model tab, selecting the model you are trying to use, and copying the ID inside the URL. For supported models, visit the Live multimodal chat evaluation page.

  • inference_params: JSON of model configuration parameters. These vary depending on the model you are setting up. It is recommended to first create a model config inside the UI to learn the associated parameters.

For the example below, we will be setting up a Google Gemini 1.5 Pro model config.

MODEL_ID = "270a24ba-b983-40d6-9a1f-98a1bbc2fb65"

inference_params = {"max_new_tokens": 1024, "use_attachments": True}

model_config = client.create_model_config(
    name="Example Model Config",
    model_id=MODEL_ID,
    inference_params=inference_params,
)

Attaching model config to project

You can attach and remove model configs to your project using project.add_model_config or project.remove_model_config. Both methods take just a model_config ID.

project.add_model_config(model_config.uid)
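
The reverse operation looks the same. Uncomment the line below to detach the model config from your project:

# project.remove_model_config(model_config.uid)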

Delete model config

You can also delete model configs using client.delete_model_config. This method takes a model_config ID. You can obtain this ID from the model config created above, or retrieve the model configs attached to your project with project.project_model_configs and iterate through the list. Uncomment the code below to delete your model configs.

# model_configs = project.project_model_configs()

# for model_config in model_configs:
#     client.delete_model_config(model_config.uid)

📘 Information

To finish setting up your Live multimodal chat evaluation project, navigate to your project overview inside the Labelbox platform and select Complete setup in the left side panel.

Exporting Live multimodal chat evaluation project

Exporting from a Live multimodal chat project works the same as exporting from other projects. In this example, your export will be empty unless you create labels inside the Labelbox platform. Please review our Live multimodal chat evaluation export guide for a sample export.

# The return type of this method is an `ExportTask`, which is a wrapper around a `Task`.
# Most `Task` features are also present in `ExportTask`.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream the export task results (and any errors) one JSON line at a time:
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

if export_task.has_errors():
  export_task.get_buffered_stream(
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_task.get_buffered_stream(
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Alternatively, you can use the legacy export_v2 method, which waits for the task to finish and returns the full result in memory.

# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True
}

# You can set the range for last_activity_at and label_created_at, and provide a
# list of data row IDs to export.
# For context, last_activity_at captures the creation and modification of labels,
# metadata, status, comments, and reviews.

# Note: Combinations of filters apply AND logic.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)