Upload model metrics

In addition to uploading model predictions, you can upload model metrics to a model run.

Metrics are intended to help you quantitatively measure the quality of your model, which is critical for building models efficiently. You have two options for model metrics:

  1. Compute and upload a set of default, built-in metrics.

  2. Upload custom metrics.

Built-in metrics

The SDK provides a set of default metrics that are easy to compute and upload. These default metrics are a good starting point for building an initial understanding of your model's performance.

Supported built-in metrics

  1. confusion_matrix_metric()
    • Computes a single confusion matrix metric for all the predictions and labels provided.
  2. miou_metric()
    • Computes a single IOU score for all the predictions and labels provided.
  3. feature_confusion_matrix_metric()
    • Computes a confusion matrix metric for each of the classes found in the predictions and labels.
  4. feature_miou_metric()
    • Computes an IOU score for each of the classes found in the predictions and labels.



All of these functions expect the predictions and ground truth annotations to correspond to the same data row. These functions should be called for each data row that you need metrics for.
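Because each call operates on one data row's annotations, you typically pair ground-truth and prediction labels by data row first. The helper below is an illustrative, plain-Python sketch of that pairing step (the dict keys and helper name are assumptions, not part of the SDK):

```python
# Illustrative sketch: group ground-truth and prediction labels by data row
# uid, so each built-in metric function is called with matching annotations.
def pair_by_data_row(ground_truths, predictions):
    """Return (ground_truth, prediction) pairs that share a data row uid."""
    preds_by_uid = {p["data_row_uid"]: p for p in predictions}
    return [
        (gt, preds_by_uid[gt["data_row_uid"]])
        for gt in ground_truths
        if gt["data_row_uid"] in preds_by_uid
    ]

# Only data row "a" appears in both lists, so only one pair is produced.
pairs = pair_by_data_row(
    [{"data_row_uid": "a", "annotations": []}],
    [{"data_row_uid": "a", "annotations": []},
     {"data_row_uid": "b", "annotations": []}],
)
```

Each resulting pair can then be passed to a built-in function such as feature_miou_metric.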

Custom metrics

We recommend that the metrics you use to measure your model quality align with the business objectives for the model. Otherwise, slight changes in model quality, as they relate to these core objectives, are lost in noise.

Custom metrics enable you to measure model quality in terms of your exact business goals.

Supported custom metrics

You can provide metrics at the following levels of granularity:
1. Data rows
2. Features
3. Subclasses

Additionally, you can give metrics custom names to best describe what they are measuring.

Currently, ScalarMetric and ConfusionMatrixMetric are supported.


A ScalarMetric is a metric with a single scalar value.

from labelbox.data.annotation_types import ScalarMetric

data_row_metric = ScalarMetric(metric_name="iou", value=0.5)

feature_metric = ScalarMetric(metric_name="iou", feature_name="cat", value=0.5)

subclass_metric = ScalarMetric(metric_name="iou",
                               feature_name="cat",
                               subclass_name="orange",
                               value=0.5)


A ConfusionMatrixMetric contains 4 numbers: [true positive, false positive, true negative, false negative]. Confidence is also supported, as key-value pairs where the confidence score is the key and the metric value is the value. In the user interface, these metrics are used to derive precision, recall, and F1 scores. Those scores are not uploaded directly because keeping the raw counts allows them to be computed on the front end.

from labelbox.data.annotation_types import ConfusionMatrixMetric

data_row_metric = ConfusionMatrixMetric(metric_name="50pct_iou",
                                        value=[1, 0, 1, 0])

feature_metric = ConfusionMatrixMetric(metric_name="50pct_iou",
                                       feature_name="cat",
                                       value=[1, 0, 1, 0])

subclass_metric = ConfusionMatrixMetric(metric_name="50pct_iou",
                                        feature_name="cat",
                                        subclass_name="orange",
                                        value=[1, 0, 1, 0])
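To make the front-end derivation concrete, here is a sketch of how precision, recall, and F1 follow from the four raw counts. It mirrors the standard formulas; the function name is illustrative and not part of the SDK:

```python
def derived_scores(value):
    """Derive precision, recall, and F1 from [TP, FP, TN, FN]."""
    tp, fp, tn, fn = value
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# For value=[3, 1, 0, 1]: precision = 3/4, recall = 3/4, F1 = 0.75
```

Uploading the raw counts instead of these derived scores lets the UI recompute them over any slice of data rows.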


You can provide confidence scores along with metrics. This enables you to explore your model performance without necessarily knowing the optimal thresholds for each class. You can also filter on confidence and value in the UI to perform powerful queries. The keys represent a confidence score (which must be between 0 and 1), and the values represent either a scalar metric or, for confusion matrix metrics, [TP, FP, TN, FN].

confusion_matrix_metric_with_confidence = ConfusionMatrixMetric(
    metric_name="50pct_iou",
    value={
        0.1: [1, 0, 1, 0],
        0.3: [1, 0, 1, 0],
        0.5: [1, 0, 1, 0],
        0.7: [1, 0, 1, 0],
        0.9: [1, 0, 1, 0]
    })

scalar_metric_with_confidence = ScalarMetric(metric_name="iou",
                                             value={
                                                 0.1: 0.2,
                                                 0.3: 0.25,
                                                 0.5: 0.3,
                                                 0.7: 0.4,
                                                 0.9: 0.3
                                             })


Aggregations

Aggregation is an optional field on the ScalarMetric object (by default, it uses Arithmetic Mean).

Aggregations occur in two cases:
1. When you provide a feature or subclass-level metric, Labelbox automatically aggregates all metrics with the same parent to create a value for that parent.
E.g. You provide cat and dog Intersection-Over-Union (IOU). The data row-level metric for IOU is the average of both of those.
The exception to this is when the data row-level IOU is explicitly set, then the aggregation will not take effect (on a per data row basis).
2. When you create slices or want aggregate statistics on your models, the selected aggregation is applied.
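Case 1 above can be sketched numerically; the feature names and values here are illustrative:

```python
from statistics import mean

# Feature-level IOU metrics that share the same parent metric name
feature_ious = {"cat": 0.5, "dog": 0.7}

# With the default Arithmetic Mean aggregation, the derived data-row-level
# IOU is the mean of the feature-level values (unless it is explicitly set).
data_row_iou = mean(feature_ious.values())  # 0.6
```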

For example, if the following metrics are uploaded, then in the Labelbox App, users will see:
true positives dog = 4
true positives cat = 3
true positives = 7

from labelbox.data.annotation_types import ScalarMetric, MetricAggregation

cat_metric = ScalarMetric(metric_name="true_positives",
                          feature_name="cat",
                          value=3,
                          aggregation=MetricAggregation.SUM)

dog_metric = ScalarMetric(metric_name="true_positives",
                          feature_name="dog",
                          value=4,
                          aggregation=MetricAggregation.SUM)

Limits and behavior

  • A data row cannot have more than 20 metrics
  • Metrics are upserted, so if a metric already exists, its value will be replaced
  • Metrics can have values in the range [0,100000]
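The upsert behavior can be sketched as a keyed replace. The identity fields below are an assumption for illustration; the Labelbox backend manages this server-side:

```python
# Sketch: a metric is identified by its name plus optional feature/subclass;
# uploading a metric with an existing identity replaces the stored value.
metrics = {}

def upsert_metric(metric_name, value, feature_name=None, subclass_name=None):
    metrics[(metric_name, feature_name, subclass_name)] = value

upsert_metric("iou", 0.5, feature_name="cat")
upsert_metric("iou", 0.8, feature_name="cat")  # replaces the earlier 0.5
```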

How to upload (built-in or custom) metrics to a model run

Option 1: Labelbox one-click model training

If you have trained your model using Labelbox's one-click model training integration, the model metrics will automatically appear in Labelbox once the model has finished training.

Option 2: Custom model training

If you have trained your model outside of Labelbox, built-in and custom metrics are uploaded to a model run in the same way: as metric annotations.

When uploading to a model run, you can upload metrics annotations and/or prediction annotations. Hence, you can upload model metrics to a model run, without uploading predictions. Similarly, you can upload model predictions to a model run, without uploading model metrics.

  1. First, construct a metric annotation in one of two ways:
  • Manually (custom metrics)
  • Using one of the built-in functions: feature_miou_metric, miou_metric, confusion_matrix_metric, or feature_confusion_matrix_metric.

  2. Then, associate the metric annotations with a data row (just like you would associate prediction annotations with a data row if you wanted to upload model predictions).

  3. Convert to NDJSON and upload.

  • The NDJSON payload is easy to create with the converter function NDJsonConverter.serialize.
import json
import uuid

from labelbox.data.annotation_types import Label, ImageData
from labelbox.data.metrics import feature_miou_metric
from labelbox.data.serialization import NDJsonConverter

# (ground_truth, prediction, conf_matrix_metrics, iou_metrics, and model_run
# are assumed to have been defined earlier)

# First, construct metric annotations in one of two ways
# For instance, you might compute metrics using built-in functions
metrics_annotations = []
metrics_annotations.extend(feature_miou_metric(ground_truth.annotations, prediction.annotations))
# Or, you might compute metrics manually
metrics_annotations = [*conf_matrix_metrics, *iou_metrics]

# Associate the metric annotations with a data row
upload = [
    Label(
        data=ImageData(uid="cktiom8osh4210ytmevuk7lfh"), # the data row
        annotations=metrics_annotations # the metrics annotations and/or the prediction annotations
    )
]

# Convert to NDJSON
ndjson_upload = list(NDJsonConverter.serialize(upload))
# Sanity check the conversion
print(json.dumps(ndjson_upload, indent=2, sort_keys=True))

# Upload metrics annotations, possibly alongside prediction annotations
upload_job = model_run.add_predictions(f'diagnostics-import-{uuid.uuid4()}', ndjson_upload)
# Wait for the upload to finish and sanity check it
upload_job.wait_until_done()
print(upload_job.errors)

This is an example of a valid NDJSON upload file, containing metrics annotations.

[
  {
    "aggregation": "CONFUSION_MATRIX",
    "dataRow": {
      "id": "cktiom8osh4210ytmevuk7lfh"
    },
    "featureName": "cat",
    "metricName": "50pct_iou",
    "metricValue": [1, 0, 1, 0],
    "uuid": "f36e393d-e98a-498c-977a-181cde417921"
  },
  {
    "aggregation": "CONFUSION_MATRIX",
    "dataRow": {
      "id": "cktiom8osh4210ytmevuk7lfh"
    },
    "featureName": "cat",
    "metricName": "50pct_iou",
    "metricValue": [1, 0, 1, 0],
    "uuid": "c6f32f7b-9391-4ebe-8bf4-d3459ca1e65e"
  }
]

Complete Python SDK tutorial

Custom Metrics - basics (Open in GitHub | Open in Colab)
Custom Metrics - demo (Open in GitHub | Open in Colab)
