Model metrics

Upload metrics to diagnose errors in model performance. Metrics can be used in combination with other filters to explore how your model performs.

Hierarchy

Users can provide metrics at the following levels of granularity:

  • Data Rows
  • Features
  • Subclasses

Additionally, metrics can be given custom names to best describe what they are measuring.
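
For example, a data row level metric can carry any descriptive name (a minimal sketch; the name "mean_average_precision" and its value are illustrative only):

from labelbox.data.annotation_types import ScalarMetric

# Data row level metric with a custom, descriptive name (illustrative values)
map_metric = ScalarMetric(
    metric_name="mean_average_precision",
    value=0.82
)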

Limits and Behavior:

  • A data row can have no more than 20 metrics
  • Metric names are unique
  • Metrics are upserted, so if a metric already exists, its value will be replaced
  • Metrics can have values in the range [0,100000]
  • Two types of metrics are supported: ScalarMetric and ConfusionMatrixMetric

Example notebooks (each is available on Github and in Google Colab):

  • Basic
  • Image Example

Scalar Metrics

from labelbox.data.annotation_types import ScalarMetric

# Data row level metric
data_row_metric = ScalarMetric(
    metric_name="iou",
    value=0.5
)

# Feature (class) level metric
feature_metric = ScalarMetric(
    metric_name="iou",
    feature_name="cat",
    value=0.5
)

# Subclass level metric
subclass_metric = ScalarMetric(
    metric_name="iou",
    feature_name="cat",
    subclass_name="orange",
    value=0.5
)

Confusion Matrix Metrics

  • A ConfusionMatrixMetric contains 4 numbers [true positive, false positive, true negative, false negative]
  • Confidence is also supported as key-value pairs, where the confidence score is the key and the metric is the value.
  • In the user interface, these metrics are used to derive precision, recall, and F1 scores (a sketch of that arithmetic follows the examples below).
from labelbox.data.annotation_types import ConfusionMatrixMetric

# Data row level metric
data_row_metric = ConfusionMatrixMetric(
    metric_name="50pct_iou",
    value=[1, 0, 1, 0]
)

# Feature (class) level metric
feature_metric = ConfusionMatrixMetric(
    metric_name="50pct_iou",
    feature_name="cat",
    value=[1, 0, 1, 0]
)

# Subclass level metric
subclass_metric = ConfusionMatrixMetric(
    metric_name="50pct_iou",
    feature_name="cat",
    subclass_name="orange",
    value=[1, 0, 1, 0]
)
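
In the web app these counts are rolled up into precision, recall, and F1. As a rough sketch of that arithmetic (an illustration only, not the product's implementation; derive_scores is a hypothetical helper):

# Minimal sketch: derive precision, recall, and F1 from a [TP, FP, TN, FN] list.
# derive_scores is an illustrative helper, not an SDK API.
def derive_scores(value):
    tp, fp, tn, fn = value
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

print(derive_scores([1, 0, 1, 0]))  # (1.0, 1.0, 1.0)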

Confidence

  • Users can provide confidence scores along with metrics
  • This enables exploring model performance without knowing the optimal threshold for each class in advance (a sketch of picking a threshold from such values follows the examples below).
  • Users can filter on confidence and value in the UI to perform powerful queries.
  • The keys represent confidence scores (must be between 0 and 1) and the values represent either a scalar metric value or, for confusion matrix metrics, a [TP, FP, TN, FN] list
# Confusion matrix metric keyed by confidence score
confusion_matrix_metric_with_confidence = ConfusionMatrixMetric(
    metric_name="confusion_matrix_50pct_iou",
    feature_name="cat",
    subclass_name="orange",
    value={0.1: [1, 0, 1, 0], 0.3: [1, 0, 1, 0], 0.5: [1, 0, 1, 0], 0.7: [1, 0, 1, 0], 0.9: [1, 0, 1, 0]}
)

# Scalar metric keyed by confidence score
scalar_metric_with_confidence = ScalarMetric(
    metric_name="iou",
    value={0.1: 0.2, 0.3: 0.25, 0.5: 0.3, 0.7: 0.4, 0.9: 0.3}
)
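
As a hedged illustration of the kind of exploration this enables (not an SDK feature; best_threshold is a hypothetical helper), the per-confidence counts can be scanned to find the threshold with the best F1:

# Illustrative only: pick the confidence threshold with the highest F1 from a
# {confidence: [TP, FP, TN, FN]} dict like the one uploaded above.
def best_threshold(conf_to_counts):
    def f1(counts):
        tp, fp, tn, fn = counts
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0
    return max(conf_to_counts, key=lambda conf: f1(conf_to_counts[conf]))

print(best_threshold({0.1: [1, 2, 1, 0], 0.5: [1, 0, 1, 0], 0.9: [0, 0, 1, 1]}))  # 0.5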

Aggregations

The aggregation field is optional on the ScalarMetric object; if omitted, the arithmetic mean is used by default.

Aggregations occur in two cases:

  1. When a user provides a feature or subclass level metric, Labelbox automatically aggregates all metrics with the same parent to create a value for that parent.

    • E.g. a user provides cat and dog IOU. The data row level IOU is the average of the two (see the default-aggregation sketch after the SUM example below).
    • An exception is when the data row level IOU is explicitly set; in that case the aggregation does not take effect (on a per data row basis).
  2. When users create slices or want aggregate statistics on their models, the selected aggregation is applied.

"""
If the following metrics are uploaded then
in the web app, users will see:
true positives dog = 4
true positives cat = 3
true positives = 7
"""

from labelbox.data.annotation_types import ScalarMetric, ScalarMetricAggregation

# SUM aggregation: the data row level value is the sum of the feature level values
cat_feature_metric = ScalarMetric(
    metric_name="true_positives",
    feature_name="cat",
    value=3,
    aggregation=ScalarMetricAggregation.SUM
)

dog_feature_metric = ScalarMetric(
    metric_name="true_positives",
    feature_name="dog",
    value=4,
    aggregation=ScalarMetricAggregation.SUM
)
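
For comparison, a minimal sketch of the default behavior described above: with no aggregation specified, the arithmetic mean is used, so cat and dog IOU values of 0.6 and 0.8 (illustrative numbers) would roll up to a data row level IOU of 0.7.

# Default aggregation sketch: omitting aggregation uses the arithmetic mean,
# so these two feature level IOUs roll up to a data row level IOU of 0.7.
cat_iou = ScalarMetric(
    metric_name="iou",
    feature_name="cat",
    value=0.6
)

dog_iou = ScalarMetric(
    metric_name="iou",
    feature_name="dog",
    value=0.8
)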
