Compare model runs

Visualize predictions and compare metrics between model experiments.

As you continue iterating on your model and your data, you will likely end up with many model runs. A model run represents an experiment (i.e., an instance of training a specific model on some specific data).

At this point, machine learning teams typically want to compare the predictions and the performance of their different models. The goal of comparing models is to measure and understand the marginal value of every machine learning iteration, such as creating additional labels, reworking labels, modeling improvements, and hyperparameter fine-tuning.
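For instance, the marginal value of an iteration can be quantified as the change in each evaluation metric between a baseline run and a newer run. A minimal sketch, with purely illustrative metric names and values (not pulled from any real project):

```python
# Hypothetical per-run scalar metrics; names and values are illustrative only.
run_a = {"precision": 0.81, "recall": 0.74, "f1": 0.773}
run_b = {"precision": 0.85, "recall": 0.79, "f1": 0.819}

def metric_deltas(baseline, candidate):
    """Return candidate - baseline for every metric both runs share."""
    return {m: round(candidate[m] - baseline[m], 3)
            for m in baseline.keys() & candidate.keys()}

deltas = metric_deltas(run_a, run_b)
print(deltas)  # positive deltas mean the newer run improved on that metric
```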

The Model product is designed to help you investigate model performance by visualizing predictions and comparing metrics between model runs.

Before you start

You will need a model with two or more model runs to use this feature. Visit the pages on creating models and model runs to get started.

Select model runs

Once you select the first model run, click the Compare against drop-down in the menu bar and select a second model run to initiate the comparison. Labelbox automatically assigns each model run a different color so that the two runs can be distinguished in metrics and visualizations.


Compare a model run to any other model run from inside the model directory.

After selecting two model runs to compare, you will see the compared results overlaid on the thumbnails of the data rows.

Visualize a comparison of the predictions.

Compare two model runs visually

From the gallery, you can click on an individual data row to expand it. Using the right sidebar, you can toggle which ground truth annotations and predictions to view, along with which model runs to display in general. For even more visualization options, click Display.

To learn more about how to visualize model predictions, visit the section on the Gallery view.

Compare two model runs with metrics

You can compare both scalar metrics and confusion matrices between two model runs.

Compare the distribution of annotations and predictions in each model run.
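To make the distribution comparison concrete, the sketch below counts class frequencies in the ground truth annotations and in each run's predictions. All labels and run names here are made up for illustration:

```python
from collections import Counter

# Illustrative class labels only; in practice these come from the ground
# truth annotations and each model run's predictions.
ground_truth = ["cat", "cat", "dog", "bird", "dog", "cat"]
run_a_preds = ["cat", "dog", "dog", "bird", "dog", "cat"]
run_b_preds = ["cat", "cat", "dog", "cat", "dog", "cat"]

def distribution(labels):
    """Count how often each class appears."""
    return Counter(labels)

for name, labels in [("ground truth", ground_truth),
                     ("run A", run_a_preds),
                     ("run B", run_b_preds)]:
    print(name, dict(distribution(labels)))
```

A skewed prediction distribution relative to the ground truth (e.g., a run that rarely predicts a minority class) is often visible here before it shows up in aggregate metrics.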


Get a metrics overview comparison between two model runs.


Compare the confusion matrices of two model runs.


Compare the scalar metrics of two model runs.
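As an illustration of what these metrics capture, this sketch computes a simple accuracy score and a confusion matrix for two hypothetical runs against shared ground truth (all labels are invented for the example):

```python
def confusion_matrix(truth, preds, classes):
    """Return matrix[actual][predicted] = count."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for t, p in zip(truth, preds):
        m[t][p] += 1
    return m

def accuracy(truth, preds):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)

classes = ["cat", "dog"]
truth = ["cat", "cat", "dog", "dog", "cat"]
run_a = ["cat", "dog", "dog", "dog", "cat"]
run_b = ["cat", "cat", "dog", "cat", "cat"]

for name, preds in [("run A", run_a), ("run B", run_b)]:
    print(name, "accuracy:", accuracy(truth, preds))
    print(confusion_matrix(truth, preds, classes))
```

Note that two runs can share the same scalar score while making different kinds of mistakes, which is why comparing the confusion matrices side by side is useful.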

Compare model run configs

When comparing two model runs, you can compare their model run configs alongside their performance to understand how the hyperparameters are affecting your model.

Compare model run configs.
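Conceptually, this comparison amounts to diffing the two hyperparameter sets and inspecting only the parameters that changed. A minimal sketch with made-up hyperparameters (not a real Labelbox config schema):

```python
def diff_configs(config_a, config_b):
    """Return {param: (a_value, b_value)} for every parameter that differs."""
    keys = config_a.keys() | config_b.keys()
    return {k: (config_a.get(k), config_b.get(k))
            for k in keys
            if config_a.get(k) != config_b.get(k)}

# Illustrative hyperparameters for two runs.
config_a = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 20}
config_b = {"learning_rate": 5e-4, "batch_size": 32, "epochs": 30}

print(diff_configs(config_a, config_b))
```

Pairing the changed parameters with the metric deltas between the same two runs makes it easier to attribute a performance change to a specific hyperparameter.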