Compare multiple models

As you continue iterating on your model and your data, you will likely end up with many model runs. A model run represents an experiment (i.e., an instance of training a specific model on some specific data).

At this point, machine learning teams typically want to compare the predictions and the performance of their different models. The goal of comparing models is to measure and understand the marginal value of every machine learning iteration (additional labels, reworked labels, modeling improvements, hyperparameter fine-tuning, etc.).

The Model product is designed to help you compare model runs: you can visualize predictions and compare metrics side by side.

Before you start

You will need a model with 2 or more model runs to use this feature. Visit these pages to get started:

Select model runs

Once you select the first model run, use the Compare against drop-down in the menu bar, then select the second model run to start the comparison. Labelbox automatically assigns each model run a different color to distinguish them in metrics and visualizations.


Compare a model run to any other model run from inside the model.

Compare two model runs visually

In the gallery, click an individual data row to expand it. Use the sidebar to toggle which ground truth and model run predictions to display. To learn more about how to view predictions, see Visualize model predictions.

Compare two model runs with metrics

We support the comparison of scalar metrics and confusion matrices between two model runs.
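The idea behind scalar-metric comparison can be sketched in a few lines of plain Python: compute the delta of each metric between the two runs. This is an illustrative sketch, not the Labelbox API; the metric values below are hypothetical stand-ins for numbers you would get from your own evaluation or a model run export.

```python
# Sketch: diffing scalar metrics between two model runs.
# run_a and run_b hold hypothetical metric values for illustration only.

run_a = {"precision": 0.82, "recall": 0.74, "f1": 0.78}
run_b = {"precision": 0.86, "recall": 0.71, "f1": 0.78}

def diff_metrics(a: dict, b: dict) -> dict:
    """Return the per-metric delta (b - a) for metrics present in both runs."""
    return {name: round(b[name] - a[name], 4) for name in a.keys() & b.keys()}

deltas = diff_metrics(run_a, run_b)
for name, delta in sorted(deltas.items()):
    direction = "improved" if delta > 0 else ("regressed" if delta < 0 else "unchanged")
    print(f"{name}: {delta:+.4f} ({direction})")
```

A positive delta means the second run improved on that metric; a mix of positive and negative deltas (as here, where precision improves while recall regresses) is exactly the trade-off this comparison view is meant to surface.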

Compare the distribution of annotations and predictions in each model run.
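Conceptually, a distribution comparison is a per-class count of ground-truth annotations versus each run's predictions. The sketch below illustrates that idea in plain Python; it is not the Labelbox API, and the label lists are hypothetical stand-ins for what you would export from each run.

```python
from collections import Counter

# Sketch: comparing the class distribution of ground-truth annotations
# against each model run's predictions. Labels are hypothetical examples.

ground_truth = ["cat", "cat", "dog", "dog", "dog", "bird"]
run_a_preds  = ["cat", "dog", "dog", "dog", "dog", "bird"]
run_b_preds  = ["cat", "cat", "dog", "bird", "bird", "bird"]

def distribution(labels):
    """Per-class label counts."""
    return Counter(labels)

for name, labels in [("ground truth", ground_truth),
                     ("run A", run_a_preds),
                     ("run B", run_b_preds)]:
    print(name, dict(distribution(labels)))
```

A run whose predicted distribution diverges sharply from the ground-truth distribution (here, run B over-predicts "bird") is often the first hint of a systematic bias worth inspecting in the gallery.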


Get an overview comparison between two model runs.


Compare the confusion matrices of two model runs.
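A confusion matrix is built from (ground truth, prediction) pairs, and comparing two runs means comparing those matrices cell by cell. The minimal sketch below shows the underlying computation in plain Python; it is not the Labelbox API, and the label pairs are hypothetical.

```python
# Sketch: building a confusion matrix for each model run so the
# two runs can be compared cell by cell. Labels are hypothetical.

def confusion_matrix(truths, preds, classes):
    """matrix[t][p] = number of items with ground truth t predicted as p."""
    matrix = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(truths, preds):
        matrix[t][p] += 1
    return matrix

classes = ["cat", "dog"]
truths  = ["cat", "cat", "dog", "dog"]
run_a   = ["cat", "dog", "dog", "dog"]  # one cat misclassified as dog
run_b   = ["cat", "cat", "cat", "dog"]  # one dog misclassified as cat

m_a = confusion_matrix(truths, run_a, classes)
m_b = confusion_matrix(truths, run_b, classes)
print("run A:", m_a)
print("run B:", m_b)
```

Viewing the two matrices side by side shows not just how much each run errs but where: here the runs make opposite mistakes, which an aggregate scalar metric alone would not reveal.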


Compare the scalar metrics of two model runs.

What’s Next