Compare model runs
Visualize predictions and compare metrics between model experiments.
As you continue iterating on your model and your data, you will likely end up with many model runs. A model run represents an experiment (i.e., an instance of training a specific model on some specific data).
At this point, machine learning teams typically want to compare the predictions and the performance of their different models. The goal of comparing models is to measure and understand the marginal value of every machine learning iteration, such as creating additional labels, reworking labels, modeling improvements, and hyperparameter fine-tuning.
The Model product is designed to help you investigate model performance by visualizing predictions and comparing metrics between model runs.
Before you start
You will need a model with two or more model runs to use this feature. Visit these pages to get started:
Select model runs
After you select the first model run, click the Compare against drop-down in the menu bar. Then, select a second model run to start the comparison. Labelbox automatically assigns each model run a different color so that you can tell them apart in metrics and visualizations.
After selecting two model runs to compare, you will see the compared results overlaid on the thumbnails of the data rows.
Compare two model runs visually
From the gallery, click an individual data row to expand it. Using the right sidebar, you can toggle which ground truth annotations and predictions to view, and which model runs to display. For more visualization options, click Display.
To learn more about how to visualize model predictions, visit the section on the Gallery view.
Compare two model runs with metrics
You can compare both scalar metrics and confusion matrices between two model runs.
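To make the idea of a metric comparison concrete, here is a minimal sketch in plain Python of the arithmetic behind it. It does not use the Labelbox SDK; the function names, metric names, and numbers are all hypothetical and only illustrate how per-metric and per-cell differences between two runs can be computed.

```python
from typing import Dict, List


def metric_deltas(run_a: Dict[str, float], run_b: Dict[str, float]) -> Dict[str, float]:
    """Return run_b minus run_a for every scalar metric present in both runs."""
    shared = sorted(run_a.keys() & run_b.keys())
    return {name: run_b[name] - run_a[name] for name in shared}


def confusion_matrix_delta(cm_a: List[List[int]], cm_b: List[List[int]]) -> List[List[int]]:
    """Return the cell-wise difference cm_b minus cm_a for two same-shaped matrices."""
    same_shape = len(cm_a) == len(cm_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(cm_a, cm_b)
    )
    if not same_shape:
        raise ValueError("confusion matrices must have the same shape")
    return [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(cm_a, cm_b)]


# Hypothetical scalar metrics for two model runs (not real results).
baseline_metrics = {"precision": 0.5, "recall": 0.25, "f1": 0.375}
candidate_metrics = {"precision": 0.75, "recall": 0.5}

# Hypothetical 2x2 confusion matrices (rows: ground truth, columns: prediction).
baseline_cm = [[40, 10], [5, 45]]
candidate_cm = [[45, 5], [8, 42]]

scalar_diff = metric_deltas(baseline_metrics, candidate_metrics)
cm_diff = confusion_matrix_delta(baseline_cm, candidate_cm)

print(scalar_diff)
print(cm_diff)
```

Only metrics reported by both runs are compared, so a metric missing from one run is skipped rather than treated as zero.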
Compare model run configs
When comparing two model runs, you can compare the model run configs, as well as model performances, to understand how the hyperparameters are affecting your model.
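As an illustration of what a config comparison surfaces, the sketch below diffs two hyperparameter dictionaries in plain Python. It assumes each config is available as a flat dictionary; the `diff_configs` helper and the example values are hypothetical and are not part of the Labelbox SDK.

```python
from typing import Any, Dict, Tuple

# Sentinel used when a key exists in only one of the two configs.
MISSING = "<missing>"


def diff_configs(
    config_a: Dict[str, Any], config_b: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    diff = {}
    for key in sorted(config_a.keys() | config_b.keys()):
        value_a = config_a.get(key, MISSING)
        value_b = config_b.get(key, MISSING)
        if value_a != value_b:
            diff[key] = (value_a, value_b)
    return diff


# Hypothetical configs for two model runs.
run_a_config = {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}
run_b_config = {"learning_rate": 0.0005, "batch_size": 32, "epochs": 10, "optimizer": "adam"}

config_diff = diff_configs(run_a_config, run_b_config)
for name, (old, new) in config_diff.items():
    print(f"{name}: {old} -> {new}")
```

Reading a diff like this next to the metric differences helps attribute a change in performance to a specific hyperparameter rather than to the run as a whole.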