Model training, testing, and evaluation.

The Model product is designed to help you do any of the following: easily prepare and version training data, launch model training experiments, diagnose the performance of your machine learning models, and rapidly improve that performance. Model can be used in conjunction with our Catalog and Annotate products.

Version & compare ML experiments

Labelbox recognizes that ML teams need a model experiment management system to collaborate efficiently and make their experiments reproducible. Think of the Model product as your command center for tracking all datasets and configurations at each data-centric iteration. The model run config feature automatically versions and tracks your datasets and model configurations with each model run.

The Model product also provides a framework for comparing your models across multiple experiments. A model run is a representation of a model training experiment that contains a data snapshot (data rows, annotations, and data splits) at each iteration. By comparing model runs, you can visualize which configurations work best for your model and identify the model with the best performance. You can also use this framework to make sure you are making progress in your experiments.
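The comparison workflow described above can be sketched in plain Python. The run names, configurations, and metric values below are hypothetical placeholders, not objects from the Labelbox API:

```python
# Minimal sketch of comparing model runs by a tracked metric.
# Run names, configs, and scores are made-up examples, not
# values returned by the Labelbox API.
model_runs = [
    {"name": "run-1", "config": {"lr": 0.01, "epochs": 10}, "val_f1": 0.71},
    {"name": "run-2", "config": {"lr": 0.001, "epochs": 20}, "val_f1": 0.78},
    {"name": "run-3", "config": {"lr": 0.001, "epochs": 40}, "val_f1": 0.76},
]

def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda r: r[metric])

best = best_run(model_runs, "val_f1")  # run-2 wins on validation F1
```

In the Labelbox UI this comparison is done visually across model runs; the sketch only illustrates the underlying idea of ranking experiments by a shared metric.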

Labelbox designed the Model product to be highly visual. You can use the gallery view to visually inspect and compare annotations and model predictions.


Compare model predictions and ground truths on both training data and inference data

Diagnose and improve model performance

The Model product gives you several ways to diagnose and improve the performance of your models. Using the embeddings projector, you can visually inspect the data points to understand the distribution of your training/inference data, splits, annotations, and predictions.
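The idea behind inspecting data points in embedding space can be illustrated with a toy nearest-neighbor lookup. The embeddings below are made-up 3-d vectors; in practice, Labelbox computes and projects real embeddings for you in the UI:

```python
import math

# Toy sketch of embedding-space inspection: rank data rows by
# cosine similarity to a query row. Vectors are illustrative only.
embeddings = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query, store, k=1):
    """Return the k most similar data rows to the query row."""
    scores = [(name, cosine(store[query], vec))
              for name, vec in store.items() if name != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Clusters of near neighbors that share a failure mode are exactly the kind of pattern the embeddings projector helps you spot visually.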

With each data-centric iteration, you can improve your model performance by identifying model challenge cases, prioritizing the right labels to fix them, finding & fixing labeling mistakes, and pre-labeling your data.


Understand the distribution of your training and inference data, splits, annotations, and predictions

Integrate with model training service

Although Labelbox does not provide a model training service on its backend, it does make it easy to connect your computing environment (cloud or customer-managed infrastructure) to Labelbox, launch model training with one click, and track all of the progress within the Labelbox UI.
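The hand-off between Labelbox and your training environment can be sketched as follows. The record fields below are illustrative, not the actual Labelbox export schema:

```python
# Hypothetical sketch of the hand-off: a model run snapshot, exported
# as a list of records, is filtered by split and fed to your own
# training entrypoint. Field names are illustrative placeholders.
snapshot = [
    {"data_row_id": "row_1", "split": "train", "label": "cat"},
    {"data_row_id": "row_2", "split": "validation", "label": "dog"},
]

def load_split(records, split):
    """Filter exported snapshot records down to one data split."""
    return [r for r in records if r["split"] == split]

train_set = load_split(snapshot, "train")
# train_set would then be passed to your framework's training loop.
```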

To learn more, visit our docs on Model training service integration.

Evaluate model performance

The Model product also provides a natively supported metrics view for evaluating the performance of your machine learning models. Model error analysis is the process of analyzing where your model predictions disagree with ground truth labels. These quantitative metrics help you find low-performing slices of data so you can understand and tackle the root cause of each issue.
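One common disagreement signal for bounding-box tasks is intersection-over-union (IoU) between a prediction and its ground truth. A minimal sketch, with toy box coordinates:

```python
# Sketch of one error-analysis signal: IoU between a predicted box and
# a ground-truth box. Boxes are (x_min, y_min, x_max, y_max); the
# coordinates and 0.5 threshold below are illustrative.
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A low IoU flags a disagreement worth reviewing: it may be a model
# error or a labeling mistake.
disagreement = iou((0, 0, 10, 10), (20, 20, 30, 30)) < 0.5
```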


Analyze model metrics and zoom into low-performing slices of data

Model end-to-end experience


Explore models

When you navigate to the Model tab, you can explore open-source models in our public demo workspace or explore your own models. When you select a model, you have three views to choose from: Gallery view, Metrics view, and Projector view.

Overview of Model capabilities

This video shows how AI teams can quickly and easily manage their training data and model training processes within Labelbox. Watch to learn how you can find and fix labeling errors and model errors, curate high-value data for labeling, evaluate and compare model runs, and create and version data splits and hyperparameters.

Key definitions for Model

Model: A directory where you can create, manage, and compare a set of model runs related to the same machine learning task. Each model is specified by an ontology, which defines the machine-learning task of the model runs inside the directory.
Model run: A model training experiment within a model directory. Each model run has its data snapshot (data rows, annotations, and data splits) versioned. You can upload predictions to a model run and compare its performance against other model runs in the model directory.
Data split: You can split the selected data rows into train, validation, and test splits to prepare for model training and evaluation.
Data versioning: Each model run keeps its own versioned data snapshot containing the data rows, annotations, and data splits. The snapshot is immutable: it remains the same even if new annotations are added or existing annotations are updated. You can export it from the model run to train a model or to reproduce one.
Model config (hyperparameters): Each model run keeps a version of its model configuration, such as hyperparameters and model type.
Model training: There are two ways to integrate your labeled data seamlessly with your model training workflow:
1. Export the model run snapshot from Labelbox and train a model in your custom ML environment.
2. Use our model training integration to enable one-click training from the Labelbox UI.
Error analysis: The process through which ML teams analyze where model predictions disagree with ground truth labels. A disagreement can be a model error (poor model prediction) or a labeling mistake (incorrect ground truth).
Slice: A subset of your training data bound by a common characteristic. From the Model tab, you can create slices to visually inspect your training data and view model metrics reported on each slice.
Active learning: The process through which ML teams identify which of their unlabeled data rows are highest-value and should be labeled first. Labeling these data rows optimally improves model performance.
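The data split concept defined above can be sketched in plain Python. The 80/10/10 ratios, seed, and row IDs are illustrative choices, not Labelbox defaults:

```python
import random

# Sketch of creating train/validation/test splits over data row IDs.
# Ratios, seed, and IDs below are illustrative, not Labelbox defaults.
def split_data_rows(data_row_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle data row IDs deterministically and cut them into splits."""
    rows = list(data_row_ids)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * ratios[0])
    n_val = int(len(rows) * ratios[1])
    return {
        "train": rows[:n_train],
        "validation": rows[n_train:n_train + n_val],
        "test": rows[n_train + n_val:],
    }

splits = split_data_rows([f"row_{i}" for i in range(100)])
```

Fixing the random seed keeps the split reproducible, which mirrors how a model run's versioned snapshot pins the splits used in an experiment.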

Visualize annotations and predictions

To diagnose models and rapidly improve model performance, the Model product enables you to visualize annotations and model predictions. The table below lists supported and planned annotation and prediction types for each data type.

Data Types | Annotation Types
Image | Classification, bounding box, segmentation, polygon, polyline, point
Tiled | Classification, bounding box, polygon, polyline
Text | Classification, named entity (NER)
Video, Geospatial tiled imagery, Documents, DICOM | Coming soon

Performance and limitations

A model run can contain up to 1 million data rows.
