Overview
Model training, testing, and evaluation.
Models is a product for training, diagnosing, and rapidly improving the performance of your machine learning models, used in conjunction with Catalog and Annotate.


Benefits
With Models, you can:
- Curate data splits (train, validation, test) from your labeled data in Labelbox
- Track data versions and model configuration to make experiments reproducible
- Train a model with one click in your desired computing environment (cloud or customer-managed infrastructure) via model training service integrations
- Evaluate & test your machine learning models
- Compare models from multiple experiments to identify the best-performing one
- Take action to improve your model: identify and prioritize the right data to label, find and fix labeling mistakes, and pre-label your data
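The first benefit above, curating data splits, can be sketched in plain Python. This is a hypothetical `split_data_rows` helper (not part of the Labelbox SDK) that shuffles labeled Data Row IDs with a fixed seed and partitions them 80/10/10:

```python
import random

def split_data_rows(data_row_ids, train=0.8, validation=0.1, test=0.1, seed=42):
    """Shuffle Data Row IDs deterministically and partition them into splits."""
    assert abs(train + validation + test - 1.0) < 1e-9, "fractions must sum to 1"
    ids = list(data_row_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * validation)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

splits = split_data_rows([f"row-{i}" for i in range(100)])
print({name: len(rows) for name, rows in splits.items()})
# {'train': 80, 'validation': 10, 'test': 10}
```

Keeping the seed fixed is what makes the split part of a reproducible experiment: rerunning the helper on the same Data Rows yields the same three sets.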


Key definitions for Models
Term | Definition |
---|---|
Model | A Model is a directory where you can create, manage, and compare a set of Model Runs related to the same machine learning task. Each Model is specified by an ontology of data, which defines the machine learning task of the Model Runs inside the directory. |
Model Run | A Model Run is a model training experiment within a Model directory. Each Model Run has its data snapshot (Data Rows, annotations, and data splits) versioned. You can upload predictions to a Model Run, and compare its performance against other Model Runs in the Model directory. |
Data split | You can split the selected Data Rows into train, validation, and test splits to prepare for model training and evaluation. |
Data versioning | Each Model Run keeps its own versioned data snapshot. The snapshot contains the Data Rows, annotations, and data splits. It is immutable, meaning it remains the same even if new annotations are added or existing annotations are updated. You can export the snapshot from a Model Run to train a model or to reproduce a Model Run. |
Model config (coming soon) | Each Model Run will keep a version of its model configuration (such as hyperparameters) and model type. |
Model training | There are two ways to integrate your labeled data seamlessly with your Model training workflow. |
Error analysis | Error analysis is the process through which ML teams analyze where model predictions disagree with ground truth labels. A disagreement can be a model error (poor model prediction) or a labeling mistake (ground truth is wrong). |
Slice | A Slice represents a subset of your training data bound by a common characteristic. From Models, you can create slices to visually inspect your training data and view model metrics reported on each slice. |
Active learning | Active learning is the process through which ML teams identify which high-value Data Rows, among all their unlabeled data, to label first. Labeling these Data Rows improves model performance the most per labeling dollar spent. |
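As a rough illustration of error analysis, the sketch below (a hypothetical helper, not a Labelbox API) flags every Data Row where a model prediction disagrees with its ground truth label, so a reviewer can decide whether each disagreement is a model error or a labeling mistake:

```python
def find_disagreements(ground_truth, predictions):
    """Return Data Rows where the model prediction disagrees with ground truth.

    ground_truth / predictions: dicts mapping Data Row ID -> class label.
    Each disagreement is a candidate model error or labeling mistake
    and should be triaged by a human.
    """
    return [
        {"data_row": row_id, "ground_truth": label, "prediction": predictions[row_id]}
        for row_id, label in ground_truth.items()
        if row_id in predictions and predictions[row_id] != label
    ]

labels = {"row-1": "cat", "row-2": "dog", "row-3": "cat"}
preds = {"row-1": "cat", "row-2": "cat", "row-3": "dog"}
print(find_disagreements(labels, preds))
# two disagreements: row-2 and row-3
```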
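One common way to prioritize unlabeled Data Rows for active learning is least-confidence sampling. This sketch (hypothetical helper names, not a Labelbox API) ranks Data Rows by model uncertainty, assuming each row has a vector of predicted class probabilities:

```python
def prioritize_for_labeling(unlabeled_scores, budget=2):
    """Least-confidence sampling: pick the `budget` Data Rows whose top
    class probability is lowest, i.e. where the model is least sure."""
    ranked = sorted(
        unlabeled_scores.items(),
        key=lambda item: max(item[1]),  # lowest max-confidence first
    )
    return [row_id for row_id, _ in ranked[:budget]]

scores = {
    "row-a": [0.98, 0.02],  # confident -> low labeling value
    "row-b": [0.55, 0.45],  # uncertain -> label first
    "row-c": [0.70, 0.30],
}
print(prioritize_for_labeling(scores))  # ['row-b', 'row-c']
```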