Splits

How to filter and visualize the data in a model run by splits.

Labelbox recommends splitting your labeled dataset into three sets: training, validation, and test. Holding out validation and test data lets you detect overfitting and check how well the model generalizes to data it did not see during training.

See Curate data splits to learn how to assign data rows to splits inside a model run.

Filter data by split

By default, when opening a model run, you see all the data it contains under All training data.

To display only the data in a specific split, click Splits, then select the split you want: Train, Validation, or Test.

Data rows that are in the model run but that are not assigned to any split will show up under All training data, but not under any split.
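The filtering behavior described above can be pictured with a small sketch. It is not Labelbox code: the `ASSIGNMENTS` mapping and the `rows_in_view` helper are hypothetical names used only to illustrate that unassigned data rows appear in the full view but in no split view.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a model run: data row ID -> split name, or None
# when the row is in the model run but not assigned to any split.
ASSIGNMENTS: Dict[str, Optional[str]] = {
    "row-1": "Train",
    "row-2": "Train",
    "row-3": "Validation",
    "row-4": "Test",
    "row-5": None,
}


def rows_in_view(assignments: Dict[str, Optional[str]],
                 split: Optional[str] = None) -> List[str]:
    """Return the row IDs shown in a view.

    split=None mimics "All training data"; a split name mimics
    selecting that split under Splits.
    """
    if split is None:
        return sorted(assignments)
    return sorted(row for row, s in assignments.items() if s == split)


print(rows_in_view(ASSIGNMENTS))           # every row, including row-5
print(rows_in_view(ASSIGNMENTS, "Train"))  # only rows assigned to Train
```

Here `row-5` is returned by the unfiltered call but by none of the per-split calls, which matches the behavior of unassigned data rows.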

Visualize the distribution of your data by split

You can visualize the distribution of your data in each data split in the projector view. This helps you assess whether the data splits share a similar distribution.
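As a complement to the visual check, you can compare splits numerically. The sketch below is an assumption-laden illustration, not how the projector view works: it compares class proportions between two splits using total variation distance, where 0 means identical proportions and 1 means no overlap. The helper names and the example labels are made up for the illustration.

```python
from collections import Counter
from typing import Dict, List


def class_distribution(labels: List[str]) -> Dict[str, float]:
    """Return the share of each class among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the sum of absolute differences between two class distributions."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Made-up labels: one class name per data row in each split.
train_labels = ["cat"] * 80 + ["dog"] * 20
validation_labels = ["cat"] * 7 + ["dog"] * 3

drift = total_variation(class_distribution(train_labels),
                        class_distribution(validation_labels))
print(f"drift={drift:.2f}")  # prints "drift=0.10"
```

A small value suggests the two splits have similar class balance. It says nothing about similarity in embedding space, which is what the projector view lets you inspect.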

Selected data rows (here, the training set) show up in orange

You can also color data rows by class to see how separable the classes are. Click the projector view icon, then select the data split you want to view. To color by a class in that data split, click the color palette icon and select the class name.

Configure data splits

Once you have created a model run inside a model, the model run will access all the data rows selected for training from the Create a model step. From here, you can configure the train, validation, and test splits.

  1. The default data split is 80% training, 10% validation, and 10% testing. You can adjust the data splits by using the slider, or typing in the input field. If you have a previous model run within the model directory, you can choose to load from the previous config. You also need to name the model run, such as “model iteration 1”.

Next, click Create model run.

  2. You should now see the annotations on the train, validation, and test splits (you might need to wait a few seconds and refresh the page for the UI to finish loading all data rows). From here, you can view the annotations from each data split.

If you want to move some data rows from one split to another, you can select those data rows, click N selected, and click Send to to move them to a different split.
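If you prepare splits outside the UI, the default 80% / 10% / 10% configuration described above can be approximated with a short script. This is a minimal sketch using only the Python standard library. The `split_rows` helper, its parameters, and the row IDs are hypothetical, and Labelbox may assign rows to splits differently.

```python
import random
from typing import Dict, List, Sequence


def split_rows(row_ids: Sequence[str],
               train_pct: int = 80,
               validation_pct: int = 10,
               seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle row IDs reproducibly and cut them into three splits.

    Whatever is left after the train and validation shares goes to Test.
    """
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    return {
        "Train": shuffled[:n_train],
        "Validation": shuffled[n_train:n_train + n_validation],
        "Test": shuffled[n_train + n_validation:],
    }


splits = split_rows([f"row-{i}" for i in range(50)])
print({name: len(rows) for name, rows in splits.items()})
# prints {'Train': 40, 'Validation': 5, 'Test': 5}
```

Using a fixed seed means the same row IDs always land in the same split, which makes it easier to compare model runs.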


You can define a new split distribution during model run creation or reuse the data split distribution from a previous model run

Modify data splits


You can move data between the train, validation, and test splits
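Moving data rows between splits can be modeled as removing the selected rows from their current split and adding them to the target split. The sketch below illustrates that bookkeeping on a plain dictionary. The `move_rows` helper and the row IDs are hypothetical and do not call any Labelbox API.

```python
from typing import Dict, Iterable, List


def move_rows(splits: Dict[str, List[str]],
              row_ids: Iterable[str],
              target: str) -> None:
    """Move the selected rows into the target split, in place.

    Each row ends up in exactly one split. Rows that were in no split
    beforehand are simply added to the target.
    """
    if target not in splits:
        raise KeyError(f"unknown split: {target}")
    selected = list(dict.fromkeys(row_ids))  # drop duplicates, keep order
    chosen = set(selected)
    for name in list(splits):
        splits[name] = [row for row in splits[name] if row not in chosen]
    splits[target].extend(selected)


example = {
    "Train": ["row-1", "row-2", "row-3"],
    "Validation": ["row-4"],
    "Test": ["row-5"],
}
move_rows(example, ["row-2", "row-3"], "Validation")
print(example)
# prints {'Train': ['row-1'], 'Validation': ['row-4', 'row-2', 'row-3'], 'Test': ['row-5']}
```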

Once you are happy with the data rows and data splits you selected, you are ready to train a machine learning model. You can train the model in your custom ML environment (see Export data for model training outside of Labelbox) or use the one-click model training integration.
