Once you find a pattern of model errors, you can take action to improve your model. Here is an example of a data-centric iteration to improve model performance.
- Select the data rows on which your model is struggling. See Find model errors (error analysis) for techniques to surface and select these data rows.
- Open the selected data rows in the Catalog
- Click on 11 selected
- Click on View in Catalog
The selected data rows will show up in the Catalog.
- Click on Create function and name your function (named difficult cases in this example).
- The new function surfaces data, across all of your Labelbox data, that resembles this pattern of model failures. To act on it, filter down to unlabeled data rows, prioritize them for labeling, and retrain your machine learning model on the improved dataset.
- Go to the Catalog.
- Select All datasets to explore all of your data.
- Enable the Functions filter to keep only data rows that look similar to the pattern of model failures. Labelbox will automatically sort data rows in decreasing order of the function similarity score.
- Then, filter on Annotations > is none to keep only unlabeled data rows.
These are high-impact, unlabeled data rows. Once they are labeled, you can retrain your model on them, which should improve model performance.
- To sample the top 100 of these data rows:
- Click on Sample.
- Select Ordered.
- Enter 100 as the number of data rows.
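The selection logic in the steps above (keep unlabeled data rows, order them by decreasing similarity score, take the first 100) can be sketched in plain Python. This is only an illustration of the logic, not the Labelbox SDK: the `DataRow` record, its field names, and the `top_unlabeled` helper are hypothetical and assume you have already exported similarity scores and label status for each data row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    # Hypothetical record; these fields are NOT Labelbox SDK attributes.
    row_id: str
    similarity: float       # function similarity score (higher = closer to the failure pattern)
    has_annotations: bool   # True if the data row is already labeled


def top_unlabeled(rows, n=100):
    """Keep unlabeled rows, sort by decreasing similarity, return the first n."""
    unlabeled = [r for r in rows if not r.has_annotations]
    ranked = sorted(unlabeled, key=lambda r: r.similarity, reverse=True)
    return ranked[:n]


# Toy data standing in for an export of the Catalog view.
rows = [
    DataRow("a", 0.91, False),
    DataRow("b", 0.97, True),   # already labeled, so it is filtered out
    DataRow("c", 0.64, False),
    DataRow("d", 0.88, False),
]

batch = top_unlabeled(rows, n=2)
print([r.row_id for r in batch])
```

An ordered sample is used here rather than a random one because the goal is to label the data rows closest to the failure pattern first.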
Then, you can submit the batch to your labeling project.
- Select the destination labeling project.
- Click on Submit batch.
Once this data has been labeled in Labelbox, you can create a new model run, include these newly labeled data rows in your data splits, and retrain your machine learning model to improve its ability to detect basketball courts and ground track fields.
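When you add the newly labeled data rows to your data splits, one common approach (an assumption here, not something Labelbox prescribes) is to assign each row to a split deterministically from a hash of its ID, so that rows already in the dataset keep their split when new rows arrive. The `assign_split` helper, the example IDs, and the 80/10/10 proportions below are hypothetical.

```python
import hashlib


def assign_split(row_id, train_pct=80, valid_pct=10):
    """Map a data row ID to 'train', 'valid', or 'test' in a repeatable way."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer bucket in 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


# Hypothetical IDs standing in for the newly labeled data rows.
new_ids = [f"row-{i}" for i in range(1000)]
splits = {row_id: assign_split(row_id) for row_id in new_ids}

counts = {name: list(splits.values()).count(name) for name in ("train", "valid", "test")}
print(counts)
```

Because the split depends only on the ID, rerunning the assignment after each labeling iteration does not move existing rows between training and evaluation sets.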
Congratulations, you have been through a data-centric iteration to improve your model!