Developer guide: Batches
The best way to send data rows from Catalog to Annotate is to create a batch of data rows and send that batch to a labeling project. Sending a batch to a project's labeling queue gives you control over which data rows are queued, their priority, and, for consensus projects, how many of them receive multiple labels.
See this page to learn the limits for sending batches to a project.
To send a batch of data to a project for labeling, use the following steps:
- Go to the Catalog and use the filters to surface a subset of data rows.
- Choose Select all or manually select the data rows to include in your batch. Once the data rows are selected, click the blue button at the top of the screen and select Add batch to project.
- From the batch creation modal, choose a project, give the batch a name, and select a priority for the data rows (1 is the highest and 5 is the lowest).
Media type compatibility
A data row that has a different media type than the destination project will not be submitted. For example, if you create a batch of text assets, you will not be able to send that batch to a project for labeling images.
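The media type rule can be pictured with a small, self-contained sketch. It does not call the Labelbox SDK; the `DataRow` class, the `split_by_media_type` helper, and the media type strings are hypothetical names used only to illustrate the check described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    # Hypothetical stand-in for a Catalog data row; not an SDK class.
    id: str
    media_type: str  # for example "image" or "text"


def split_by_media_type(rows, project_media_type):
    """Split rows into those that match the project's media type and those that do not."""
    accepted = [r for r in rows if r.media_type == project_media_type]
    rejected = [r for r in rows if r.media_type != project_media_type]
    return accepted, rejected


rows = [DataRow("dr-1", "image"), DataRow("dr-2", "text"), DataRow("dr-3", "image")]
accepted, rejected = split_by_media_type(rows, "image")

print([r.id for r in accepted])  # ['dr-1', 'dr-3']
print([r.id for r in rejected])  # ['dr-2']
```

Running a check like this before building a batch shows which data rows would not be submitted to the destination project.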
- If you want to speed up labeling, you can send predictions that are stored in a model run as pre-labels in the labeling project. Learn more in How to include predictions as pre-labels in a batch.
- Click Submit batch. Then, navigate to the project to begin labeling. You may see a warning that some of the data rows have already been submitted. Data rows that have already been submitted will be excluded when you click submit. Newly queued data rows will be distributed after any data rows in the label queue have already been reserved.
Appending to batches not supported
Once a batch has been submitted, you cannot add more data rows to it.
If your project uses consensus as the quality setting, you can configure the consensus labeling parameters at the batch level:
- Data row priority: This value indicates where this batch of data rows will be placed in the labeling queue according to priority.
- % Coverage: This value indicates the percentage of data rows in the batch that would be queued for labeling by multiple labelers, and hence would have multiple labels. This field appears on Consensus projects only.
- # Labels: Of the data rows to be labeled, this value indicates the number of times a data row should be labeled. This field appears on Consensus projects only.
Consensus settings configured at the batch level are:
- Deterministic, meaning a data row is selected for consensus labeling (labeled by multiple labelers) at the time it is added to the project as part of a batch. This selection does not change afterward.
- Transparent, meaning these configurations are visible in the Consensus column in the Data Rows tab.
However, you have the flexibility to modify these settings for batches as they are added to the project. Hence, if during the course of a project, you realize that the data rows selected for multiple labels require more or fewer labels than in the previous batches, you can modify this when adding the next batch of data.
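To see how % Coverage and # Labels combine, the following sketch estimates the total number of labels a batch would require. The function name `expected_label_count` is hypothetical, and the rounding of the covered row count is an assumption; the exact rounding Labelbox applies is not stated here.

```python
def expected_label_count(batch_size, coverage_pct, labels_per_row):
    """Estimate total labels for a batch under batch-level consensus settings.

    coverage_pct: percentage of rows labeled by multiple labelers (0 to 100).
    labels_per_row: number of labels for each row selected for consensus.
    """
    if not 0 <= coverage_pct <= 100:
        raise ValueError("coverage_pct must be between 0 and 100")
    if labels_per_row < 1:
        raise ValueError("labels_per_row must be at least 1")

    # Assumption: the covered row count is rounded to the nearest integer.
    consensus_rows = round(batch_size * coverage_pct / 100)
    single_label_rows = batch_size - consensus_rows
    return consensus_rows * labels_per_row + single_label_rows


# 200 rows, 25% coverage, 3 labels each: 50 * 3 + 150 * 1
print(expected_label_count(200, 25, 3))  # 300
```

An estimate like this helps when deciding whether the next batch should use more or fewer labels per data row than the previous one.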
Manually selecting data for labeling is a time-consuming process. You can use sampling to make the data selection process faster and easier.
Random sampling is a very useful selection technique when you are working with large amounts of data. You can randomly select data rows from your Catalog query to create a batch for labeling or re-work.
Here is an example to help you understand random sampling.
A user constructs a query to find plums in Catalog and clicks Sample. Then the user sets the desired number of data rows in the batch to 100 and clicks Resample. The random selection is always executed on the results of the query.
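Conceptually, random sampling draws a fixed number of distinct data rows from the query results, and resampling repeats the draw. The sketch below illustrates this with Python's standard `random` module; the `random_sample` helper and the `plum-` identifiers are hypothetical and do not reflect how Catalog implements sampling.

```python
import random


def random_sample(query_results, batch_size, seed=None):
    """Pick up to batch_size distinct items at random from the query results."""
    rng = random.Random(seed)
    k = min(batch_size, len(query_results))
    return rng.sample(query_results, k)


# Hypothetical IDs standing in for the results of a "plum" query.
query_results = [f"plum-{i:04d}" for i in range(1000)]

batch = random_sample(query_results, 100)
resampled = random_sample(query_results, 100)  # a fresh draw, like clicking Resample

print(len(batch), len(set(batch)))  # 100 100
```

Every sampled row comes from the query results, so narrowing the query narrows the pool the batch is drawn from.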
The ordered sampling selection technique will respect the sorted order of data rows that you see in the results. You can order the results by changing the Created At timestamp to ascending or descending order. When you send a batch to a project, the sequence generated by ordered sampling does not influence the sequence of the data rows in the labeling queue.
Here is an example to help you understand ordered sampling.
In this example, a user constructs a query in Catalog to find plums and clicks Sample. The user then selects the Ordered parameter, sets the desired number of data rows in the batch to 100, and clicks Resample.
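Ordered sampling can be thought of as sorting the query results by their Created At timestamp and taking the first rows. The sketch below is a minimal illustration under that assumption; the `ordered_sample` helper and the dictionary layout of each row are hypothetical.

```python
from datetime import datetime, timedelta


def ordered_sample(rows, batch_size, descending=False):
    """Take the first batch_size rows after sorting by creation time."""
    ordered = sorted(rows, key=lambda r: r["created_at"], reverse=descending)
    return ordered[:batch_size]


# Hypothetical query results, each created one minute after the previous one.
start = datetime(2024, 1, 1)
rows = [
    {"id": f"plum-{i}", "created_at": start + timedelta(minutes=i)}
    for i in range(500)
]

oldest_first = ordered_sample(rows, 100)
newest_first = ordered_sample(rows, 100, descending=True)

print(oldest_first[0]["id"], newest_first[0]["id"])  # plum-0 plum-499
```

The order only determines which rows are selected for the batch; as noted above, it does not set the order of the data rows in the labeling queue.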
Within Catalog, you can sample from the results of a filter. Sampling can be performed in a random or non-random order. To create a batch with the sampling technique, follow these steps:
- Go to the Catalog tab and filter for the relevant data rows you want to label. Once you have the relevant data rows, click the Sample button on the top right.
- From the batch creation modal, in addition to filling out the project, batch name, and priority details, you can choose how many data rows to sample and the sampling method. Currently, Labelbox supports two sampling methods: random and ordered.
- Click Submit batch.
Optionally, include predictions as pre-labels in the batch.
If you want to speed up labeling, you can send predictions that are stored in a model run as pre-labels in the labeling project. Learn more in How to include predictions as pre-labels in a batch.
- Navigate to the project to begin labeling. Make sure the project is in batch mode so that you can access these new data rows for labeling.
It is not always efficient to label data from scratch. It is sometimes better to start labeling from your model's predictions. Labelbox allows you to send predictions that are stored in a model run as pre-labels in the labeling project.
A toggle allows you to include or exclude predictions from the batch. Labelbox shows you how many predictions could be sent as pre-labels.
From the dropdown, select the model run and the predictions to include in the batch. You can use the checkboxes to specify which predictions to send to the project as pre-labels.
For example, you may want to use predictions from specific features on which your model is doing well, but exclude predictions from other features on which the model is struggling.
In order to send predictions from a model run as pre-labels to a labeling project, the features must be present in both the model run ontology and the labeling project ontology.
The model run and the labeling project do not need to share the exact same ontology. However, they must have at least some features in common. You can send predictions as pre-labels only for the features that are shared between the model run and the labeling project.
If no predictions from the model run are compatible with the labeling project, you will see a warning message.
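The compatibility rule amounts to a set intersection between the two ontologies, optionally narrowed by the features you tick in the checkboxes. The sketch below models that logic with plain Python sets; the helper names, the feature names, and the prediction dictionaries are hypothetical and are not part of the Labelbox SDK.

```python
def shared_features(model_run_features, project_features):
    """Features present in both the model run ontology and the project ontology."""
    return set(model_run_features) & set(project_features)


def filter_predictions(predictions, model_run_features, project_features, selected=None):
    """Keep predictions whose feature is shared and, if given, also selected."""
    allowed = shared_features(model_run_features, project_features)
    if selected is not None:
        allowed &= set(selected)
    return [p for p in predictions if p["feature"] in allowed]


model_run_ontology = {"plum", "leaf", "stem"}
project_ontology = {"plum", "leaf", "branch"}
predictions = [
    {"data_row": "dr-1", "feature": "plum"},
    {"data_row": "dr-1", "feature": "stem"},
    {"data_row": "dr-2", "feature": "leaf"},
]

sent = filter_predictions(predictions, model_run_ontology, project_ontology)
print([p["feature"] for p in sent])  # ['plum', 'leaf']

# Send only the feature the model handles well.
only_plums = filter_predictions(
    predictions, model_run_ontology, project_ontology, selected={"plum"}
)
print([p["feature"] for p in only_plums])  # ['plum']
```

An empty result from `shared_features` corresponds to the case where no predictions are compatible with the labeling project.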
A data row cannot be part of more than one batch in a project at a time.
A batch cannot be shared between projects. However, you can create a new batch using the same data rows.
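Because a data row can belong to only one batch in a project at a time, rows already submitted to the project are left out of a new batch. The sketch below shows that exclusion as a simple membership check; the `rows_to_queue` helper and the data row IDs are hypothetical.

```python
def rows_to_queue(selected_ids, already_in_project):
    """Separate newly queueable data rows from ones already submitted to the project."""
    existing = set(already_in_project)
    new_ids = [i for i in selected_ids if i not in existing]
    skipped = [i for i in selected_ids if i in existing]
    return new_ids, skipped


selected = ["dr-1", "dr-2", "dr-3", "dr-4"]
in_project = {"dr-2", "dr-4"}

new_ids, skipped = rows_to_queue(selected, in_project)
print(new_ids)  # ['dr-1', 'dr-3']
print(skipped)  # ['dr-2', 'dr-4']
```

The same data rows can still be sent to a different project, since the restriction applies within a single project.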