Batch-based queueing is a method of submitting a subset of your dataset to a project's labeling queue, rather than sending an entire dataset to be labeled.
Batch-based queueing enables you to do the following:
- Use filters to find the most relevant data rows to label next.
- Queue higher-priority data for labeling and move it to the front of the labeling queue.
- Randomly sample data from a large dataset into a project for labeling.
For example, say you import several large datasets into Labelbox but only need to label about 150 images containing rare edge cases. Within Labelbox, you can filter your datasets down to those 150 data rows based on a set of characteristics and save the filtered subset as a batch. Then, you can send that batch of 150 images to a project for your team to label.
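The filter-then-sample idea in this example can be sketched in plain Python. This is only an illustration of the concept, not the Labelbox API: the record layout, the `tags` field, and the `select_batch` helper are all hypothetical.

```python
import random

# Hypothetical in-memory records standing in for data rows. The field names
# ("id", "tags") are illustrative and are not Labelbox's schema.
data_rows = [
    {"id": f"row-{i}", "tags": ["edge_case"] if i % 7 == 0 else ["common"]}
    for i in range(5000)
]


def select_batch(rows, tag, sample_size, seed=0):
    """Keep rows carrying `tag`, then randomly sample up to `sample_size` of them."""
    matching = [row for row in rows if tag in row["tags"]]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(matching, min(sample_size, len(matching)))


batch = select_batch(data_rows, tag="edge_case", sample_size=150)
print(len(batch))
```

Filtering first and sampling second keeps the sample restricted to relevant rows. Capping the sample size at the number of matches avoids an error when fewer rows match than requested.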
Each batch can contain a maximum of 10,000 data rows, and each data row can be submitted in only one batch per project. For more details on how to create batches, see Batches.
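If you have more candidate data rows than fit in one batch, the two constraints above suggest a planning step like the following. This is a minimal sketch in plain Python under stated assumptions: `plan_batches` and the ID format are hypothetical, and it does not call Labelbox.

```python
MAX_BATCH_SIZE = 10_000  # per-batch limit stated above


def plan_batches(candidate_ids, already_in_project, max_size=MAX_BATCH_SIZE):
    """Split candidate IDs into batches of at most `max_size`.

    Duplicates and IDs that already belong to a batch in this project are
    skipped, so no data row is planned for more than one batch.
    """
    seen = set(already_in_project)
    fresh = []
    for row_id in candidate_ids:
        if row_id not in seen:
            seen.add(row_id)
            fresh.append(row_id)
    return [fresh[i:i + max_size] for i in range(0, len(fresh), max_size)]


candidates = [f"row-{i}" for i in range(25_000)]
batches = plan_batches(candidates, already_in_project={"row-0", "row-1"})
print([len(b) for b in batches])
```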
Before you can create a batch, there are a few steps you must complete to add your dataset to Labelbox.
Step 1. Connect Labelbox to your cloud storage provider via IAM delegated access. See Integrations to learn how to set this up.
Step 2. Add your dataset to Labelbox. See Create a dataset for instructions.
Step 3. Find your dataset in Catalog, then use the filters to select the subset of data rows you want to label. Then, create a batch and send it to a project's labeling queue. For instructions, see Batches.
A batch can be created from two places within the Labelbox platform:
- From Catalog: Follow the steps in Create a batch to create a batch by using filters and sampling to curate the most important data to be labeled.
- From a project: Go to the Data Rows tab and select New Batch. This takes you to the Catalog view, where you can select data rows and add them to a project. From there, follow the same steps in Create a batch.
If your project uses consensus as the quality setting, you can configure the consensus labeling parameters at the batch level:
- Data row priority: This value indicates where this batch of data rows will be placed in the labeling queue according to priority.
- % Coverage: This value indicates the percentage of data rows in the batch that would be queued for labeling by multiple labelers, and hence would have multiple labels.
- #Labels: Of the data rows to be labeled, this value indicates the number of times a data row should be labeled.
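To see how % Coverage and #Labels combine, the following sketch estimates the labeling workload for one batch. It assumes that data rows outside the consensus share receive exactly one label and that the consensus share is rounded to the nearest whole row. Both are assumptions of this example, and `consensus_workload` is a hypothetical helper, not part of Labelbox.

```python
def consensus_workload(batch_size, coverage_percent, labels_per_row):
    """Estimate how many labels a batch needs under the given consensus settings."""
    # Assumption: the consensus share is rounded to the nearest whole data row.
    consensus_rows = round(batch_size * coverage_percent / 100)
    # Assumption: every other data row is labeled exactly once.
    single_label_rows = batch_size - consensus_rows
    total_labels = consensus_rows * labels_per_row + single_label_rows
    return {
        "consensus_rows": consensus_rows,
        "single_label_rows": single_label_rows,
        "total_labels": total_labels,
    }


workload = consensus_workload(batch_size=1000, coverage_percent=20, labels_per_row=3)
print(workload)
```

Under these assumptions, a batch of 1,000 data rows with 20% coverage and 3 labels per consensus row needs 200 × 3 + 800 = 1,400 labels in total.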
Consensus settings configured at the batch level are:
- Deterministic: A data row is selected for consensus labeling (labeled by multiple labelers) at the time it is added to the project as part of a batch. This selection does not change afterward.
- Transparent: The selection is visible in the Consensus column of the Data Rows tab.
However, you can use different settings for each batch you add to the project. If, during the course of a project, you find that the data rows selected for consensus labeling need more or fewer labels than in previous batches, you can change the settings when adding the next batch of data.
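The combination of "fixed once added" and "adjustable per batch" can be modeled as follows. This is a conceptual sketch only: the `BatchConsensus` class and `add_batch` helper are hypothetical, and the seeded random selection is an assumption, since this article does not describe how Labelbox chooses the consensus rows.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: settings cannot change after the batch is added
class BatchConsensus:
    name: str
    coverage_percent: float
    labels_per_row: int
    consensus_row_ids: frozenset


def add_batch(name, row_ids, coverage_percent, labels_per_row, seed=0):
    """Choose the consensus rows once, at the moment the batch is added."""
    count = round(len(row_ids) * coverage_percent / 100)
    # Assumption: rows are picked at random; a fixed seed keeps this repeatable.
    chosen = random.Random(seed).sample(sorted(row_ids), count)
    return BatchConsensus(name, coverage_percent, labels_per_row, frozenset(chosen))


# Each batch carries its own settings; earlier batches are unaffected.
first = add_batch("batch-1", [f"a-{i}" for i in range(100)], 10, 3)
second = add_batch("batch-2", [f"b-{i}" for i in range(100)], 25, 5)
print(len(first.consensus_row_ids), len(second.consensus_row_ids))
```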
Click Batches to view all of the batches that have been added to your project. When you click the menu on a batch, you'll see options to rename or archive the batch. You can also remove the batch's remaining unlabeled data rows from the labeling queue.
Click Batch history to view a changelog of added and removed data rows within this project. You'll also be able to see which batch the data rows belong to.