Migration: Logistics and FAQ

Additional information on the migration to workflows, the Data Rows tab, and batch-based queueing.

Over the past few months, we’ve released workflows on a rolling basis.

With this feature, you can create a highly customizable, step-by-step review pipeline to further drive efficiency and automation in your review process. With the arrival of workflows, all new projects are now automatically configured with batch-based queueing, the Data Rows tab, and workflows.

Given this new paradigm, Labelbox will be sunsetting dataset-based queueing, the Labels tab, and the Review step in favor of this new way to queue and review your data rows.


Migration details

When will the migration of my old/existing projects take place?

The migration will take place on a rolling basis starting March 12th.

You will receive an email specifying your migration date. We will send out expected migration dates on the Monday before the weekend your migration takes place, so please check your email for your date.

What will happen to my old or existing projects?

Old or existing projects that were created before the launch of workflows still contain dataset-based queueing, the Labels tab, and the Review step.

Labelbox will automatically be migrating your old/existing projects to the new paradigm (batch-based queuing, Data Rows tab, and workflows). This will happen on Labelbox’s backend. No action is required on your end.

We will preserve all data rows, along with the associated labels and reviews (thumbs up/down), in the migrated projects. You will be able to query the review data in these projects and, at your discretion, take any required actions in the workflow paradigm, including moving data rows to the appropriate workflow task. Keep reading for further details.

Which projects will be migrated?

All projects that have a Legacy tag will automatically be migrated to the new paradigm. These are the old or existing projects that still use dataset-based queueing, the Labels tab, and the Review step; the Legacy tag is shown next to them on the projects page.

You can also use the Tags filter at the top of the Annotate page to see only projects that have the legacy tag on them.

Projects without a Legacy tag are already using the new paradigm and do not need to be migrated.

What will happen during the migration?

To minimize disruptions in your workflow, migrations are scheduled on the weekend. The migration will be automatic, meaning there is no action required on your end.

The migration will take a few hours. During the migration, you will be temporarily unable to use the Labelbox app and SDK. If you attempt to log in during the migration window, you will see a message to that effect.

After the migration is complete, you will receive a follow-up email notifying you that your legacy projects have been successfully migrated and you can log in as normal.

Will all of my projects be migrated?

No. Only projects created in the legacy paradigm (dataset-based queueing and the Labels tab for project management) will be migrated. The new paradigm uses batch-based queueing for queueing data, the Data Rows tab for project management, and workflows for data quality and review.

Will I be able to use Labelbox while the migration is occurring?

No. At the time of migration, all users will be automatically logged out of the platform. To minimize disruption, we notify you in advance and run migrations only on weekends. If you have any questions or concerns about the dates proposed for your organization, please reach out to our support team.

What if I have questions about the migration?

Please reach out to our customer support team and keep the subject line specific to the migration to help us triage issues as quickly as possible.

How will I know when the migration is complete?

The Labelbox team will notify you once the migration has been completed. In the UI, you will also notice that all legacy projects that have now been migrated will carry a migrated tag instead of the legacy tag on the Annotate page.


Post-migration details

What can I expect after my legacy projects have been migrated?

  • All projects that have been migrated will carry a Migrated tag.
  • All projects will now use batch-based queuing, the Data Rows tab, and workflows
  • Each previously attached dataset will now appear as a batch of up to 100,000 data rows. If a dataset exceeds that size:
    • The dataset will be split into multiple batches, each containing up to 100,000 data rows.
    • The batches will have indices in their names: <dataset_name>_1, <dataset_name>_2, and so on.
  • If there are detached datasets in the project, all data rows with at least 1 label from these datasets will be preserved in the project.
    • These labeled data rows from the detached datasets will be combined into a single batch named __detached_dataset_but_labeled_rows, up to a size of 100,000 data rows. Beyond that size, multiple batches will be created with indices in their names, similar to what is described above.
    • Note that these data rows (with at least one label) will be routed to either the first review task or the Done step in the workflow.
  • Data row labeling priority configured via Labeling Parameter Overrides will be preserved.
  • Legacy Reviews done via the Review step and thumbs up/down at a label level will continue to be accessible.
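
As a rough illustration of the batch-splitting rule above, the following sketch computes the batch names and sizes that a previously attached dataset would produce. The function name plan_migration_batches is hypothetical, and keeping the dataset name unchanged when everything fits in a single batch is an assumption; the text above only specifies the indexed names used when a dataset is split.

```python
from typing import List, Tuple

MAX_BATCH_SIZE = 100_000  # maximum data rows per batch, as stated above


def plan_migration_batches(dataset_name: str, row_count: int) -> List[Tuple[str, int]]:
    """Return (batch_name, batch_size) pairs for one previously attached dataset."""
    if row_count <= 0:
        return []
    if row_count <= MAX_BATCH_SIZE:
        # Assumption: a dataset that fits in one batch keeps its name unchanged.
        return [(dataset_name, row_count)]
    batches = []
    remaining = row_count
    index = 1
    while remaining > 0:
        size = min(remaining, MAX_BATCH_SIZE)
        # Indexed names follow the <dataset_name>_<index> pattern.
        batches.append((f"{dataset_name}_{index}", size))
        remaining -= size
        index += 1
    return batches
```

For example, a dataset of 250,000 data rows would be planned as three batches: two full batches and one with the remaining 50,000 rows.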

How do I access my legacy review data?

  • To view the legacy reviews on the Data Rows tab, select Label Actions > Has Review (legacy) in the filter widgets.
  • This will filter the view to all data rows that have at least one legacy review on any of the labels.
  • Using the filter will also display the thumbs up/down at a label level for the data row in the same view.
    • In order to view the labels and legacy reviews in the data row table, select the Show labels checkbox.

  • Once you click into any of the data rows with reviews, these reviews will also appear in the Data Row Browser at the label level.

  • Note that all data rows will end up in one of the following workflow tasks:
    • Initial review task, if the data row has all the required labels
    • Initial labeling task, if the data row has fewer labels than required (particularly applicable to consensus data rows)
  • You can use any combination of filters available on the Data Rows tab, including the ones related to new and legacy reviews, to move them to a different step in the workflow using the Move to step action.

The same move can be made with the Python SDK:

import labelbox as lb

client = lb.Client(api_key="<YOUR_API_KEY>")  # Replace with your API key

# In the UI, create a project with data rows and task queues
project_id = 'clan68i5l01rj07246zra5cg4'  # Replace with your project ID
project = client.get_project(project_id)
task_queues = project.task_queues()

# Obtain the review queue (use "MANUAL_REWORK_QUEUE" to obtain the rework queue instead)
review_queue = next(tq for tq in task_queues if tq.queue_type == "MANUAL_REVIEW_QUEUE")

datarow_id = 'cl7uo67jg4ndx0785dc7wbo5y'  # Replace with the ID of the data row to move

# Move the data row to the review queue, then verify the move in the UI
task = project.move_data_rows_to_task_queue([datarow_id], review_queue.uid)

# Move the data row to Done (pass None as the queue ID), then verify the move in the UI
task = project.move_data_rows_to_task_queue([datarow_id], None)
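
The rule for where a migrated data row initially lands (see the list of workflow tasks above) can be written as a small function. This is only a sketch of the stated rule: the function name and the task-name strings are illustrative, and it does not call the Labelbox API.

```python
def initial_workflow_task(label_count: int, required_labels: int) -> str:
    """Return the workflow task a migrated data row lands in, per the rule above."""
    if label_count >= required_labels:
        # The data row already has all the labels it needs.
        return "Initial review task"
    # Fewer labels than required, e.g. a consensus data row still awaiting labels.
    return "Initial labeling task"
```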

If you have any questions surrounding the migration of legacy projects, please reach out to our support team or your dedicated customer support manager.


Labeling queue

Can I use the rework task instead of delete-and-requeue?

Delete-and-requeue will still be supported for a little while longer. Eventually, rework will replace delete-and-requeue, once the asynchronous functionality in the rework task reaches parity with it.

Will I still be able to “save label as template”?

Yes. This same functionality will be provided when you send a data row to the rework task. Labelbox will preserve the existing annotations on the data row so the next person who opens the data row as a rework task will be able to use the existing annotations as a starting point.

Will I have access to a master record of all actions on a labeled data row, not just createdAt?

We will be expanding the audit log functionality in early 2023, which should provide a master record of all actions. However, this is not exposed in today’s version. In the future, we may expand the workflow webhooks to include a more comprehensive master record.


Importing data

To add a batch to a project, am I still required to upload a dataset to Labelbox?

Yes. The concept of uploading a dataset to Labelbox is not going away. After you upload a dataset, the dataset will live in Catalog. From Catalog, you can add data rows to a project via batches. Labelbox supports batch sizes of up to 100,000 data rows.

I don’t have a need for batches or workflows. What do I do?

You can still add your entire dataset to a batch and send the batch to a project for labeling. See the Python SDK reference for how to do this programmatically.
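
A minimal sketch of sending a whole dataset to a project is shown below. It assumes that project behaves like the SDK's Project object and exposes create_batch(name, data_rows, priority); check the SDK reference for the exact signature in your version. The helper name send_dataset_as_batches, the batch naming, and the default priority are illustrative assumptions. Because a batch holds at most 100,000 data rows, larger datasets are sent as several batches.

```python
from typing import List, Sequence

MAX_BATCH_SIZE = 100_000  # batch size limit stated above


def send_dataset_as_batches(project, batch_prefix: str,
                            data_row_ids: Sequence[str],
                            priority: int = 5) -> List[str]:
    """Send every data row ID to the project, one batch per 100,000 rows.

    `project` is expected to expose create_batch(name, data_rows, priority).
    Returns the names of the batches that were created.
    """
    names = []
    for start in range(0, len(data_row_ids), MAX_BATCH_SIZE):
        chunk = list(data_row_ids[start:start + MAX_BATCH_SIZE])
        # Hypothetical naming scheme: <prefix>_1, <prefix>_2, ...
        name = f"{batch_prefix}_{start // MAX_BATCH_SIZE + 1}"
        project.create_batch(name, chunk, priority)
        names.append(name)
    return names
```

With a real client, project would come from client.get_project(project_id), and data_row_ids would be the IDs of the data rows in your uploaded dataset.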

Exporting data

Will the dataset still be attached in the export? What will change in the export?

The information about the attached dataset will be replaced by batch information. Additionally, the DataRow Workflow Info section will describe how the data row associated with each label has moved between workflow tasks, and which user actions caused those moves.

Are the data row statuses visible in the export?

The export shows each data row's workflow history (the tasks the data row has passed through), in reverse chronological order.


SDK

What is the migration plan if I have a programmatic integration with datasets?

The end state for every project is the new paradigm: batches, the Data Rows tab, and workflows. Whether you set up a project via the SDK or the UI, every project will be treated the same.

You can still create a project, upload a dataset, and send a batch for labeling via the SDK.


Quality control

What replaces thumbs up / thumbs down voting?

For all new projects, thumbs up / thumbs down voting will be replaced by the approve/reject workflow. In review tasks, the following actions can be taken:

Benchmark

  • Approve: Reviewer will approve a data row
    • This will move the data row to qualify for the next task in the Workflow.
    • In case there is only a single review task (such as the "Initial review task" that Labelbox automatically creates), the data row will end up in "Done".
  • Reject: Reviewer will reject a data row
    • This will move the data row to the Rework task

Consensus

  • Approve: Reviewer will choose a winner label to approve the data row.
    • This will move the data row to qualify for the next task in the Workflow.
    • In case there is only a single review task (such as the "Initial review task" that Labelbox automatically creates), the data row will end up in "Done".
  • Reject: Reviewer will reject a data row, essentially rejecting all labels on that data row
    • This will move the data row to the "Rework" task.
    • In "Rework", any labeler or reviewer can modify all labels associated with the data row.

You can also use the 'Move to step' functionality for ad-hoc review.
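
The approve/reject behavior described above can be summarized in a short sketch. It assumes a simple linear sequence of review tasks, which is a simplification since workflows are customizable; the function name and the string values are illustrative and are not part of the Labelbox SDK.

```python
from typing import List

DONE = "Done"
REWORK = "Rework"


def next_workflow_step(review_tasks: List[str], current_task: str, action: str) -> str:
    """Return where a data row goes after a reviewer acts on it in current_task."""
    if action == "reject":
        # Rejected data rows always go to the Rework task.
        return REWORK
    if action != "approve":
        raise ValueError(f"unknown review action: {action!r}")
    position = review_tasks.index(current_task)
    if position + 1 < len(review_tasks):
        # Approved data rows qualify for the next review task.
        return review_tasks[position + 1]
    # No further review task (e.g. only the initial review task exists).
    return DONE
```

With only the automatically created initial review task, an approval sends the data row straight to Done, matching the single-review-task case described above.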

If I am not planning on using quality settings, which configuration is recommended: benchmarks or consensus?

Labelbox recommends selecting benchmarks since it is a good practice to have your annotators measured against ground truth data on a continuous basis.

How do I set consensus at the batch level?

When you add a batch to a project (after you already selected consensus for your project), you will be prompted to configure consensus for that batch.

If you are in a consensus project, you will see 3 configuration settings:

  • A toggle to enable/disable consensus for that batch
  • A slider to set the coverage percentage
  • A place to enter the number of labels

The coverage percentage and number of labels settings replace the old labeling parameter overrides (LPO) feature, as they enable you to customize the assets in the queue at the batch level. In the Data Rows tab, you’ll be able to see the number of existing labels and the number of expected labels for each consensus data row.

Note: Once you add a batch to a project, you cannot change the number of labels setting. In the future, we may support dynamic consensus settings.
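
To get a feel for how the coverage percentage and number of labels interact, the sketch below estimates the total labels a batch would require. The function name is hypothetical, and the arithmetic is an assumption for illustration (covered data rows receive the configured number of labels, all others receive one, and the covered count is rounded down); Labelbox's actual selection of consensus data rows may differ.

```python
def estimate_total_labels(row_count: int, coverage_percent: int, number_of_labels: int) -> int:
    """Estimate the labels needed for a batch under the assumed consensus arithmetic."""
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 0 and 100")
    if number_of_labels < 1:
        raise ValueError("number_of_labels must be at least 1")
    # Assumption: the number of consensus data rows is rounded down.
    consensus_rows = row_count * coverage_percent // 100
    single_label_rows = row_count - consensus_rows
    return consensus_rows * number_of_labels + single_label_rows
```

Under these assumptions, a batch of 1,000 data rows with 20% coverage and 3 labels per consensus data row would need 200 × 3 + 800 = 1,400 labels.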

Can I enable benchmarks AND consensus on a project?

It is not supported yet, but we will be releasing that functionality soon.