This article details how the labeling interface processes and loads assets, including best practices for setting up your data pipeline to optimize Editor speed and performance.
At its core, the labeling interface is designed to optimize labeling speed and data integrity when displaying your data and labeling tools. To achieve these goals, there are a few strategies we implement in the background:
Cache assets: When a labeler opens the Editor to start labeling, they reserve a set number of assets in their labeling queue. By default, the Editor processes and caches the assets reserved in their queue so that as soon as they complete a labeling task on one asset, the next asset loads instantly.
Disable during loading: Assets still take time to load in the Editor, so to protect data integrity, labeling tools and Editor controls remain inactive until an asset is fully loaded. This prevents labelers from unintentionally working on a partially loaded asset, which could compromise the label's accuracy.
Strip EXIF data: Because web browsers have frequently changed how they handle EXIF data, we currently strip any EXIF data that exists on your assets to guarantee your labels are always drawn and stored at the same orientation, ensuring data integrity.
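The caching and load-gating behavior described above can be pictured with a small model. This is only an illustrative sketch, not the Editor's actual implementation: the `AssetPrefetcher` class, its method names, and the `fetch` callable are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class AssetPrefetcher:
    """Hypothetical model of a reserved queue whose assets are cached ahead of use."""

    def __init__(self, reserved_ids: List[str], fetch: Callable[[str], bytes]):
        self._queue = list(reserved_ids)    # assets reserved for this labeler
        self._fetch = fetch                 # assumed loader, e.g. an HTTP GET
        self._cache: Dict[str, bytes] = {}  # asset id -> loaded bytes
        self.current: Optional[str] = None  # asset currently open, if any

    def prefetch_all(self) -> None:
        # Load every reserved asset that is not in the cache yet.
        for asset_id in self._queue:
            if asset_id not in self._cache:
                self._cache[asset_id] = self._fetch(asset_id)

    def open_next(self) -> Optional[str]:
        # Move to the next reserved asset; returns None when the queue is empty.
        self.current = self._queue.pop(0) if self._queue else None
        return self.current

    def finish_loading_current(self) -> None:
        # Load the open asset on demand if prefetching had not reached it.
        if self.current is not None and self.current not in self._cache:
            self._cache[self.current] = self._fetch(self.current)

    def tools_enabled(self) -> bool:
        # Tools and controls stay inactive until the open asset is fully loaded.
        return self.current is not None and self.current in self._cache
```

In this model, an asset that was prefetched has its tools enabled as soon as it is opened, while an asset that was not cached keeps the tools disabled until `finish_loading_current()` completes.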
To minimize disruptions and sunk costs in your labeling jobs, the following are recommended best practices for setting your projects and labelers up for success.
Optimize asset size. The most effective way to optimize Editor performance is to limit asset size to the maximum resolution needed for accurate labeling and effective model training. Some training tasks may require more resolution than others, but generally, we recommend images no larger than 4000x4000px and videos with frame rates no greater than 30fps.
Delegate access to your data storage. For optimal security and performance, use IAM delegated access to connect your data to Labelbox. This will allow you to keep your assets in cloud storage and simply delegate Labelbox the limited access it needs to display your assets in the Editor efficiently.
Set up a CDN. If your labelers are accessing your assets from many different locations, or are located in a different time zone from where your data is stored, setting up a CDN can speed up the delivery of assets to your labelers.
Conduct a trial run. Before kicking off a labeling project, set up a test project with the assets to be labeled, and simulate the environment you expect your labelers to be in when labeling. Namely, test during the peak hours you expect labelers to be working on your job, from the area where they will be located, and on the network and machines they will be using to access Labelbox. This gives you an opportunity to identify and reduce or resolve any latency or other issues up front, rather than during an ongoing labeling job.
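As a worked example of the asset size recommendation above, the following sketch computes the dimensions an image would need to be downscaled to in order to fit within 4000x4000px, and how many frames to skip to bring a video to 30fps or lower. The function names `fit_within_limit` and `frame_step` are hypothetical helpers for a preprocessing pipeline, not part of any Labelbox API.

```python
import math
from typing import Tuple

MAX_SIDE_PX = 4000  # recommended upper bound per side, from the text above
MAX_FPS = 30.0      # recommended upper bound on video frame rate


def fit_within_limit(width: int, height: int,
                     max_side: int = MAX_SIDE_PX) -> Tuple[int, int]:
    """Scale dimensions down (never up) to fit max_side x max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def frame_step(fps: float, max_fps: float = MAX_FPS) -> int:
    """Return n such that keeping every n-th frame yields max_fps or lower."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return max(1, math.ceil(fps / max_fps))


print(fit_within_limit(8000, 6000))  # (4000, 3000)
print(frame_step(60))                # 2
```

Images already within the limit are returned unchanged, so the check is safe to run over an entire dataset before upload.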
The performance of the Editor is dependent on a number of factors, including asset size, location of the data and labeler, time of day, machine specs, and network speed.
Below are the suggested thresholds for optimal performance. Note that getting within all of these thresholds is not required for using Labelbox effectively. These are just meant to serve as a benchmark you can reference if you are interested in improving Editor performance for your labeling job.
Asset size: 256MB or smaller
Location of data and labeler: within the same time zone
Time of day
Machine specs: 16GB or greater
Network speed: 200 Mbps or faster
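During a trial run, the benchmarks above could be checked with a small script along these lines. This is a sketch under stated assumptions: the `Environment` fields and the `outside_thresholds` function are hypothetical, the 16GB figure is assumed to refer to machine memory, and time of day is left out because no numeric threshold is given for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    asset_size_mb: float   # size of the largest asset in the job
    same_time_zone: bool   # data and labeler in the same time zone
    memory_gb: float       # assumption: the 16GB benchmark means machine memory
    network_mbps: float    # measured download speed


def outside_thresholds(env: Environment) -> List[str]:
    """List the suggested benchmarks that the given environment does not meet."""
    findings = []
    if env.asset_size_mb > 256:
        findings.append("asset size above 256MB")
    if not env.same_time_zone:
        findings.append("data and labeler in different time zones")
    if env.memory_gb < 16:
        findings.append("machine memory below 16GB")
    if env.network_mbps < 200:
        findings.append("network slower than 200 Mbps")
    return findings


print(outside_thresholds(Environment(300, True, 8, 250)))
# ['asset size above 256MB', 'machine memory below 16GB']
```

An empty list means every benchmark is met. A non-empty list is a prompt for tuning, not a blocker, since meeting all thresholds is not required.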