The performance dashboard helps manage labeling operations in Labelbox projects. It reports the throughput, efficiency, and quality of the labeling process. These analytics are reported at the overall project level and at an individual level. You can use filters to focus on specific information and dimensions. The performance of your data labeling operation is broken down into four components:
  • Individual
  • TEQ (Throughput, Efficiency, and Quality)
  • Participation
  • Instructions quiz
Each component has unique views to help you understand the overall performance of your labeling operation. Select any graph value with the mouse to display its details.

Filters

Filters at the top of the performance dashboard allow you to analyze relevant subsets of data. When active, filters apply to all metric views, including Throughput, Efficiency, and Quality. Currently, the following filters are provided:
Filter | Description
Batch | Filters the graphs and metrics by data rows belonging to a specific batch.
Label actions - Labeled by | Filters the graphs and metrics by data rows that have been labeled by a specific labeler.
Label actions - Deleted | Filters the graphs and metrics by data rows based on their deletion status. Exclude: excludes data rows with labels that have been deleted. Include: includes all labels created in the project, including labels that were later deleted.
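Conceptually, these filters act as simple predicates over the project's labels. The sketch below only illustrates that behavior with made-up, in-memory records; the `LabelRecord` fields and the `apply_filters` helper are hypothetical and are not part of any Labelbox API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LabelRecord:
    # Hypothetical fields used only for this illustration.
    data_row_id: str
    batch: str
    labeled_by: str
    deleted: bool

def apply_filters(labels: List[LabelRecord],
                  batch: Optional[str] = None,
                  labeled_by: Optional[str] = None,
                  include_deleted: bool = True) -> List[LabelRecord]:
    """Mimic the Batch, Labeled by, and Deleted dashboard filters."""
    result = []
    for label in labels:
        if batch is not None and label.batch != batch:
            continue
        if labeled_by is not None and label.labeled_by != labeled_by:
            continue
        if not include_deleted and label.deleted:
            continue  # "Exclude" drops data rows whose labels were deleted
        result.append(label)
    return result

labels = [
    LabelRecord("dr-1", "batch-a", "alice@example.com", deleted=False),
    LabelRecord("dr-2", "batch-a", "bob@example.com", deleted=True),
    LabelRecord("dr-3", "batch-b", "alice@example.com", deleted=False),
]

# Batch "batch-a", labeled by Alice, deleted labels excluded.
filtered = apply_filters(labels, batch="batch-a",
                         labeled_by="alice@example.com",
                         include_deleted=False)
print([label.data_row_id for label in filtered])  # ['dr-1']
```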

Throughput view

The Throughput view provides insight into the amount of labeling work being produced, such as how many data rows are completed each day and how much time is spent on them. These metrics are available for all members of the project and for individual members. The Throughput view displays the following metrics:
Chart | Description
Done | Displays the daily count of data rows in the Done step of the project workflow. See Workflows for details.
Labels | Displays the count of labeled data rows, including labels that were later deleted. For Benchmark or Consensus data rows, a single data row can have multiple labels, so the Labels count may exceed the number of data rows in the same period.
Annotations | Displays the count of annotations created, including annotations on labels that were later deleted.
Reviews | Displays the count of “Approve” and “Reject” actions for labels created in the project. For information on approve and reject actions in the review queue, see Workflows.
Total time | Displays the total time spent (inclusive of labeling, review, and rework time) on labels created in the project.
Labeling time | Displays the total labeling time spent on data rows while the timer is on.
Review time | Displays the total review time spent on labeled data rows while the timer is on.
Rework time | Displays the total rework time spent on labeled data rows while the timer is on.
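To make the relationship between these metrics concrete, here is a minimal sketch that aggregates hypothetical per-label timer records by day; Total time is the sum of labeling, review, and rework time, as described above. The record layout is invented for illustration and is not a Labelbox data format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical timer records: (day, labeling_s, review_s, rework_s) per label.
records = [
    (date(2024, 5, 1), 120.0, 30.0, 0.0),
    (date(2024, 5, 1), 95.0, 20.0, 15.0),
    (date(2024, 5, 2), 110.0, 0.0, 40.0),
]

labels_per_day = defaultdict(int)
total_time_per_day = defaultdict(float)

for day, labeling_s, review_s, rework_s in records:
    labels_per_day[day] += 1
    # Total time is inclusive of labeling, review, and rework time.
    total_time_per_day[day] += labeling_s + review_s + rework_s

for day in sorted(labels_per_day):
    print(day, labels_per_day[day], f"{total_time_per_day[day]:.0f}s")
```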

Tracking states for timing

A data row goes through many different states, which are tracked in the timer log.

Labeling time

Labeling time increments when:
  • The user skips or submits the asset in the labeling queue

Review time

Review time increments when:
  • The user views the asset in the data row browser view
  • The user views the asset in the review queue

Rework time

Rework time increments when:
  • The user submits the asset in the rework queue
  • The user navigates to the review queue, selects edit, and approves/rejects the asset
  • The user edits and saves the asset in the data row browser view

Inactivity

Only active screen time is recorded. When your system is idle for more than five (5) minutes, the timer pauses; it resumes when system activity resumes. This applies to labeling time, review time, and rework time.
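A minimal sketch of the five-minute idle rule, assuming all we have is a list of activity-event timestamps; it illustrates the cutoff only and is not Labelbox's actual timer implementation.

```python
IDLE_CUTOFF_S = 5 * 60  # the timer pauses after 5 minutes of inactivity

def active_time(activity_timestamps):
    """Sum the gaps between consecutive activity events, skipping idle gaps."""
    timestamps = sorted(activity_timestamps)
    total = 0.0
    for previous, current in zip(timestamps, timestamps[1:]):
        gap = current - previous
        if gap <= IDLE_CUTOFF_S:
            total += gap  # counted as active screen time
        # Gaps longer than the cutoff are treated as paused time.
    return total

# 60 s of activity, a 10-minute idle gap, then 30 s more activity.
events = [0.0, 60.0, 660.0, 690.0]
print(active_time(events))  # 90.0 seconds counted as active
```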

Efficiency view

The efficiency view helps you visualize the time spent per unit of work, per labeled asset, or per review. These metrics help answer questions such as:
  • What is the average amount of time spent labeling an asset?
  • How can I reduce time spent per labeled asset?
These metrics are available for individual project members and at the project level (for all project members). The efficiency view includes these charts:
Chart | Description
Avg time per label | Displays the average labeling time spent per label. Avg time per label = Total labeling time / number of labels submitted.
Avg review time | Displays the average review time per data row. Avg review time = Total review time / number of data rows reviewed.
Avg rework time | Displays the average rework time per data row. Avg rework time = Total rework time / number of data rows reworked.
AHT per labeled data row | Displays the average handling time (AHT) per labeled data row: the total time across all modes for all data rows divided by the number of data rows.
AHT per done data row | Displays the total time across all modes for data rows in the ‘Done’ task queue divided by the number of data rows in ‘Done’.
AHT per created label | Displays the total time across all modes for ‘Created’ labels divided by the number of ‘Created’ labels.
AHT per submitted label | Displays the total time across all modes for ‘Submitted’ labels divided by the number of ‘Submitted’ labels.
AHT per done label | Displays the total time across all modes for labels in the ‘Done’ task queue divided by the number of labels in ‘Done’.
Label states in time calculations:
  • “Skipped” time is excluded from labeling time.
  • “Deleted” label time may be included or excluded using filters.
  • “Abandoned” (unsaved) work is counted toward labeling time.
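As a hedged sketch of the formulas above (the totals are made-up numbers, not pulled from any API), each chart reduces to a simple division, with N/A shown when the denominator is zero:

```python
# Hypothetical project totals, in seconds.
total_labeling_time = 5400.0   # excludes "Skipped" time, per the rules above
total_review_time = 1800.0
total_rework_time = 600.0

labels_submitted = 90
data_rows_reviewed = 60
data_rows_reworked = 12
labeled_data_rows = 90

def safe_div(numerator, denominator):
    """Return None (rendered as N/A) when the denominator is zero."""
    return numerator / denominator if denominator else None

avg_time_per_label = safe_div(total_labeling_time, labels_submitted)  # 60.0 s
avg_review_time = safe_div(total_review_time, data_rows_reviewed)     # 30.0 s
avg_rework_time = safe_div(total_rework_time, data_rows_reworked)     # 50.0 s

# AHT per labeled data row: total time across all modes / number of data rows.
aht_per_labeled_data_row = safe_div(
    total_labeling_time + total_review_time + total_rework_time,
    labeled_data_rows,
)  # ~86.7 s

print(avg_time_per_label, avg_review_time, avg_rework_time,
      round(aht_per_labeled_data_row, 1))
```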

Quality view

The quality view helps you understand the accuracy and consistency of the labeling work being produced. These metrics answer questions like:
  • What is the average quality of a labeled asset?
  • How can I ensure label quality is more consistent across the team?
These metrics are available for individual project members and at the project level, which summarizes the performance of all project members. The quality view includes the following charts:
Metric | Description
Benchmark | Shows the average benchmark score on labeled data rows within a specified time frame.
Benchmark distribution | Shows a histogram of benchmark scores (grouped by 10) for labeled assets within a specified time frame.
Consensus | Shows the average consensus score of labeled assets over a selected period. This is the average agreement score among the consensus labels for a given data row.
Consensus distribution | Shows a histogram of consensus scores (grouped by 10) for labeled assets plotted over the selected period.
Graphs display only the components that apply to your project; the benchmark and consensus charts do not appear unless those features are enabled for your project.
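As an illustration only (assuming agreement scores on a 0-100 scale, which is what the "grouped by 10" binning suggests), the average and distribution charts amount to a mean and a bucketed frequency count:

```python
from collections import Counter

# Hypothetical benchmark or consensus agreement scores per labeled data row.
scores = [92, 88, 71, 100, 65, 83, 90, 58]

average_score = sum(scores) / len(scores)  # shown by the Benchmark/Consensus chart

# Histogram grouped by 10: 58 -> "50-59", 92 -> "90-99", 100 -> "100".
def bucket(score):
    low = score // 10 * 10
    return "100" if low >= 100 else f"{low}-{low + 9}"

distribution = Counter(bucket(score) for score in scores)
print(round(average_score, 1), dict(distribution))
```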

Individual member performance

You can also view individual metrics for each team member who has worked on the project. The performance metrics are separated by intent (labeling and reviewing) and are shown as distinct views in the table.
Team members are listed in separate rows and appear only if they actively performed tasks during the selected period, that is, only if they labeled data rows or reviewed labels. Labeling metrics include:
Metric (Labeling) | Description
Labels created | The number of labels created by the team member during the selected period.
Labels skipped | The number of labels skipped by the team member during the selected period.
Labeling time (submitted) | Total labeling time the team member spent creating and submitting labels during the selected period.
Labeling time (skipped) | Total labeling time the team member spent working on labels that were ultimately skipped (the Skip button was clicked).
Avg time per label | Average labeling time of submitted and skipped labels. Calculated by dividing total labeling time by the number of labels submitted or skipped. Displays N/A when no labels have been submitted or skipped.
Reviews received | The number of Workflow review queue actions (Approve and Reject) received during the selected period on data rows labeled by the team member.
Avg review time (all) | Average review time on labels created by the team member. Calculated by dividing the total review time spent by all team members by the number of labels reviewed.
Avg rework time (all) | Average rework time on labels created by the team member. Calculated by dividing the total rework time spent by all team members by the number of labels reworked.
Rework time (all) | Total rework time spent on labels created by the team member during the selected period.
Review time (all) | Total review time spent on labels created by the team member during the selected period.
Rework % | Percentage of labels created by the team member that were reworked during the selected period. Calculated by dividing the number of labels created by the team member that had any rework done by the number of labels created by the team member.
Approval % | Percentage of data rows labeled by the team member that were approved during the selected period. Calculated by dividing the number of data rows with one or more approved labels by the number of review actions (Approve or Reject). Labels pending review are not included.
Benchmark score | Average agreement score with benchmark labels for labels created by the team member during the selected period.
Consensus score | Average agreement score with other consensus labels for labels created by the team member during the selected period.
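To make the Rework % and Approval % definitions above concrete, here is a short sketch for one team member using invented per-label flags; the record fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MemberLabel:
    # Hypothetical per-label record for one team member.
    approved: bool     # data row has one or more approved labels
    reviewed: bool     # received an Approve or Reject review action
    had_rework: bool   # any rework time was spent on it

member_labels = [
    MemberLabel(approved=True, reviewed=True, had_rework=False),
    MemberLabel(approved=False, reviewed=True, had_rework=True),
    MemberLabel(approved=True, reviewed=True, had_rework=True),
    MemberLabel(approved=False, reviewed=False, had_rework=False),  # pending review
]

labels_created = len(member_labels)
reviewed_rows = sum(1 for label in member_labels if label.reviewed)
approved_rows = sum(1 for label in member_labels if label.approved)
reworked_rows = sum(1 for label in member_labels if label.had_rework)

# Rework %: labels with any rework / labels created by the member.
rework_pct = 100 * reworked_rows / labels_created   # 50.0
# Approval %: rows with an approved label / rows with a review action
# (labels pending review are not included).
approval_pct = 100 * approved_rows / reviewed_rows  # ~66.7
print(round(rework_pct, 1), round(approval_pct, 1))
```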
The following metrics appear on the Reviewing tab:
Metric (Reviewing) | Description
Data rows reviewed | Data rows that have had review time spent by the user (per filter selection).
Data rows reworked | Data rows that have had rework time spent by the user.
Avg review time | Average review time spent by the user in the selected period. Avg review time = Total review time spent by the user / Number of labels reviewed by the user.
Avg rework time | Average rework time spent by the user in the selected period. Avg rework time = Total rework time spent by the user / Number of labels reworked (approved or rejected) by the user.
Total time | Sum of review time and rework time spent by the user during the selected period.
Rework % | Percentage of data rows reviewed by the user that have been reworked. Rework % = Number of labels that have had both rework time and review time spent by the user / Number of data rows that have had review time spent by the user.
Approval % | Data rows with an approve action by the user as a percentage of data rows with any review action by the user. Approval % = Number of data rows with an approve action by the user (in workflow) / Number of data rows with either an approve or reject action by the user (in workflow).
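The Reviewing metrics follow the same numerator/denominator pattern from the reviewer's side. A brief sketch with hypothetical per-data-row times spent by a single reviewer:

```python
# Hypothetical (review_time_s, rework_time_s) spent by one user per data row.
rows = [
    (40.0, 0.0),
    (25.0, 60.0),
    (30.0, 15.0),
    (0.0, 45.0),   # rework time only; no review time by this user
]

reviewed = [r for r in rows if r[0] > 0]
reviewed_and_reworked = [r for r in rows if r[0] > 0 and r[1] > 0]

# Avg review time = total review time by the user / data rows reviewed.
avg_review_time = sum(r[0] for r in rows) / len(reviewed)      # ~31.7 s

# Rework % = rows with both review and rework time / rows with review time.
rework_pct = 100 * len(reviewed_and_reworked) / len(reviewed)  # ~66.7 %
print(round(avg_review_time, 1), round(rework_pct, 1))
```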

Instruction Quiz analytics

The Instruction Quiz tab provides comprehensive analytics for quiz performance when you have enabled quizzes in your ontology instructions. These analytics help you understand how well labelers comprehend your instructions and identify areas where additional clarification may be needed.

Set up quizzes first

Before you can view quiz analytics, you need to create a quiz for your ontology. See Add quizzes to ontology instructions to learn how to create and configure quizzes.

Access quiz analytics

To view quiz analytics for your project:
  1. Navigate to your project
  2. Go to the Performance tab
  3. Select the Instruction Quiz tab at the top of the page
  4. Use the date range picker to filter analytics by time period

Overview metrics

The dashboard displays key performance indicators at the top:
Metric | Description
Total Quiz Attempts | The total number of times labelers have taken the quiz during the selected period.
Overall Pass Rate | Percentage of quiz attempts that achieved a passing score (3 out of 5 or higher).
Average Score | The mean score across all quiz attempts on a 1-5 scale.
Unique Users | Number of distinct labelers who have attempted the quiz.
Total Questions | Number of questions in the current quiz.
Avg. Time to Pass | Average time from first attempt to first passing attempt (excludes time spent on the first attempt itself).
Avg. Attempts to Pass | Average number of quiz attempts needed for labelers to pass the quiz for the first time.
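As a hedged sketch of how these aggregates relate, using made-up attempt records (a score of 3 out of 5 or higher counts as a pass, as noted above):

```python
# Hypothetical quiz attempts: (user_email, attempt_number, score_1_to_5)
attempts = [
    ("alice@example.com", 1, 2),
    ("alice@example.com", 2, 4),   # Alice first passes on attempt 2
    ("bob@example.com", 1, 5),     # Bob passes on attempt 1
    ("carol@example.com", 1, 2),   # Carol has not passed yet
]

PASSING_SCORE = 3

total_attempts = len(attempts)
pass_rate = 100 * sum(1 for _, _, s in attempts if s >= PASSING_SCORE) / total_attempts
average_score = sum(s for _, _, s in attempts) / total_attempts
unique_users = len({user for user, _, _ in attempts})

# Avg. Attempts to Pass: attempt number of each user's first passing attempt.
first_pass = {}
for user, attempt_number, score in sorted(attempts, key=lambda a: (a[0], a[1])):
    if score >= PASSING_SCORE and user not in first_pass:
        first_pass[user] = attempt_number
avg_attempts_to_pass = sum(first_pass.values()) / len(first_pass)  # (2 + 1) / 2 = 1.5

print(total_attempts, round(pass_rate), round(average_score, 2),
      unique_users, avg_attempts_to_pass)
```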

Visual analytics

The dashboard includes two key distribution charts.
Pass Attempt Distribution
  • Shows how many attempts labelers need to pass the quiz
  • Helps identify if the quiz difficulty is appropriate
  • Example: If most labelers pass on the first attempt, the quiz may be too easy; if most need many attempts, it may be too difficult
Score Distribution
  • Displays the range of scores across all attempts
  • Shows how scores are distributed on the 1-5 scale
  • Helps identify if most labelers are performing well or struggling
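Both charts are plain frequency counts. A compact sketch over hypothetical data:

```python
from collections import Counter

# Hypothetical data: the attempt on which each labeler first passed,
# and the score (1-5) of every attempt.
attempts_to_first_pass = [1, 1, 2, 3, 1]
all_scores = [2, 4, 5, 2, 3, 3, 4, 1]

pass_attempt_distribution = Counter(attempts_to_first_pass)  # {1: 3, 2: 1, 3: 1}
score_distribution = Counter(all_scores)                     # counts per score 1-5

print(dict(pass_attempt_distribution))
print({score: score_distribution[score] for score in range(1, 6)})
```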

Question performance table

This table shows detailed metrics for each quiz question:
Metric | Description
Question | The full text of the quiz question. Questions marked “Past question” are from previous quiz versions.
Avg. Score | Average score (1-5 scale) for this question across all attempts.
Avg. Attempts to Pass | Average number of attempts needed for labelers to answer this question correctly.
Improvement Trend | Score difference between first and last attempts, showing whether labelers improve over time.
Avg. Time to Pass | Average time spent to successfully pass this question (displayed as MM:SS).
Use this table to identify problematic questions:
  • Questions with low average scores may need clearer instruction content
  • Questions requiring many attempts suggest the topic needs better explanation
  • Negative improvement trends indicate labelers aren’t learning from retakes
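A small sketch of the Improvement Trend calculation described above (last attempt score minus first attempt score); the question text and scores are invented for illustration:

```python
# Hypothetical per-question scores across a labeler's attempts, in order.
scores_by_question = {
    "Q1 (hypothetical): how to label occluded objects": [2, 3, 4],
    "Q2 (hypothetical): when to skip a data row": [4, 3],
}

for question, scores in scores_by_question.items():
    average_score = sum(scores) / len(scores)
    improvement_trend = scores[-1] - scores[0]  # last attempt minus first attempt
    print(question, round(average_score, 2), improvement_trend)

# Q1 trends upward (+2); Q2 trends downward (-1), which would suggest the
# related instructions need revision.
```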

User performance table

This table shows individual labeler performance:
Metric | Description
Email | The labeler’s email address.
Attempts | Total number of quiz attempts, with the count of passed attempts in parentheses.
Avg. Score | Average score across all of this user’s attempts (1-5 scale).
Improvement Trend | Score difference between the user’s first and last attempt.
Pass Rate | Percentage of this user’s attempts that achieved a passing score.
Time to Pass | Time from first attempt to first successful pass (displayed as MM:SS).
Quiz Status | Shows “Passed” if the user has successfully passed at least once, “Not Passed” otherwise.
Use this table to:
  • Identify labelers who may need additional training or support
  • Track improvement over time for individual users
  • Understand which labelers are struggling with the material

Using analytics to improve your quiz

Based on the analytics data, you can take several actions.
If overall pass rates are low:
  • Review your instructions for clarity and completeness
  • Consider breaking down complex concepts into simpler explanations
  • Add more examples to illustrate key points
If specific questions have low scores:
  • Revise the related section in your instructions
  • Ensure the question accurately tests the intended knowledge
  • Adjust the expected answer to be more flexible
If labelers need many attempts to pass:
  • Simplify your quiz questions or make instructions more explicit
  • Add practice examples in your instructions
  • Consider reducing the passing threshold if appropriate
If improvement trends are negative:
  • Review the feedback provided by the AI scoring
  • Ensure questions test understanding, not memorization
  • Consider whether the quiz is testing the right concepts

Import timer impact

Importing labels affects labeling time. When you import:
  • Ground truth labels: No label time is recorded on the Data rows tab or the performance dashboard for that label; label time is displayed as zero (0).
    • When team members modify the label, the time is recorded as review time.
  • Model-assisted labeling (MAL) pre-labels: No label time is recorded on the Data rows tab or the performance dashboard for that label; label time is displayed as zero (0).
    • When team members open the data row in the editor and click Edit, the time spent before selecting Skip or Submit is recorded as labeling time.