AI critic is supported for the Multi-modal chat editor, the Audio editor, and the Video editor.
How to create a new critic
- Go to your Project settings → Advanced and enable AI critic. You can also choose which Foundry model to power the AI critic by setting the AI critic model.
- In the editor, select the AI critic setup icon in the top navbar. Then, click Add new critic.
- Add a short descriptive title for your critic. Then, in the text box, describe your critic using natural language. Define which elements you want to critique and the specific criteria they must meet. The more specific your description, the better the critic performs. For the best results, make sure your prompt references these element names:
| Element | Description |
|---|---|
| Actor | A participant in the conversation - either a “human” (the prompter) or a “model” (an AI responder). Each actor has metadata like a display name or model configuration name. |
| Checklist classification | A multi-select classification where labelers choose one or more options from a list. As with radio classifications, options can have nested sub-questions. |
| Display rules | Conditional visibility logic that controls when a classification appears. A criterion with display rules only shows when other criteria have specific values, enabling branching evaluation workflows. |
| Feature | The actual value a labeler selected or entered for a classification. Contains the answer data, including which option was chosen (for radio/checklist) or what text was entered. |
| Feature schema | The definition or blueprint for a classification. Describes its type (radio, checkbox, text), available options, scope, and display rules. Think of it as the question template. |
| Index data | Binds a scoped classification to a specific element in the conversation. For example, a response-message-scoped classification’s index data contains the ID of the response it applies to. |
| Prompt message | A message sent by the human participant that initiates a conversation turn. In multi-turn conversations, each new prompt builds on the previous exchange. |
| Radio classification | A single-select classification where labelers choose exactly one option from a list. Options can have nested sub-questions that appear when selected. |
| Response message | A message generated by an AI model in reply to a prompt. In multi-model evaluation, a single prompt can have multiple responses from different models. |
| Rubric criterion | A structured evaluation question used to assess model responses. Can be a radio (single-select), checkbox (multi-select), or text (free-form) question. Each criterion is associated with a specific prompt message and evaluates the responses to that prompt. |
| Rubric group | A container that organizes related rubric criteria under a common header. Groups can have min/max constraints on how many criteria they contain. |
| Scope | Determines where a classification applies. “Global” applies to the entire conversation. Other scopes target a specific prompt message, response message, rubric criterion, or turn. |
| Text classification | A free-form text input annotation where labelers enter arbitrary text. |
| Turn | A complete prompt-response exchange in the conversation. Each turn contains one prompt message and one or more response messages from model actors. |
- Set the Enforcement. This setting determines what happens when a critique fails. Choose one of the following options:
- Block: Prevents the user from submitting the label until the failing critique is resolved. This is the strictest quality-control option.
- Acknowledge: When the user clicks submit, a notification appears detailing the quality failures. The user must acknowledge the message to proceed, but they are not blocked from submitting.
- None: Displays the critique result as a visual aid directly in the editor but does not interfere with the submission process.
- Click Update Critic to save it. The AI will immediately begin critiquing elements based on your new rule.
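The three enforcement options can be summarized as a small policy function. This is an illustrative sketch only; the `Enforcement` enum and `handle_submission` function are hypothetical names, not part of any platform API.

```python
from enum import Enum

class Enforcement(Enum):
    BLOCK = "block"              # strictest: failing critiques stop submission
    ACKNOWLEDGE = "acknowledge"  # user must acknowledge failures, then may submit
    NONE = "none"                # failures shown inline only

def handle_submission(enforcement, failing_critiques):
    """Return (allowed, requires_acknowledgement, messages) for a submit attempt."""
    if not failing_critiques:
        return (True, False, [])
    if enforcement is Enforcement.BLOCK:
        return (False, False, failing_critiques)
    if enforcement is Enforcement.ACKNOWLEDGE:
        return (True, True, failing_critiques)
    # Enforcement.NONE: submission proceeds; critiques are a visual aid only
    return (True, False, failing_critiques)
```

For example, under `BLOCK` a label with a failing critique cannot be submitted until the failure is resolved, while under `ACKNOWLEDGE` the same label can be submitted once the failure notice is acknowledged.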
How to write effective critiques
You can create critiques for various elements within the editor. The AI understands the context and relationships between different elements, such as a classification and the message it is attached to. To verify that a classification’s value is correct based on its label and the message it’s associated with, you can use a prompt like this:
> Look at the instructions or label for each classification. Make sure the value for the classification is correct based on its label and all of the contextual information. If it's a scoped classification, make sure to read the associated message in order to make your critique.
The AI critic will then:
- Target all classifications in the editor.
- Read the classification’s label.
- Read the message content if the classification is attached to a message.
- Determine if the selected value is appropriate based on the context.
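The steps above can be sketched as the context a critic assembles before judging a feature. The `Feature` dataclass and `build_critic_context` helper below are illustrative stand-ins based on the element definitions in the table, not actual platform objects.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feature:
    schema_label: str                     # label from the feature schema (the "question")
    value: str                            # the option or text the labeler entered
    scoped_message: Optional[str] = None  # index data: content of the message it applies to

def build_critic_context(feature):
    """Assemble the textual context the critic reads before judging a feature."""
    context = [
        f"Classification label: {feature.schema_label}",
        f"Selected value: {feature.value}",
    ]
    if feature.scoped_message is not None:
        # Scoped classification: the associated message is part of the context
        context.append(f"Associated message: {feature.scoped_message}")
    return "\n".join(context)
```

A globally scoped classification would omit the `scoped_message`, so the critic judges it against the conversation as a whole rather than a single message.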
Example prompts
- “All responses must be at least 3 sentences long and written in formal English. Responses that are too short or use casual language should fail.”
- “Each response must directly address the question asked in the prompt message. If the response is off-topic or does not answer the question, it should fail.”
- “For all checklist classifications, ensure that no contradictory options have been selected together.”
- “For all radio classifications scoped to response messages, verify the selected option accurately reflects the tone of the response.”
- “Review the entire conversation holistically. The overall exchange must feel natural and coherent. Flag data rows where the conversation feels disjointed or artificially constructed.”
- “Ensure that no personally identifiable information (PII) appears anywhere in the conversation, whether in prompts, responses, or classifications.”
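A deterministic pre-check can complement a PII prompt like the last example. The sketch below is a minimal, illustrative subset (emails and US-style phone numbers only); real PII detection requires far broader coverage than these two patterns.

```python
import re

# Minimal illustrative patterns; a production PII detector needs much more coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(text):
    """Return the PII categories detected in a piece of conversation text."""
    return sorted(kind for kind, pattern in PII_PATTERNS.items()
                  if pattern.search(text))
```

Running a check like this over prompts, responses, and classification text before the AI critic sees the row catches the unambiguous cases cheaply, leaving the critic to judge context-dependent PII.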