AI critic is supported for the Multi-modal chat editor, the Audio editor, and the Video editor.
How to create a new critic
- Go to your Project settings → Advanced and enable AI critic. You can also choose which Foundry model to power the AI critic by setting the AI critic model.
- In the editor, select the AI critic setup icon in the top navbar. Then, click Add new critic.
- Add a short descriptive title for your critic. Then, in the text box, describe your critic using natural language. Define which elements you want to critique and the specific criteria they must meet. The more specific your description, the better the critic performs. For the best results, make sure your prompt references these element names:
| Element | Description |
|---|---|
| Actor | A participant in the conversation - either a “human” (the prompter) or a “model” (an AI responder). Each actor has metadata like a display name or model configuration name. |
| Checklist classification | A multi-select classification where labelers choose one or more options from a list. As with radio classifications, options can have nested sub-questions. |
| Display rules | Conditional visibility logic that controls when a classification appears. A criterion with display rules only shows when other criteria have specific values, enabling branching evaluation workflows. |
| Feature | The actual value a labeler selected or entered for a classification. Contains the answer data, including which option was chosen (for radio/checklist) or what text was entered. |
| Feature schema | The definition or blueprint for a classification. Describes its type (radio, checkbox, text), available options, scope, and display rules. Think of it as the question template. |
| Index data | Binds a scoped classification to a specific element in the conversation. For example, a response-message-scoped classification’s index data contains the ID of the response it applies to. |
| Prompt message | A message sent by the human participant that initiates a conversation turn. In multi-turn conversations, each new prompt builds on the previous exchange. |
| Radio classification | A single-select classification where labelers choose exactly one option from a list. Options can have nested sub-questions that appear when selected. |
| Response message | A message generated by an AI model in reply to a prompt. In multi-model evaluation, a single prompt can have multiple responses from different models. |
| Rubric criterion | A structured evaluation question used to assess model responses. Can be a radio (single-select), checkbox (multi-select), or text (free-form) question. Each criterion is associated with a specific prompt message and evaluates the responses to that prompt. |
| Rubric group | A container that organizes related rubric criteria under a common header. Groups can have min/max constraints on how many criteria they contain. |
| Scope | Determines where a classification applies. “Global” applies to the entire conversation. Other scopes target a specific prompt message, response message, rubric criterion, or turn. |
| Text classification | A free-form text input annotation where labelers enter arbitrary text. |
| Turn | A complete prompt-response exchange in the conversation. Each turn contains one prompt message and one or more response messages from model actors. |
- Set the Enforcement. This setting determines what happens when a critique fails. Choose one of the following options:
- Block: Prevents the user from submitting the label until the failing critique is resolved. This is the strictest quality-control option.
- Acknowledge: When the user clicks submit, a notification appears detailing the quality failures. The user must acknowledge the message to proceed, but they are not blocked from submitting.
- None: Displays the critique result as a visual aid directly in the editor but does not interfere with the submission process.
- Click Update Critic to save it. The AI will immediately begin critiquing elements based on your new rule.
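The three enforcement options can be summarized as a small policy function. This is an illustrative sketch only; the `Enforcement` enum and `handle_submission` function are hypothetical names, not part of any platform API.

```python
from enum import Enum

class Enforcement(Enum):
    BLOCK = "block"              # strictest: failing critiques stop submission
    ACKNOWLEDGE = "acknowledge"  # user must acknowledge failures, then may submit
    NONE = "none"                # failures shown inline only

def handle_submission(enforcement, failing_critiques):
    """Return (allowed, requires_acknowledgement, messages) for a submit attempt."""
    if not failing_critiques:
        return (True, False, [])
    if enforcement is Enforcement.BLOCK:
        return (False, False, failing_critiques)
    if enforcement is Enforcement.ACKNOWLEDGE:
        return (True, True, failing_critiques)
    # Enforcement.NONE: submission proceeds; critiques are a visual aid only
    return (True, False, failing_critiques)
```

For example, under `BLOCK` a label with a failing critique cannot be submitted until the failure is resolved, while under `ACKNOWLEDGE` the same label can be submitted once the failure notice is acknowledged.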
How to write effective critiques
You can create critiques for various elements within the editor. The AI understands the context and relationships between different elements, such as a classification and the message it is attached to. To verify that a classification’s value is correct based on its label and the message it’s associated with, you can use a prompt like this:
> Look at the instructions or label for each classification. Make sure the value for the classification is correct based on its label and all of the contextual information. If it's a scoped classification, make sure to read the associated message in order to make your critique.
The AI critic will then:
- Target all classifications in the editor.
- Read the classification’s label.
- Read the message content if the classification is attached to a message.
- Determine if the selected value is appropriate based on the context.
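The steps above can be sketched as the context a critic assembles before judging a feature. The `Feature` dataclass and `build_critic_context` helper below are illustrative stand-ins based on the element definitions in the table, not actual platform objects.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feature:
    schema_label: str                     # label from the feature schema (the "question")
    value: str                            # the option or text the labeler entered
    scoped_message: Optional[str] = None  # index data: content of the message it applies to

def build_critic_context(feature):
    """Assemble the textual context the critic reads before judging a feature."""
    context = [
        f"Classification label: {feature.schema_label}",
        f"Selected value: {feature.value}",
    ]
    if feature.scoped_message is not None:
        # Scoped classification: the associated message is part of the context
        context.append(f"Associated message: {feature.scoped_message}")
    return "\n".join(context)
```

A globally scoped classification would omit the `scoped_message`, so the critic judges it against the conversation as a whole rather than a single message.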
Example prompts
- “All responses must be at least 3 sentences long and written in formal English. Responses that are too short or use casual language should fail.”
- “Each response must directly address the question asked in the prompt message. If the response is off-topic or does not answer the question, it should fail.”
- “For all checklist classifications, ensure that no contradictory options have been selected together.”
- “For all radio classifications scoped to response messages, verify the selected option accurately reflects the tone of the response.”
- “Review the entire conversation holistically. The overall exchange must feel natural and coherent. Flag data rows where the conversation feels disjointed or artificially constructed.”
- “Ensure that no personally identifiable information (PII) appears anywhere in the conversation, whether in prompts, responses, or classifications.”
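A deterministic pre-check can complement a PII prompt like the last example. The sketch below is a minimal, illustrative subset (emails and US-style phone numbers only); real PII detection requires far broader coverage than these two patterns.

```python
import re

# Minimal illustrative patterns; a production PII detector needs much more coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(text):
    """Return the PII categories detected in a piece of conversation text."""
    return sorted(kind for kind, pattern in PII_PATTERNS.items()
                  if pattern.search(text))
```

Running a check like this over prompts, responses, and classification text before the AI critic sees the row catches the unambiguous cases cheaply, leaving the critic to judge context-dependent PII.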