Learn how to create offline multimodal chat evaluation projects for ranking and classifying model outputs on conversation text data.
Feature | Description | Export format |
---|---|---|
Message ranking | Rank multiple model-generated responses to determine their relative quality or relevance. | Payload |
Message selection | Select single or multiple responses that meet specific criteria. | Payload |
Message step reasoning | Break responses into steps and evaluate the accuracy of each step by selecting correct, neutral, or incorrect. For incorrect steps, add a rewrite with a justification. | Payload |
Classification - Radio | Select one option from a predefined set. | Payload |
Classification - Checklist | Choose multiple options from a list. | Payload |
Classification - Free text | Add free text annotations. | Payload |
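
To make the step reasoning and ranking features above more concrete, here is a minimal Python sketch of how such annotations might be modeled and checked locally. It is illustrative only: `StepEvaluation`, `validate_step`, and `validate_ranking` are hypothetical names, not part of any SDK, and the structure does not reflect the actual export payload schema. It also assumes that an incorrect step is expected to carry both a rewrite and a justification.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three step labels named in the feature table.
STEP_LABELS = {"correct", "neutral", "incorrect"}


@dataclass
class StepEvaluation:
    """Hypothetical container for one evaluated step of a response."""
    text: str
    label: str
    rewrite: Optional[str] = None
    justification: Optional[str] = None


def validate_step(step: StepEvaluation) -> List[str]:
    """Return a list of problems; an empty list means the step looks valid."""
    problems = []
    if step.label not in STEP_LABELS:
        problems.append(f"unknown label: {step.label}")
    # Assumption: incorrect steps should include a rewrite and a justification.
    if step.label == "incorrect" and not (step.rewrite and step.justification):
        problems.append("incorrect step needs a rewrite and a justification")
    return problems


def validate_ranking(ranking: List[str], response_ids: List[str]) -> bool:
    """A ranking is valid when it orders every response exactly once."""
    return len(ranking) == len(set(ranking)) and set(ranking) == set(response_ids)
```

A check like this could run before annotations are submitted or after they are exported, for example `validate_ranking(["b", "a"], ["a", "b"])` to confirm that both responses were ranked.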