The multimodal chat editor now supports offline multimodal chat evaluation projects. To learn how this works, see our Multi-modal chat evaluation docs.
The Multimodal chat evaluation editor now supports LaTeX math expressions. To learn more, see our Multi-modal chat evaluation docs.
Enterprise customers will see an updated flow for requesting labeling services in the app UI.
You can now use the Benchmarks & Consensus tools for prompt & response generation projects.
You can now use the Benchmarks & Consensus tools for video projects.
Benchmarks & Consensus are now enabled for the following workflows: sending data rows from Model to Annotate, sending data rows from Annotate to a model experiment, and bulk classification in Catalog.
In the Data rows tab in Annotate, you will see an improved search and filtering experience for data rows with Benchmarks.
The audio editor now supports temporal classifications and a timeline, so you can assign classifications to specific points in the audio timeline.
When you compare model runs in the metrics and cluster view in Model, the view now displays metrics calculations only for the data rows that appear in both model runs.
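To illustrate the model run comparison described above, here is a minimal sketch of restricting a metric to the data rows shared by two runs. It is not the product's implementation: the function name `shared_row_metrics`, the dictionary layout (data row ID mapped to a per-row metric value), and the use of a simple mean are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Optional, Tuple


def shared_row_metrics(
    run_a: Dict[str, float], run_b: Dict[str, float]
) -> Optional[Tuple[float, float]]:
    """Average a per-data-row metric over the rows present in both runs.

    Hypothetical helper: each dict maps a data row ID to a metric value.
    Returns None when the two runs share no data rows.
    """
    shared = run_a.keys() & run_b.keys()  # data rows that appear in both runs
    if not shared:
        return None
    return (
        mean(run_a[row_id] for row_id in shared),
        mean(run_b[row_id] for row_id in shared),
    )


# Made-up example values: only row-2 and row-3 appear in both runs.
run_a = {"row-1": 0.5, "row-2": 1.0, "row-3": 0.0}
run_b = {"row-2": 0.5, "row-3": 1.0, "row-4": 1.0}
print(shared_row_metrics(run_a, run_b))  # (0.5, 0.75)
```

In this sketch, row-1 and row-4 are ignored because each appears in only one run, so the two averages are computed over the same set of data rows.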
Changed
You can now configure your projects to use Benchmarks & Consensus simultaneously. If you have a project that has already been set up with Benchmarks, you can enable Consensus later in the project settings. However, it is still not possible to disable either tool once it has been set.
In the Data row browser, you can now set a Consensus label as a Benchmark label. To do so, select a data row in the Data rows tab that has a Consensus label, then select Add as Benchmark. This calculates both the Benchmark and Consensus scores for the data row.
The left side panel is now resizable across all editors.
The default reservation count for the labeling queue has been updated from 10 to 4. Admins can adjust the queue parameters in the project settings.
The Auto-segment tool in the image editor has been upgraded to use SAM 2.
Removed
The custom editor has been sunset for all customers.
The following data connector libraries have been archived: labelbase, labelspark, labelpandas, labelsnow, labelbox-bigquery. To learn more, see our Deprecations page.
The External workforce tab has been removed from the project settings.
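If you are unsure whether a Python environment still has any of the archived data connector libraries listed above, a quick check along these lines may help. This is a sketch only: it assumes the installed distribution names match the library names given in this note, and the helper name `installed_archived` is hypothetical.

```python
from importlib import metadata

# Names taken from the list above; assumed to match the installed distribution names.
ARCHIVED = ["labelbase", "labelspark", "labelpandas", "labelsnow", "labelbox-bigquery"]


def installed_archived(names=ARCHIVED):
    """Return {name: version} for each listed package found in this environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed here, nothing to report
    return found


print(installed_archived())  # an empty dict means none were found
```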