Audio

With the audio editor, you can annotate audio files, for example by classifying natural-language conversations and music, to train conversational AI and audio-based ML models. The editor supports automatic speech-to-text recognition with the Whisper model, so you can transcribe any audio segment.

Set up audio annotation projects

To set up an audio annotation project:

1. Create an audio dataset.
2. On the Annotate projects page, click the + New project button.
3. Select Audio. Provide a name and an optional description for your project.
4. Click Save. The system creates the project and redirects you to the project overview page.
5. Click Add data, and then select your audio dataset. Click Sample to sample your dataset, or manually select data rows and click Queue batch.

Data row size limit

To view the maximum size allowed for a data row, see limits.

Set up ontologies

After setting up an audio annotation project, you can add an ontology based on how you want to label the data. The audio editor supports the following annotation types that you can include in your ontology:

Feature	Import annotations	Export annotations
Radio classification	See payload	See payload
Checklist classification	See payload	See payload
Free-form text classification	See payload	See payload

Classification scopes

You can apply classifications as global classifications at the file level, temporal classifications at the frame level, or nested classifications under other annotations.
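As a rough illustration of the three scopes, the sketch below models them in plain Python. The `Classification` class, its fields, and the `scopes` helper are hypothetical names chosen for this example; they are not the editor's or any SDK's actual data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Classification:
    """Hypothetical in-memory model of one classification value."""
    name: str                        # feature name from the ontology
    value: str                       # selected option or free-form text
    start_ms: Optional[int] = None   # None means no time range (file level)
    end_ms: Optional[int] = None
    nested: List["Classification"] = field(default_factory=list)


def scopes(c: Classification, parent: Optional[Classification] = None) -> Iterator[Tuple[str, str]]:
    """Yield (name, scope) for a classification and everything nested under it."""
    if parent is not None:
        scope = "nested"
    elif c.start_ms is None:
        scope = "global"
    else:
        scope = "temporal"
    yield c.name, scope
    for child in c.nested:
        yield from scopes(child, parent=c)


# Global: applies to the whole file.
language = Classification(name="language", value="English")

# Temporal: applies to part of the timeline and carries a nested classification.
speaker = Classification(
    name="speaker", value="agent", start_ms=2500, end_ms=7000,
    nested=[Classification(name="tone", value="neutral")],
)

print(list(scopes(language)))  # [('language', 'global')]
print(list(scopes(speaker)))   # [('speaker', 'temporal'), ('tone', 'nested')]
```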

Use the audio editor

After adding data and setting up an ontology for your audio annotation project, you can add labels to data rows using the audio editor. Each data row displays in the editor with:

A waveform visualizing the pattern of sound pressure variation.
A spectrogram showing the range of sound frequencies and their strengths over time.
A timeline of the audio, split into 500-millisecond intervals by default, and a Timeline Resolution slider that lets you adjust the time intervals on the timeline.
Basic player controls, such as the play/pause button, back/forward 10-second buttons, and the playback speed. You can also click anywhere on the waveform to instantly move to your desired location.

To add a global classification, select the classification and enter the value. To add a temporal classification, select the classification, choose the interval on the timeline or waveform where the classification starts, and add the classification value. A circle representing the classification value appears on the timeline.

Timeline resolution differences for labels

If you set a lower resolution with the Timeline Resolution slider, a classification label may not line up exactly with the intervals shown on the timeline. This means the classification was placed at a timestamp with a higher resolution than the one currently in use. Increase the timeline resolution to see the exact position of the classification.
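The arithmetic behind this behavior can be sketched as follows. This is a minimal illustration that assumes timestamps and timeline intervals are both expressed in whole milliseconds; `is_aligned` and `snap_to_grid` are hypothetical helper names, not part of the editor.

```python
def is_aligned(timestamp_ms: int, resolution_ms: int) -> bool:
    """True if the timestamp falls exactly on a tick of the current timeline grid."""
    if resolution_ms <= 0:
        raise ValueError("resolution_ms must be positive")
    return timestamp_ms % resolution_ms == 0


def snap_to_grid(timestamp_ms: int, resolution_ms: int) -> int:
    """Return the nearest grid tick at or before the timestamp."""
    if resolution_ms <= 0:
        raise ValueError("resolution_ms must be positive")
    return (timestamp_ms // resolution_ms) * resolution_ms


# A label placed at 1250 ms while the timeline showed 250 ms intervals.
label_ms = 1250

for resolution_ms in (250, 500, 1000):
    print(resolution_ms, is_aligned(label_ms, resolution_ms), snap_to_grid(label_ms, resolution_ms))
# 250 True 1250
# 500 False 1000
# 1000 False 1000
```

At the default 500 ms intervals, the 1250 ms label sits between the 1000 ms and 1500 ms ticks, which is why it only lines up again once the resolution is raised.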

Enable speech recognition

The in-editor automatic speech-to-text support allows you to recognize and extract text from audio segments using free-form text classifications. To enable it:

1. When creating the ontology, add a temporal Text classification feature.
2. Select the temporal text classification, and then select a starting frame on the timeline.
3. Click START TRANSCRIBING.
4. Click END TRANSCRIBING at your desired ending frame.

If the system detects speech, it automatically generates a transcript in the text annotation.
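Conceptually, the start and end frames select a slice of the audio that is then handed to a speech-to-text model. The sketch below shows only that slicing step, under the assumptions that frames map to millisecond timestamps and that the audio is available as a flat list of samples. `extract_segment`, `transcribe_segment`, and `fake_transcriber` are hypothetical names; the stand-in transcriber does not call Whisper, and none of this reflects how the editor is implemented internally.

```python
from typing import Callable, Sequence


def extract_segment(samples: Sequence[float], sample_rate: int,
                    start_ms: int, end_ms: int) -> Sequence[float]:
    """Slice the samples that fall between start_ms and end_ms."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be after start_ms")
    start = start_ms * sample_rate // 1000
    end = end_ms * sample_rate // 1000
    return samples[start:end]


def transcribe_segment(samples: Sequence[float], sample_rate: int,
                       start_ms: int, end_ms: int,
                       transcribe: Callable[[Sequence[float], int], str]) -> str:
    """Extract the selected segment and pass it to a pluggable transcriber."""
    segment = extract_segment(samples, sample_rate, start_ms, end_ms)
    return transcribe(segment, sample_rate)


def fake_transcriber(segment: Sequence[float], sample_rate: int) -> str:
    """Stand-in for a real speech-to-text model; reports the segment length only."""
    return f"<{len(segment) / sample_rate:.1f} s of audio>"


# Three seconds of silence at 16 kHz, transcribing from 0.5 s to 2.0 s.
samples = [0.0] * 16000 * 3
print(transcribe_segment(samples, 16000, 500, 2000, fake_transcriber))
# <1.5 s of audio>
```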

Keyboard shortcuts

Function	Hotkey	Description
Play/Pause	`Space`	Play or pause the audio playback
Move backward one frame	`←`	Move backward one frame
Move forward one frame	`→`	Move forward one frame
Select frames	`Shift` + `Mouse`	Select frames for adding temporal classifications
Advance to the previous keyframe	`⇧` + `←`	Advance to the previous keyframe
Advance to the next keyframe	`⇧` + `→`	Advance to the next keyframe
Jump to objects	`Down`	Jump to objects
Next object	`Down`	Move to the next object
Previous object	`Up`	Move to the previous object
Toggle shortcuts menu	`⌘` + `/`	Toggle the keyboard shortcuts menu
