How to import annotations on audio data and sample import formats.
Open this Colab for an interactive tutorial on importing annotations on audio data.
Supported annotations
To import annotations into Labelbox, you need to create an annotations payload. This section provides a sample payload for each supported annotation type.
Labelbox supports two formats for the annotations payload:
- Python annotation types (recommended)
- NDJSON
Both are described below.
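NDJSON (newline-delimited JSON) stores one JSON object per line. The following minimal sketch uses only the standard library to show how a list of annotation dictionaries, shaped like the NDJSON payloads below, maps to that one-object-per-line layout. The helpers `to_ndjson_text` and `from_ndjson_text` are hypothetical and are not part of the Labelbox SDK, which accepts the list of dictionaries directly.

```python
import json

def to_ndjson_text(records):
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(record) for record in records)

def from_ndjson_text(text):
    """Parse newline-delimited JSON back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Two annotation dicts shaped like the NDJSON payloads in this section
annotations = [
    {"name": "text_audio", "answer": "free text audio annotation"},
    {"name": "checklist_audio",
     "answers": [{"name": "first_checklist_answer"}, {"name": "second_checklist_answer"}]},
]
ndjson_text = to_ndjson_text(annotations)
print(ndjson_text)
```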
Classification: Free-form text
text_annotation = lb_types.ClassificationAnnotation(
name="text_audio",
value=lb_types.Text(answer="free text audio annotation"),
)
text_annotation_ndjson = {
'name': 'text_audio',
'answer': 'free text audio annotation',
}
Classification: Checklist (Multi-choice)
checklist_annotation = lb_types.ClassificationAnnotation(
    name="checklist_audio",
    value=lb_types.Checklist(
        answer=[
            lb_types.ClassificationAnswer(name="first_checklist_answer"),
            lb_types.ClassificationAnswer(name="second_checklist_answer"),
        ]
    ),
)
checklist_annotation_ndjson = {
'name': 'checklist_audio',
'answers': [
{'name': 'first_checklist_answer'},
{'name': 'second_checklist_answer'}
]
}
Classification: Radio (Single-choice)
radio_annotation = lb_types.ClassificationAnnotation(
name="radio_audio",
value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
name="second_radio_answer")))
radio_annotation_ndjson = {
    'name': 'radio_audio',
    'answer': {
        'name': 'second_radio_answer'
    }
}
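The three NDJSON payloads differ only in where the answer goes: a plain string under `answer` for text, a single `{'name': ...}` object under `answer` for radio, and a list of such objects under `answers` for checklist. The sketch below captures that pattern in a hypothetical helper, `classification_ndjson`, which is not part of the Labelbox SDK; it only reproduces the dictionary shapes shown above.

```python
def classification_ndjson(name, kind, value):
    """Build an NDJSON-style classification dict (hypothetical helper).

    kind is one of "text", "radio", or "checklist":
      - text: value is the free-form answer string
      - radio: value is the single selected option name
      - checklist: value is a list of selected option names
    """
    if kind == "text":
        return {"name": name, "answer": value}
    if kind == "radio":
        return {"name": name, "answer": {"name": value}}
    if kind == "checklist":
        return {"name": name, "answers": [{"name": option} for option in value]}
    raise ValueError(f"Unsupported classification kind: {kind!r}")

# Rebuild the checklist payload from this section
checklist_payload = classification_ndjson(
    "checklist_audio", "checklist",
    ["first_checklist_answer", "second_checklist_answer"],
)
print(checklist_payload)
```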
End-to-end example: Import pre-labels or ground truth
Whether you are importing annotations as pre-labels or as ground truth, the steps are very similar. Steps 5 and 6 (creating and importing the annotations payload) are where the process differs slightly; they are explained in detail below.
Before you start
You will need to import these libraries to use the code examples in this section.
import labelbox as lb
import uuid
import labelbox.types as lb_types
# Replace with your API key
API_KEY = ""
client = lb.Client(API_KEY)
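Hardcoding the key in a notebook makes it easy to leak. A minimal sketch, assuming you store the key in an environment variable named `LABELBOX_API_KEY` (that variable name is an assumption here, and the helper `read_api_key` is hypothetical), reads it at runtime and fails with a clear message when it is missing. The returned value can then be passed to `lb.Client(...)` as above.

```python
import os

def read_api_key(var_name="LABELBOX_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper).

    Surrounding whitespace is stripped; a missing or empty value raises RuntimeError.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Labelbox API key.")
    return api_key

# Usage (requires the labelbox package):
# client = lb.Client(read_api_key())
```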
Step 1: Import data rows into Catalog
global_key = "sample-audio-1.mp3"
asset = {
"row_data": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
"global_key": global_key
}
dataset = client.create_dataset(name="audio_annotation_import_demo_dataset")
task = dataset.create_data_rows([asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows: ", task.failed_data_rows)
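The example above uploads a single file whose global key is its file name. To import several audio files the same way, a minimal sketch can build the list of asset dictionaries from URLs. The helper `assets_from_urls` is hypothetical; it assumes each file name is distinct and rejects duplicates, because global keys must be unique.

```python
import posixpath
from urllib.parse import urlparse

def assets_from_urls(urls):
    """Build data row dicts, using each URL's file name as its global key (hypothetical helper).

    Raises ValueError if two URLs end in the same file name, since the
    resulting global keys would collide.
    """
    assets, seen = [], set()
    for url in urls:
        key = posixpath.basename(urlparse(url).path)
        if key in seen:
            raise ValueError(f"Duplicate global key: {key}")
        seen.add(key)
        assets.append({"row_data": url, "global_key": key})
    return assets

# Usage (requires the labelbox package and the dataset created above):
# task = dataset.create_data_rows(assets_from_urls(urls))
```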
Step 2: Create/select an ontology
Your project must have an ontology set up with all the tools and classifications used by your annotations. The tool and classification names in the ontology must match the name fields in your annotations so that each annotation is matched to the correct feature schema.
For example, when we created the text annotation, we gave it the name text_audio. When we set up the ontology, we must therefore make sure that the name of the corresponding classification is also text_audio. The same alignment must hold for every other tool and classification in the ontology.
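Because matching is done purely by name, a typo on either side means the annotation cannot be matched to a feature schema. A minimal pre-flight check, assuming you describe the ontology as a plain dictionary from classification name to its allowed option names (this structure and the helper `find_name_mismatches` are hypothetical, not Labelbox APIs), can catch such mismatches in the NDJSON payloads before uploading.

```python
def find_name_mismatches(ontology, payloads):
    """Return a list of problems where NDJSON payload names do not match the ontology.

    ontology maps each classification name to a set of option names,
    or to None for free-text classifications.
    """
    problems = []
    for payload in payloads:
        name = payload["name"]
        if name not in ontology:
            problems.append(f"unknown classification: {name}")
            continue
        options = ontology[name]
        if options is None:
            continue  # free text: any string answer is allowed
        answer = payload.get("answer")
        # Checklist payloads use "answers"; radio payloads use a single "answer" dict
        chosen = payload.get("answers", [answer] if isinstance(answer, dict) else [])
        for choice in chosen:
            if choice["name"] not in options:
                problems.append(f"unknown option {choice['name']!r} for {name}")
    return problems

ontology_names = {
    "text_audio": None,
    "checklist_audio": {"first_checklist_answer", "second_checklist_answer"},
    "radio_audio": {"first_radio_answer", "second_radio_answer"},
}
```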
ontology_builder = lb.OntologyBuilder(
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="text_audio",
        ),
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist_audio",
            options=[
                lb.Option(value="first_checklist_answer"),
                lb.Option(value="second_checklist_answer"),
            ],
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_audio",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer"),
            ],
        ),
    ]
)
ontology = client.create_ontology(
    "Ontology Audio Annotations",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Audio,
)
Step 3: Create a labeling project
Connect the ontology to the labeling project.
# Create a Labelbox project
project = client.create_project(name="audio_project",
                                media_type=lb.MediaType.Audio)
# Connect the ontology to the project
project.setup_editor(ontology)
Step 4: Send a batch of data rows to the project
# Create a batch of data rows to send to the project
batch = project.create_batch(
    "first-batch-audio-demo",  # Each batch in a project must have a unique name
    global_keys=[global_key],  # Global keys of the data rows to include in the batch
    priority=5,  # Priority from 1 (highest) to 5 (lowest)
)
print("Batch: ", batch)
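The comments above state two constraints on a batch: its name must be unique within the project, and its priority must lie between 1 and 5. A minimal sketch checks both locally before calling `create_batch`. The helper `validate_batch_args` is hypothetical, and it assumes you have already collected the names of the project's existing batches into a set yourself.

```python
def validate_batch_args(name, existing_batch_names, priority):
    """Check the batch constraints described above (hypothetical helper).

    Raises ValueError for a reused batch name or an out-of-range priority.
    """
    if name in existing_batch_names:
        raise ValueError(f"Batch name {name!r} is already used in this project")
    if not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError("priority must be an integer from 1 (highest) to 5 (lowest)")
    return True
```

Checking locally gives a clearer error message when re-running the demo, where reusing the name "first-batch-audio-demo" in the same project would otherwise fail.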
Step 5: Create the annotations payload
Create the annotations payload using the code snippets shown above.
Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. Both approaches are described below, with instructions for composing the annotations into labels attached to the data rows.
The resulting label and label_ndjson from each approach will include every annotation created above that the respective method supports.
label = []
label.append(
    lb_types.Label(
        data=lb_types.AudioData(global_key=global_key),
        annotations=[
            text_annotation,
            checklist_annotation,
            radio_annotation,
        ],
    )
)
label_ndjson = []
for annotation in [text_annotation_ndjson,
                   checklist_annotation_ndjson,
                   radio_annotation_ndjson]:
    # Attach the target data row to each annotation by its global key
    annotation.update({
        'dataRow': {
            'globalKey': global_key
        }
    })
    label_ndjson.append(annotation)
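The loop above calls `update` on the original dictionaries, so they permanently gain a `dataRow` key. A minimal alternative sketch (the helper name `attach_data_row` is hypothetical) builds new dictionaries instead, which leaves the templates untouched and makes it straightforward to label several data rows from the same payloads.

```python
def attach_data_row(annotations, global_key):
    """Return new NDJSON annotation dicts that reference a data row by global key.

    Only the top-level dicts are copied; nested answer objects are shared
    with the originals, which themselves are not modified.
    """
    return [{**annotation, "dataRow": {"globalKey": global_key}} for annotation in annotations]

templates = [
    {"name": "text_audio", "answer": "free text audio annotation"},
    {"name": "radio_audio", "answer": {"name": "second_radio_answer"}},
]
rows_for_first_file = attach_data_row(templates, "sample-audio-1.mp3")
```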
Step 6: Upload annotations to a project as pre-labels or ground truth
For both options, you can pass either the label or the label_ndjson payload: as the value of the predictions parameter for Option A, or of the labels parameter for Option B.
Option A: Upload to a labeling project as pre-labels (Model-assisted labeling)
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{uuid.uuid4()}",
    predictions=label,
)
upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
Option B: Upload to a labeling project as ground truth
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"label_import_job-{uuid.uuid4()}",
    labels=label,
)
upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
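When many data rows are imported, printing the full status list becomes hard to read. A minimal sketch tallies the outcomes instead, assuming each entry in `upload_job.statuses` is a dictionary with a `status` key whose value is a string such as `"SUCCESS"`; compare this assumption with the statuses printed above before relying on it. The helpers `summarize_statuses` and `failed_entries` are hypothetical.

```python
from collections import Counter

def summarize_statuses(statuses):
    """Count upload outcomes by their 'status' value (assumed entry shape).

    Entries without a 'status' key are counted as "UNKNOWN".
    """
    return Counter(entry.get("status", "UNKNOWN") for entry in statuses)

def failed_entries(statuses):
    """Return the entries whose status is not "SUCCESS" (assumed entry shape)."""
    return [entry for entry in statuses if entry.get("status") != "SUCCESS"]

# Usage (after either upload option above):
# print(summarize_statuses(upload_job.statuses))
# print(failed_entries(upload_job.statuses))
```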