Import audio data

How to import audio data and sample import formats.

Specifications

File formats: MP3, WAV, M4A
Import methods:

  • IAM Delegated Access
  • Signed URLs (https URLs only)
  • Direct upload of local files (256 MB max file size)
    Note: Direct upload currently does not support metadata or attachments; see the Python example below.
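
The file-format and size limits above can be checked locally before a direct upload. The following is a minimal sketch, not part of the Labelbox SDK: the helper name `uploadable` is hypothetical, and it assumes "256 MB" means 256 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits taken from the specifications above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
# Assumption: "256 MB" is interpreted as 256 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 256 * 1024 * 1024


def uploadable(path, size_bytes=None):
    """Return True if the file looks eligible for direct upload.

    size_bytes may be passed explicitly; otherwise it is read from disk.
    """
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return False
    if size_bytes is None:
        size_bytes = p.stat().st_size
    return size_bytes <= MAX_UPLOAD_BYTES
```

Filtering a list of local paths with this helper before upload avoids submitting files that would be rejected for their type or size.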

When importing audio data to Labelbox, your JSON file must include the following information for each audio file.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| row_data | Yes | https URL of a cloud-hosted audio file. For IAM Delegated Access, the URL must be in virtual-hosted-style format: https://<bucket-name>.s3.<region>.amazonaws.com/<key>. S3 buckets in older regions may expose object URLs in a different format; convert any such URLs to virtual-hosted-style before importing. |
| global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row. |
| media_type | No | "AUDIO". Optional; providing the media type enables better validation and error messaging. |
| metadata_fields | No | See Metadata. |
| attachments | No | See Attachments and Asset overlays. |
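
If some of your S3 object URLs are in path-style format (https://s3.<region>.amazonaws.com/<bucket-name>/<key>), they can be rewritten to virtual-hosted-style before import. The sketch below is illustrative only: `to_virtual_hosted` is a hypothetical helper, not a Labelbox or AWS function, and it handles only hosts of the form s3.<region>.amazonaws.com.

```python
from urllib.parse import urlsplit, urlunsplit


def to_virtual_hosted(url):
    """Convert a path-style S3 URL to virtual-hosted-style.

    Path-style:           https://s3.<region>.amazonaws.com/<bucket>/<key>
    Virtual-hosted-style: https://<bucket>.s3.<region>.amazonaws.com/<key>
    URLs that do not look path-style are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.netloc
    # Path-style hosts start with "s3." and carry no bucket prefix.
    if not (host.startswith("s3.") and host.endswith(".amazonaws.com")):
        return url
    # The first path segment is the bucket; the remainder is the object key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket or not key:
        return url
    return urlunsplit(
        (parts.scheme, f"{bucket}.{host}", "/" + key, parts.query, parts.fragment)
    )


print(to_virtual_hosted(
    "https://s3.us-west-1.amazonaws.com/lb-test-data/audio-samples/sample-audio-1.mp3"
))
```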

Import format

[
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/audio-samples/sample-audio-1.mp3",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/audio-samples/sample-audio-1.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4" }]
  },
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/audio-samples/sample-audio-2.mp3",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/audio-samples/sample-audio-2.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]

The same format, identifying the metadata field by schema_id instead of name:

[
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4" }]
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-2.mp3",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-2.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
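
An import file in this format can also be generated from a list of URLs. The sketch below uses only the Python standard library; `build_rows` is a hypothetical helper, and reusing the URL as the global key simply mirrors the samples above. The duplicate check is local to the list and does not replace the organization-wide uniqueness check that Labelbox performs.

```python
import json


def build_rows(urls):
    """Build one import entry per URL, using the URL itself as the global key."""
    rows = []
    seen = set()
    for url in urls:
        if not url.startswith("https://"):
            raise ValueError(f"row_data must be an https URL: {url}")
        if url in seen:
            raise ValueError(f"duplicate global_key: {url}")
        seen.add(url)
        rows.append({"row_data": url, "global_key": url, "media_type": "AUDIO"})
    return rows


rows = build_rows([
    "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
    "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-2.mp3",
])
# Serialize to the JSON array structure shown above.
payload = json.dumps(rows, indent=2)
print(payload)
```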

Python example

from labelbox import Client

client = Client(api_key="<YOUR_API_KEY>")

dataset = client.create_dataset(name="Bulk import example - Audio")

assets = [
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-1.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4" }]
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-2.mp3",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/audio-sample-data/sample-audio-2.mp3",
    "media_type": "AUDIO",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]

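# Create the data rows and wait for the import task to finish
task = dataset.create_data_rows(assets)
task.wait_till_done()
print(task.errors)  # report any rows that failed to import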

# Direct upload of local files (metadata and attachments are not supported)
local_file_paths = ['path/to/local/file1', 'path/to/local/file2']  # limit: 15k files

new_dataset = client.create_dataset(name="Local files upload")

try:
    task = new_dataset.create_data_rows(local_file_paths)
    task.wait_till_done()
except Exception as err:
    print(f'Error while creating Labelbox dataset: {err}')
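
For larger collections, the path list can be split into batches that each stay within the 15k-file limit noted in the comment above. This is a minimal sketch that assumes the limit applies to each create_data_rows call, which this page does not state explicitly; `chunked` is a hypothetical helper, not part of the Labelbox SDK.

```python
def chunked(items, size=15_000):
    """Yield consecutive slices of items, each at most size long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Small demonstration with placeholder paths and a batch size of 3.
paths = [f"path/to/local/file{i}" for i in range(7)]
batches = list(chunked(paths, size=3))
print([len(batch) for batch in batches])
```

Each batch could then be passed to create_data_rows in turn, waiting for each task to finish before starting the next.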