Requirements
- Databricks: Runtime 10.4 LTS or later
- Apache Spark: 3.1.2 or later
- Labelbox account
- A Labelbox API key
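As a quick sanity check before installing anything, the version requirements above can be verified programmatically. The following is a minimal sketch, not part of LabelSpark: the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes version strings use the usual dotted numeric form (for example, the string returned by `spark.version` in a notebook).

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '3.1.2' into a tuple of ints.

    Trailing non-numeric suffixes (e.g. '3.3.0-SNAPSHOT') are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "3.1.2") -> bool:
    """Return True if `version` is at least `minimum` (default: Spark 3.1.2)."""
    current, required = parse_version(version), parse_version(minimum)
    # Pad the shorter tuple with zeros so '3.2' compares as '3.2.0'.
    width = max(len(current), len(required))
    current += (0,) * (width - len(current))
    required += (0,) * (width - len(required))
    return current >= required


# In a Databricks notebook, where `spark` is predefined, one might write:
# assert meets_minimum(spark.version), "LabelSpark needs Spark 3.1.2 or later"
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "3.10.0" sorts before "3.2.0" lexicographically.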
Setup
Set up LabelSpark with the following lines of code:
%pip install labelspark -q
import labelspark as ls
api_key = "" # Insert your Labelbox API key here
client = ls.Client(api_key)
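Pasting the API key directly into a notebook cell makes it easy to leak through shared notebooks or version control. One possible alternative, sketched below, reads the key from an environment variable. The helper `load_api_key` and the variable name `LABELBOX_API_KEY` are assumptions chosen for illustration, not something LabelSpark defines.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Read the Labelbox API key from an environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your workspace or cluster configuration actually sets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key."
        )
    return key


# Usage with the client from the setup code above:
# client = ls.Client(load_api_key())
```

On Databricks, a secret scope read with `dbutils.secrets.get(scope=..., key=...)` serves the same purpose and keeps the key out of cluster environment settings as well.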
Once set up, you can run the following core functions:
- client.create_data_rows_from_table(): Creates Labelbox data rows (and metadata) from a Spark table (DataFrame)
- client.export_to_table(): Exports labels (and metadata) from a given Labelbox project into a Spark DataFrame
Import data
Tutorial | GitHub |
---|---|
Basics: Data rows from URLs | Open in GitHub |
Data rows with metadata | Open in GitHub |
Data rows with attachments | Open in GitHub |
Data rows with annotations | Open in GitHub |
Putting it all together | Open in GitHub |
Export data
Tutorial | GitHub |
---|---|
Export data to a Spark table | Open in GitHub |
While using LabelSpark, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
- All LabelSpark notebook examples
- Labelbox API reference