Model-assisted labeling allows you to upload predicted labels to a Data Row. The workflow is very similar to Label import; the major difference is that you use the MALPredictionImport module instead of the LabelImport module.
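For comparison, both classes expose the same create_from_objects entry point; the difference is whether the upload lands as editable pre-labels (MAL) or as submitted ground truth. Here is a minimal sketch, assuming a placeholder project ID and an NDJSON payload built as shown later in this section (the LabelImport call assumes the same positional arguments).
from labelbox import Client, LabelImport, MALPredictionImport
client = Client(api_key="<YOUR_API_KEY>")
ndjson_payload = []  # serialized annotations, built as shown in the examples below
# Model-assisted labeling: annotations show up as editable predictions in the editor
mal_job = MALPredictionImport.create_from_objects(
    client, "<PROJECT_ID>", "mal_import_job", ndjson_payload)
# Ground truth import: the equivalent LabelImport call
label_job = LabelImport.create_from_objects(
    client, "<PROJECT_ID>", "label_import_job", ndjson_payload)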
Option 1: Create Model-assisted Labels with Annotation Types (Recommended)
When you bulk upload Model-assisted Labels via annotation types, you create a LabelList that contains a list of Labels, each of which is constructed from a data object (created from a Data Row id) and a list of annotations.
Here are the Python annotation types supported for Label and Model-assisted Label creation; a construction sketch for several of them follows the table.
Python annotation type | Image | Video | Text | Tiled imagery |
---|---|---|---|---|
Bounding box | ✓ | - | N/A | ✓ |
Polygon | ✓ | - | N/A | ✓ |
Point | ✓ | - | N/A | ✓ |
Polyline | ✓ | - | N/A | ✓ |
Segmentation mask | ✓ | - | N/A | - |
Entity | N/A | N/A | ✓ | N/A |
Relationship | - | - | - | - |
Radio | ✓ | ✓ | ✓ | ✓ |
Checklist | ✓ | ✓ | ✓ | ✓ |
Free-form text | ✓ | - | ✓ | ✓ |
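To illustrate a few of these types before the full examples below, here is a minimal construction sketch. The feature names ("box", "radio_question", and so on) are placeholders and must match feature names in your project ontology.
from labelbox.data.annotation_types import (
    ObjectAnnotation, ClassificationAnnotation, Rectangle, Polygon, Line,
    Point, Radio, Checklist, Text, ClassificationAnswer)
# Geometric tools are wrapped in ObjectAnnotation
box = ObjectAnnotation(name="box", value=Rectangle(start=Point(x=0, y=0), end=Point(x=10, y=10)))
polygon = ObjectAnnotation(name="polygon", value=Polygon(points=[Point(x=0, y=0), Point(x=10, y=0), Point(x=10, y=10)]))
polyline = ObjectAnnotation(name="polyline", value=Line(points=[Point(x=0, y=0), Point(x=10, y=10)]))
# Classifications are wrapped in ClassificationAnnotation
radio = ClassificationAnnotation(name="radio_question",
                                 value=Radio(answer=ClassificationAnswer(name="first_option")))
checklist = ClassificationAnnotation(name="checklist_question",
                                     value=Checklist(answer=[ClassificationAnswer(name="option_1")]))
free_text = ClassificationAnnotation(name="text_question", value=Text(answer="a free-form answer"))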
Import relevant modules for your data type and annotation types
from labelbox import Client, MALPredictionImport
from labelbox.schema.ontology import OntologyBuilder, Tool
from labelbox.data.serialization import NDJsonConverter
# For working with images, videos, text and documents
from labelbox.data.annotation_types import (
    Label, ImageData, MaskData, LabelList, TextData, VideoData,
    ObjectAnnotation, ClassificationAnnotation, Polygon, Rectangle, Line, Mask,
    Point, Checklist, Radio, Text, TextEntity, ClassificationAnswer)
# For working with geospatial data
from labelbox.data.annotation_types.data.tiled_image import TiledBounds, TiledImageData, TileLayer, EPSG, EPSGTransformer
Create a Model-assisted Label and upload it to a project
Here is a simple example of creating a model-assisted Label with an ImageData and an Annotation.
client = Client(api_key="<YOUR_API_KEY>")
# 1. Make sure the project has the right ontology for the Label's annotations.
# Here we create a new project to show ontology creation; you can also do this in the app.
project = client.create_project(name="test_label_import_project")
dataset = client.create_dataset(name="image_annotation_import_demo_dataset")
test_img_url = "https://raw.githubusercontent.com/Labelbox/labelbox-python/develop/examples/assets/2560px-Kitano_Street_Kobe01s5s4110.jpg"
data_row = dataset.create_data_row(row_data=test_img_url)
project.datasets.connect(dataset)
# Create an ontology that matches the Label's annotations; in this example we only need a bounding box.
ontology_builder = OntologyBuilder(tools=[
Tool(tool=Tool.Type.BBOX, name="box")
])
ontology = client.create_ontology("bbox ontology", ontology_builder.asdict())
# Attach ontology to project
project.setup_editor(ontology)
# 2. Create annotation(s)
rectangle = Rectangle(start=Point(x=30,y=30), end=Point(x=200,y=200))
# Note: this annotation matches the ontology feature "box" by name
rectangle_annotation = ObjectAnnotation(value=rectangle, name="box")
# 3. Create a Label with a list of annotations associated with the data row.
annotations_list = [rectangle_annotation]
data = ImageData(uid=data_row.uid)
label = Label(data=data, annotations=annotations_list)
# 4. Upload the Label to the project as a Model-assisted Label
label_list = LabelList()
label_list.append(label)
labels_ndjson = list(NDJsonConverter.serialize(label_list))
upload_job = MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="upload_mal_import_job",
    predictions=labels_ndjson)
print("Errors:", upload_job.errors)
Bulk import Labels
This example creates a bounding box label on each of the queued Data Rows in your project.
Configure the ontology for your project
Each annotation of your Model-assisted Label must correspond to a feature inside the ontology of your project. You can configure the project ontology in the app or via the SDK, as sketched below.
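For example, assuming your model predicts two hypothetical classes named "car" and "pedestrian", the ontology for the bulk example could be configured via the SDK like this (a sketch only; adjust the tool types and names to your own classes):
from labelbox.schema.ontology import OntologyBuilder, Tool
project = client.get_project("<YOUR_PROJECT_ID>")
ontology_builder = OntologyBuilder(tools=[
    Tool(tool=Tool.Type.BBOX, name="car"),
    Tool(tool=Tool.Type.BBOX, name="pedestrian"),
])
ontology = client.create_ontology("bulk_import_ontology", ontology_builder.asdict())
project.setup_editor(ontology)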
Construct a LabelList
# Get the queued (unlabeled) Data Rows to import Labels for
project = client.get_project("<YOUR_PROJECT_ID>")
queued_data_rows = project.export_queued_data_rows()
label_list = LabelList()
for datarow in queued_data_rows:
    annotations_list = []
    # Replace this with your own inference/ground truth function
    ground_truth_label = get_ground_truth_function(datarow)
    for annotation in ground_truth_label:
        # Specify the annotation class name. This must exactly match a feature name in the ontology
        class_name = annotation.class_name
        bbox = annotation.bbox
        # Create an annotation type
        annotations_list.append(ObjectAnnotation(
            name=class_name,
            value=Rectangle.from_xyhw(*bbox),
        ))
    # Create a Label from the data type and annotation types
    data = ImageData(uid=datarow['id'])
    label_list.append(Label(data=data, annotations=annotations_list))
Convert label list to NDJSON for import
To import Model-assisted Labels into Labelbox, you need to convert the Python annotation types to NDJSON. NDJSON is used as the normalized interface between the Python SDK (or any other external tooling) and the Labelbox backend service.
labels_ndjson = list(NDJsonConverter.serialize(label_list))
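# (Optional) Inspect the first serialized NDJSON dictionary before importing
print(labels_ndjson[0])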
upload_job = MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="upload_label_import_job",
    predictions=labels_ndjson)
print("Errors:", upload_job.errors)
Option 2: Create Model-assisted Labels with NDJSON
Alternatively, you can create and upload Model-assisted Labels directly as NDJSON. Here are the annotation kinds supported by NDJSON for Label and Model-assisted Label creation; a minimal image example follows the table.
Annotation | Image | Video | Text | Audio | Document | Tiled imagery |
---|---|---|---|---|---|---|
Bounding box | ✓ | ✓ | N/A | N/A | ✓ | ✓ |
Polygon | ✓ | - | N/A | N/A | N/A | ✓ |
Point | ✓ | ✓ | N/A | N/A | N/A | ✓ |
Polyline | ✓ | ✓ | N/A | N/A | N/A | ✓ |
Segmentation mask | ✓ | - | N/A | N/A | N/A | ✓ |
Entity | N/A | N/A | ✓ | N/A | coming soon | N/A |
Relationship | - | - | - | N/A | coming soon | - |
Radio | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Checklist | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Free-form text | ✓ | - | ✓ | ✓ | ✓ | ✓ |
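For example, a minimal NDJSON payload for a single image bounding box mirrors the document example at the end of this section: a uuid, a feature name that matches the ontology, a Data Row reference, and the bbox geometry. This sketch assumes the client and project from the earlier examples and a placeholder Data Row id.
import uuid
bbox_annotation = {
    "uuid": str(uuid.uuid4()),
    "name": "box",  # must match a feature name in the project ontology
    "dataRow": {"id": "<DATA_ROW_ID>"},
    "bbox": {"top": 50, "left": 100, "height": 150, "width": 200}
}
upload_job = MALPredictionImport.create_from_objects(
    client=client, project_id=project.uid,
    name=f"ndjson_import_{uuid.uuid4()}", predictions=[bbox_annotation])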
Check out this tutorial notebook for an example of video MAL import via NDJSON.
Video
import uuid
from labelbox import Client, MALPredictionImport, OntologyBuilder, Option, Classification
client = Client()
project = client.create_project(name = "video-frame-based-classifications-project")
dataset = client.create_dataset(name = 'video-frame-based-classifications-dataset')
data_row = dataset.create_data_row(row_data = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ElephantsDream.mp4")
project.datasets.connect(dataset)
ontology_builder = OntologyBuilder(
    classifications=[
        Classification(class_type=Classification.Type.RADIO,
                       scope=Classification.Scope.INDEX,
                       instructions="radio_classification",
                       options=[Option(value="radio_option_1"), Option(value="radio_option_2")]),
        Classification(class_type=Classification.Type.CHECKLIST,
                       scope=Classification.Scope.INDEX,
                       instructions="checklist_classification",
                       options=[Option(value="checklist_option_1"), Option(value="checklist_option_2")])
    ])
ontology = client.create_ontology("video-frame-based-classification-ontology", ontology_builder.asdict())
project.setup_editor(ontology)
# Build a lookup from classification/option names to feature schema ids
schema_id_lookup = {}
for classification in ontology.classifications():
    options = {}
    for option in classification.options:
        options[option.value] = option.feature_schema_id
    schema_id_lookup[classification.instructions] = {'schema_id': classification.feature_schema_id, 'options': options}
radio_annotation = {
"schemaId": schema_id_lookup['radio_classification']['schema_id'],
"uuid": str(uuid.uuid4()),
"dataRow": {
"id": data_row.uid
},
"answer": [
{"schemaId": schema_id_lookup['radio_classification']['options']['radio_option_1'], "frames" : [{"start": 7, "end": 13}, { "start": 19,"end": 20}]},
{"schemaId": schema_id_lookup['radio_classification']['options']['radio_option_2'], "frames" : [{"start": 14, "end": 18}]}
]
}
checklist_annotation = {
"schemaId": schema_id_lookup['checklist_classification']['schema_id'],
"uuid": str(uuid.uuid4()),
"dataRow": {
"id": data_row.uid
},
"answer": [
{"schemaId": schema_id_lookup['checklist_classification']['options']['checklist_option_1'], "frames" : [{"start": 7, "end": 13}, { "start": 18,"end": 19}]},
{"schemaId": schema_id_lookup['checklist_classification']['options']['checklist_option_2'], "frames" : [{"start": 1, "end": 18}]}
]
}
annotations = [radio_annotation, checklist_annotation]
job = MALPredictionImport.create_from_objects(
client, project.uid, str(uuid.uuid4()), annotations)
print(job.errors)
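The job runs asynchronously; as with the other imports, you can wait for it to finish and inspect the per-annotation statuses in addition to the errors.
job.wait_until_done()
print("Statuses:", job.statuses)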
import uuid
from typing import Any, Dict, List
from labelbox import Client, LabelingFrontend
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
API_KEY = None
client = Client(api_key=API_KEY)
ontology_builder = OntologyBuilder(
    tools=[Tool(tool=Tool.Type.BBOX, name="jellyfish")])
project = client.create_project(name="video_mal_project")
dataset = client.create_dataset(name="video_mal_dataset")
dataset.create_data_row(
    row_data=
    "https://storage.labelbox.com/cjhfn5y6s0pk507024nz1ocys%2Fb8837f3b-b071-98d9-645e-2e2c0302393b-jellyfish2-100-110.mp4"
)
# Attach the ontology to the project through the video editor frontend
editor = next(
    client.get_labeling_frontends(where=LabelingFrontend.name == "Editor"))
project.setup(editor, ontology_builder.asdict())
project.datasets.connect(dataset)
ontology = OntologyBuilder.from_project(project)
# We want all of the feature schemas to be easily accessible by name.
schema_lookup = {tool.name: tool.feature_schema_id for tool in ontology.tools}
print(schema_lookup)
segments = [{
"keyframes": [{
"frame": 1,
"bbox": {
"top": 80,
"left": 80,
"height": 80,
"width": 80
}
}, {
"frame": 20,
"bbox": {
"top": 125,
"left": 125,
"height": 200,
"width": 300
}
}]
}, {
"keyframes": [{
"frame": 27,
"bbox": {
"top": 80,
"left": 50,
"height": 80,
"width": 50
}
}]
}]
def create_video_bbox_ndjson(datarow_id: str, schema_id: str,
                             segments: List[Dict[str, Any]]) -> Dict[str, Any]:
    return {
        "uuid": str(uuid.uuid4()),
        "schemaId": schema_id,
        "dataRow": {
            "id": datarow_id
        },
        "segments": segments
    }
uploads = []
for data_row in dataset.data_rows():
uploads.append(
create_video_bbox_ndjson(data_row.uid, schema_lookup['jellyfish'],
segments))
upload_task = project.upload_annotations(name=f"upload-job-{uuid.uuid4()}",
annotations=uploads,
validate=False)
# Wait for upload to finish (Will take up to five minutes)
upload_task.wait_until_done()
# Review the upload status
print(upload_task.errors)
Document
# MAL for bounding boxes in Documents
annotations = []
for row in project.export_queued_data_rows():
    print("row:", row['id'], row['externalId'])
    annotations.append({
        "uuid": str(uuid.uuid4()),  # each annotation needs its own unique uuid
        "name": "box",
        "dataRow": {"id": row['id']},
        "bbox": {"top": 50.0, "left": 200.7, "height": 150.8, "width": 200.0},
        "unit": "POINTS",
        "page": 4
    })
import_annotations = MALPredictionImport.create_from_objects(
    client=client, project_id=project.uid,
    name=f"import {str(uuid.uuid4())}", predictions=annotations)
import_annotations.wait_until_done()
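Once the import has finished, review the results on the returned job.
print("Errors:", import_annotations.errors)
print("Statuses:", import_annotations.statuses)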