A developer guide for creating and managing model training experiments.
A model run is a model training experiment within a Model directory. Each model run versions its own snapshot of the data (data rows, annotations, and data splits). You can upload predictions to a model run and compare its performance against other model runs in the same Model directory.
Get all model runs inside a Model
model_runs = model.model_runs()
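The snippets in this guide assume you already have a Model object. As a minimal sketch, you can fetch one by ID (the model ID placeholder is hypothetical; substitute your own) and iterate its runs:
model = client.get_model("<your_model_id>")
# model_runs() returns a paginated collection of ModelRun objects
for model_run in model.model_runs():
    print(model_run.name)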
Create a model run
Creates a model run belonging to this model.
model_run_name = "<your_model_run_name>"
example_config = {
"learning_rate": 0.001,
"batch_size": 32,
}
model_run = model.create_model_run(name=model_run_name, config=example_config)
Get model run
model_run_id = "<your_model_run_id>"
model_run = client.get_model_run(model_run_id=model_run_id)
model_run_data = model_run.model_run_data_rows()
model_run_config = model_run.get_config()
Add data rows to a model run
Add data rows to a model run without any associated labels. You can use either data_row_id or global_key to specify the data rows.
# Turn on the experimental mode of the SDK
client.enable_experimental = True
# Using data row ids
dataset = client.get_dataset("<Dataset_id>")
data_row_ids = [data_row.uid for data_row in dataset.export_data_rows()]
model_run.upsert_data_rows(data_row_ids=data_row_ids)
# Using global keys
global_keys = ["<global_key_1>", "<global_key_2>"]
model_run.upsert_data_rows(global_keys=global_keys)
Assign data rows to training, validation, and test splits
Note that assign_data_rows_to_split only works on data rows or labels that are already in the model run. You can assign them to one of the "TRAINING", "VALIDATION", or "TEST" splits.
client.enable_experimental = True
dataset = client.get_dataset("<Dataset_id>") # Your training dataset
# using data row ids
model_run.assign_data_rows_to_split(
    data_row_ids=data_row_ids[:100],
    split="TRAINING",
)
model_run.assign_data_rows_to_split(
    data_row_ids=data_row_ids[100:150],
    split="VALIDATION",
)
model_run.assign_data_rows_to_split(
    data_row_ids=data_row_ids[150:200],
    split="TEST",
)
# using global keys
model_run.assign_data_rows_to_split(
    global_keys=global_keys[:100],
    split="TRAINING",
)
model_run.assign_data_rows_to_split(
    global_keys=global_keys[100:150],
    split="VALIDATION",
)
model_run.assign_data_rows_to_split(
    global_keys=global_keys[150:200],
    split="TEST",
)
Upload custom metrics
If the auto-generated metrics are not sufficient for your use case, you can upload custom metrics to your model run. This lets you evaluate your model's performance in Labelbox even more precisely.
Scalar custom metrics
A ScalarMetric is a custom metric with a single scalar value. It can be uploaded at the following levels of granularity:
1. Data rows
2. Features
3. Nested features
from labelbox.data.annotation_types import (
    ScalarMetric,
    ScalarMetricAggregation,
)
# custom metric on a data row
data_row_metric = ScalarMetric(metric_name="iou", value=0.5)
# custom metric on a feature
feature_metric = ScalarMetric(metric_name="iou", feature_name="cat", value=0.5)
# custom metric on a nested feature
subclass_metric = ScalarMetric(
    metric_name="iou",
    feature_name="cat",
    subclass_name="orange",
    value=0.5,
)
Aggregation of custom metrics
This is an optional field on the ScalarMetric object that controls how custom metrics are aggregated. By default, the aggregation uses ARITHMETIC_MEAN.
Aggregations occur in the following cases:
- When you provide a feature or nested-feature metric, Labelbox automatically aggregates the metric across features and nested-features on the data row.
For example, say you provide a custom metric Bounding Box Width (BBW) on the features "cat" and "dog". The data row-level metric for BBW is the average of these two values.
- When you create slices, the custom metric is aggregated across the data rows of the Slice.
- When you filter data inside a Model Run, the custom metric is aggregated across the filtered data rows.
"""
If the following metrics are uploaded then
in the Labelbox App, users will see:
true positives dog = 4
true positives cat = 3
true positives = 7
"""
feature_metric = ScalarMetric(metric_name="true_positives",
feature_name="cat",
value=3,
aggregation=ScalarMetricAggregation.SUM)
feature_metric = ScalarMetric(metric_name="true_positives",
feature_name="dog",
value=4,
aggregation=ScalarMetricAggregation.SUM)
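Custom metrics are uploaded to a model run together with predictions. Below is a minimal sketch, assuming the NDJSON custom-metrics payload format (field names may vary by SDK version) and a hypothetical global key, using model_run.add_predictions:
import uuid
# Hypothetical payload: one custom metric attached to one data row by global key
metrics_ndjson = [{
    "uuid": str(uuid.uuid4()),
    "dataRow": {"globalKey": "<global_key_1>"},
    "metricName": "true_positives",
    "featureName": "cat",
    "metricValue": 3,
    "aggregation": "SUM",
}]
upload_job = model_run.add_predictions(
    name="custom_metrics_upload_" + str(uuid.uuid4()),
    predictions=metrics_ndjson,
)
upload_job.wait_until_done()
print(upload_job.errors)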
Add labels to a model run
Add data rows and labels to a model run. When you add labels, the associated data rows are also upserted to the model run.
# upsert using label ids
label_ids = ["<label_id_1>","<label_id_2>", ...]
model_run.upsert_labels(label_ids)
Alternatively, you can add all labels from a project to a Model run directly. This will also add all data rows from that project to the model run.
# upsert using project id
model_run.upsert_labels(project_id="<project_id>")
Export labels from a Model Run
Export v2 (beta). See Export v2 for Model Runs (beta) for more details and the export v2 JSON format.
# Set the export params to include/exclude certain fields in the export
export_params= {
"attachments": True,
"metadata_fields": True,
"data_row_details": True,
}
export_task = model_run.export_v2(params=export_params)
export_task.wait_till_done()
print(export_task.errors)
export_json = export_task.result
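Each entry in the export v2 result is a JSON object for one data row. As a rough sketch, assuming the export completed without errors, you can inspect it like this:
# Print the ID of each exported data row
for row in export_json:
    print(row["data_row"]["id"])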
Export v1
# Turn on the experimental mode of the SDK
client.enable_experimental = True
# If download=False, this returns the URL of the data file associated with this ModelRun's labels.
download = False
model_run.export_labels(download=download)
# If download=True, this instead returns the contents in NDJSON format.
download = True
model_run.export_labels(download=download)
Create, modify, and delete a model run config to track your hyperparameters.
example_config = {
"learning_rate": 0.001,
"checkpoint_path": "/path/to/checkpoint/file",
"early_stopping": False,
"batch_size": 32,
"optimizer": {
"adam": {
"beta1": 0.899999976158,
"beta2": 0.999000012875,
"epsilon": 9.99999993923e-9
}
},
"ngpu": 1,
}
model_run_1 = model.create_model_run(name="run 1", config=example_config)
# You can also create a model run with a config specified, as shown above.
# Here is how to create a model run first and then update its config field.
model_run_2 = model.create_model_run(name="run 2")
# The update will replace the previous model run config with the new JSON input.
model_run_2.update_config(example_config)
Get model run config
model_run_parameters = model_run.get_config()
Delete the model run config
model_run.reset_config()
Delete data rows from a model run
data_row_ids = ["<data_row_id_1>","<data_row_id_2>", ...]
model_run.delete_model_run_data_rows(data_row_ids=data_row_ids)
Delete model run
model_run.delete()