Custom embeddings

How to upload custom embeddings

Custom embeddings improve similarity search, which makes data exploration more effective. You can upload up to ten (10) custom embedding types per workspace, on any data type. Experiment with different embeddings to improve data selection.

Before you start

This example requires the following libraries:

# Custom embeddings are supported starting from SDK version 3.69.
import labelbox as lb
import numpy as np
import json
import uuid
import random

Replace API key

API_KEY = ""
client = lb.Client(API_KEY)

Select data rows

First, fetch data rows from a Labelbox dataset. To improve similarity search, upload custom embeddings to at least 1,000 data rows.

dataset = client.get_dataset("<DATASET-ID>")

export_task = dataset.export()
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
    print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data rows into a list
data_rows = [data_row.json for data_row in export_task.get_buffered_stream()]

Extract the data row ID from each exported data row:

data_row_dict = [{"data_row_id": dr["data_row"]["id"]} for dr in data_rows]
data_row_dict = data_row_dict[:1000] # keep the first 1000 examples for the sake of this demo

Create custom embedding payload

To prepare the data:

Generate random vectors for embeddings (max: 2048 dimensions)

nb_data_rows = len(data_row_dict)
print("Number of data rows: ", nb_data_rows)
# Labelbox supports custom embedding vectors of up to 2048 dimensions
custom_embeddings = [list(np.random.random(2048)) for _ in range(nb_data_rows)]
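
Before uploading real (non-random) vectors, it may help to check them against the dimension limit mentioned above. The following is a minimal sketch, not part of the Labelbox SDK: `check_vector` and `MAX_DIMS` are hypothetical names, the 2048 limit is taken from this guide, and the finite-value check is an assumption about what a usable vector looks like.

```python
import math

MAX_DIMS = 2048  # dimension limit stated in this guide


def check_vector(vector, expected_dims):
    """Return the vector as a list of plain Python floats, or raise ValueError.

    Hypothetical helper: checks the length against the embedding type's
    dimensions and rejects NaN or infinite values.
    """
    if expected_dims > MAX_DIMS:
        raise ValueError(f"{expected_dims} dimensions exceeds the limit of {MAX_DIMS}")
    values = [float(v) for v in vector]
    if len(values) != expected_dims:
        raise ValueError(f"expected {expected_dims} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values
```

For the random vectors in this example, this could be applied as `custom_embeddings = [check_vector(v, 2048) for v in custom_embeddings]`.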

List custom embeddings in your Labelbox workspace:

embeddings = client.get_embeddings()

Choose an existing embedding type or create a new one. Creating an embedding type requires a unique custom embedding name as an argument.

# Name of the custom embedding must be unique
embedding = client.create_embedding("my_custom_embedding_2048_dimensions", 2048)
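
Because the name must be unique, re-running the line above with the same name is likely to fail. One way to choose between reusing and creating is sketched below. `get_or_create_embedding` is a hypothetical helper, not an SDK function; it relies only on `client.get_embeddings()` and `client.create_embedding()` as used in this guide, and it assumes each returned embedding exposes `name` and `dims` attributes.

```python
def get_or_create_embedding(client, name, dims):
    """Reuse the embedding type called `name` if it exists, else create it.

    Hypothetical helper. Assumes embedding objects have `name` and `dims`.
    """
    for existing in client.get_embeddings():
        if existing.name == name:
            if existing.dims != dims:
                raise ValueError(
                    f"embedding '{name}' already exists with {existing.dims} dimensions"
                )
            return existing
    # No embedding type with this name yet, so create one
    return client.create_embedding(name, dims)
```

Usage would look like `embedding = get_or_create_embedding(client, "my_custom_embedding_2048_dimensions", 2048)`.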

Create payload

Each payload entry should contain the key (data row ID or global key) and the new embedding vector data. Note that dataset.upsert_data_rows() only updates the values you pass in the payload; all other existing row data is left unchanged.

payload = []
# Use a distinct loop variable so the data_row_dict list is not overwritten
for row, custom_embedding in zip(data_row_dict, custom_embeddings):
    payload.append({"key": lb.UniqueId(row["data_row_id"]),
                    "embeddings": [{"embedding_id": embedding.id, "vector": custom_embedding}]})

# Print the payload size and the first entry once, after the loop
print("payload", len(payload), payload[:1])
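
The payload loop above can also be wrapped in a small helper that works for either key type. This is a sketch under stated assumptions: `build_payload` and its `make_key` parameter are hypothetical names, and the entry layout simply mirrors the dictionaries built above. Unlike a bare `zip`, it raises an error when the number of keys and vectors differ instead of silently dropping the extras.

```python
def build_payload(keys, vectors, embedding_id, make_key):
    """Build upsert entries pairing each key with its embedding vector.

    Hypothetical helper. `make_key` wraps a raw key string in whatever
    key object the upload call expects.
    """
    keys = list(keys)
    vectors = list(vectors)
    if len(keys) != len(vectors):
        raise ValueError(f"{len(keys)} keys but {len(vectors)} vectors")
    return [
        {
            "key": make_key(key),
            "embeddings": [{"embedding_id": embedding_id, "vector": list(vector)}],
        }
        for key, vector in zip(keys, vectors)
    ]
```

With the SDK, `make_key` would be `lb.UniqueId` for data row IDs, as in the loop above. For global keys, the SDK's `lb.GlobalKey` class is assumed to play the same role.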

Upload payload

Upsert data rows with custom embeddings

task = dataset.upsert_data_rows(payload)
task.wait_till_done()
print(task.errors)
print(task.status)

Get the count of imported vectors for a custom embedding type. An updated count can take a few minutes, depending on the number of data rows associated with the embedding type.

count = embedding.get_imported_vector_count()
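
Since the count can lag behind the upload, a single call may return a stale value. A minimal polling sketch follows. `wait_for_vector_count` is a hypothetical helper, and the timeout and polling interval are arbitrary assumptions rather than recommended values; the only SDK call it uses is `embedding.get_imported_vector_count()` from above.

```python
import time


def wait_for_vector_count(embedding, expected, timeout_s=600.0, poll_s=15.0,
                          sleep=time.sleep):
    """Poll the imported vector count until it reaches `expected` or time runs out.

    Hypothetical helper. Returns the last count observed, which may be
    lower than `expected` if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        count = embedding.get_imported_vector_count()
        if count >= expected or time.monotonic() >= deadline:
            return count
        sleep(poll_s)
```

For this example, `wait_for_vector_count(embedding, len(payload))` would block until all uploaded vectors are counted or ten minutes pass.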

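Delete the custom embedding type when you no longer need it. Skip this step if you plan to continue with the next section, which fetches this embedding type by name.

embedding.delete()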

Upload custom embeddings during data row creation

Create a dataset

# Create a dataset
dataset_new = client.create_dataset(name="data_rows_with_embeddings")

Fetch an embedding type and create dummy vector data.

embedding = client.get_embedding_by_name("my_custom_embedding_2048_dimensions")
vector = [random.uniform(1.0, 2.0) for _ in range(embedding.dims)]

Upload data rows with embeddings.

uploads = []
# Generate data rows
for i in range(1,9):
    uploads.append({
        "row_data": f"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg",
        "global_key": f"TEST-ID-{uuid.uuid1()}",
        "embeddings": [{
                    "embedding_id": embedding.id,
                    "vector": vector
                }]
    })

task1 = dataset_new.create_data_rows(uploads)
task1.wait_till_done()
print("ERRORS: " , task1.errors)
print("RESULTS:" , task1.result)
