Custom embeddings

A developer guide for uploading custom embeddings on any kind of data.

How to upload custom embeddings

You can improve your data exploration and similarity search experience by adding your own custom embeddings. Labelbox allows you to upload up to 100 different custom embeddings on any kind of data. You can experiment with different embeddings to power your data selection.

Step 1: Install the package

ADVLib is a basic library and command-line tool for importing custom embeddings into Labelbox. The package is hosted on GitHub and is built and maintained by Labelbox. Install it before you upload custom embeddings.

pip3 install -q 'git+https://github.com/Labelbox/advlib.git'

Step 2: Set up the API key

To upload custom embeddings, you must have a Labelbox API key stored in one of two environment variables:

  • LABELBOX_API_KEY - The API key itself.
  • LABELBOX_API_KEY_FILE - The path to a file containing the Labelbox API key.
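
If your own scripts also need the key, the two options above can be resolved along these lines. This is a minimal sketch, not ADVLib's implementation: the helper name `resolve_api_key` is hypothetical, and the precedence (the key itself wins over the key file when both are set) is an assumption.

```python
import os
from pathlib import Path


def resolve_api_key(env=None):
    """Return the Labelbox API key found in the given environment mapping.

    Assumption: LABELBOX_API_KEY takes precedence over LABELBOX_API_KEY_FILE.
    """
    env = os.environ if env is None else env

    # Option 1: the variable holds the key itself.
    key = env.get("LABELBOX_API_KEY")
    if key:
        return key.strip()

    # Option 2: the variable holds the path to a file containing the key.
    key_file = env.get("LABELBOX_API_KEY_FILE")
    if key_file:
        return Path(key_file).read_text(encoding="utf-8").strip()

    raise RuntimeError("Set LABELBOX_API_KEY or LABELBOX_API_KEY_FILE")
```

Calling `resolve_api_key()` with no argument reads from the real process environment; passing a dictionary makes the helper easy to exercise without touching it.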

Step 3: Create a custom embedding type

Minimum 1000 custom embedding vectors

You must upload at least 1000 feature vectors for similarity search to function in Catalog.

Create a custom embedding type

Use this command to create a custom embedding type:

advtool embeddings create <NAME> <N DIMENSIONS>

  • <NAME> - The name of your custom embedding type. It can be any string.
  • <N DIMENSIONS> - The dimensionality of your custom embedding type. It must be an integer between 8 and 2048.

This will output the ID of the newly created custom embedding type.
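
When the command is driven from a script, the dimension limits can be checked before the tool is called. The sketch below only assembles the argument list from the syntax shown above; `build_create_command` is a hypothetical helper, and actually running the command assumes `advtool` is installed and the API key is set.

```python
MIN_DIMS, MAX_DIMS = 8, 2048  # limits stated for <N DIMENSIONS>


def build_create_command(name, n_dims):
    """Return the argv list for `advtool embeddings create <NAME> <N DIMENSIONS>`."""
    if not name:
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_dims, bool) or not isinstance(n_dims, int):
        raise ValueError("n_dims must be an integer")
    if not MIN_DIMS <= n_dims <= MAX_DIMS:
        raise ValueError(f"n_dims must be between {MIN_DIMS} and {MAX_DIMS}")
    return ["advtool", "embeddings", "create", name, str(n_dims)]


if __name__ == "__main__":
    cmd = build_create_command("my-embedding", 512)
    print(" ".join(cmd))
    # To run it for real:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the name contains spaces.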

List existing custom embedding types

After you create your custom embedding type, use this command to check whether it exists.

advtool embeddings list

Create the payload for custom embeddings

The payload must be an .ndjson file in the following format. Each line holds one custom embedding vector for one data row.

{"id": <DATA ROW ID>, "vector": [some floats]}

  • <DATA ROW ID> - ID of the data row.
  • [some floats] - The custom embedding vector. It must have the number of dimensions specified in the custom embedding type (between 8 and 2048).

Here is an example .ndjson file.

{"id": "clabk7ly90gmg076ag72l44c9", "vector": [2.58, -7.05, -4.01, -20.93, 11.36, -13.46, -0.055, 13.8]}
{"id": "clabk7lzg0ifs07b50zqs0btq", "vector": [0.05, 16.29, -16.11, -8.05, -2.67, -11.53, -4.52, -0.60]}
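
A file in this format can be produced with Python's standard `json` module by serializing one object per line, with no separator other than the newline. The following is a minimal sketch: `write_payload` is a hypothetical helper, and the input is assumed to be a dictionary that maps data row IDs to vectors.

```python
import json


def write_payload(path, embeddings, n_dims):
    """Write {data_row_id: vector} pairs to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for data_row_id, vector in embeddings.items():
            # Every vector must match the dimensionality of the embedding type.
            if len(vector) != n_dims:
                raise ValueError(
                    f"{data_row_id}: expected {n_dims} values, got {len(vector)}"
                )
            record = {"id": data_row_id, "vector": [float(x) for x in vector]}
            f.write(json.dumps(record) + "\n")


# Example call, using the IDs from the sample file:
#   write_payload(
#       "embeddings.ndjson",
#       {
#           "clabk7ly90gmg076ag72l44c9": [2.58, -7.05, -4.01, -20.93, 11.36, -13.46, -0.055, 13.8],
#           "clabk7lzg0ifs07b50zqs0btq": [0.05, 16.29, -16.11, -8.05, -2.67, -11.53, -4.52, -0.60],
#       },
#       n_dims=8,
#   )
```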

Upload the payload to Labelbox

advtool embeddings import <EMB ID> <NDJSON FILE>

  • <EMB ID> - Embedding ID.
  • <NDJSON FILE> - The .ndjson file that contains the payload.
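
Before running the import, it can help to check the payload locally. The sketch below is not part of ADVLib; `validate_payload` is a hypothetical helper that reports lines that are not valid JSON, lines whose vector length differs from the embedding type's dimensionality, and a file that falls short of the 1000 vectors needed for similarity search. Treating the data row ID as a string and skipping blank lines are assumptions.

```python
import json


def validate_payload(path, n_dims, min_vectors=1000):
    """Return a list of problems found in an .ndjson payload (empty if none)."""
    problems = []
    valid = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # assumption: blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict) or not isinstance(record.get("id"), str):
                problems.append(f"line {line_no}: missing string 'id'")
                continue
            vector = record.get("vector")
            if not isinstance(vector, list) or len(vector) != n_dims:
                problems.append(f"line {line_no}: 'vector' must have {n_dims} values")
                continue
            if not all(
                isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
            ):
                problems.append(f"line {line_no}: 'vector' must contain only numbers")
                continue
            valid += 1
    if valid < min_vectors:
        problems.append(f"only {valid} valid vectors; at least {min_vectors} are needed")
    return problems
```

An empty list means the file passed every check; otherwise each entry names the offending line, so a stray trailing comma or a truncated vector can be found quickly.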

Count the number of vectors uploaded

You can get a count of the number of vectors uploaded for a specific custom embedding. <EMB ID> is the embedding ID.

advtool embeddings count <EMB ID>

  • <EMB ID> - Embedding ID.

Delete a custom embedding type

You can delete a custom embedding type. <EMB ID> is the embedding ID.

advtool embeddings delete <EMB ID>

Steps 1-3: End-to-end Python tutorial

Check out this end-to-end Python tutorial to see how to upload custom embeddings to Labelbox (Steps 1-7).