A developer guide for uploading custom embeddings on any kind of data.
How to upload custom embeddings
You can improve your data exploration and similarity search experience by adding your own custom embeddings. Labelbox allows you to upload up to 100 different custom embeddings on any kind of data. You can experiment with different embeddings to power your data selection.
Python SDK support coming soon
In Q4 2023, you will be able to upload custom embeddings via the Python SDK. Until then, use the temporary workflow below to upload custom embeddings to Labelbox.
Step 1: Install the package
ADVLib is a basic library and command-line tool for importing custom embeddings into Labelbox. This GitHub package is built and maintained by Labelbox. Before you can upload custom embeddings, you'll need to install it.
pip3 install -q 'git+https://github.com/Labelbox/advlib.git'
Step 2: Set up the API key
In order to upload custom embeddings, you must have a Labelbox API key stored in the environment in one of two ways:
- LABELBOX_API_KEY: The API key itself.
- LABELBOX_API_KEY_FILE: The path to a file containing the Labelbox API key.
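Before running any commands, it can help to confirm that one of the two variables is actually set. The following is a minimal sketch of such a pre-flight check. The helper name `resolve_api_key` is hypothetical and not part of ADVLib, and checking LABELBOX_API_KEY before LABELBOX_API_KEY_FILE is an assumption, since this guide does not state which one the tool prefers.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key found in the environment, or None if neither variable is set.

    Hypothetical helper: the lookup order (direct key first, then key file)
    is an assumption, not documented ADVLib behavior.
    """
    key = env.get("LABELBOX_API_KEY", "").strip()
    if key:
        return key
    key_file = env.get("LABELBOX_API_KEY_FILE", "").strip()
    if key_file:
        # The file is expected to contain only the key; trim the trailing newline.
        return Path(key_file).read_text(encoding="utf-8").strip() or None
    return None
```

If `resolve_api_key()` returns `None`, neither variable is set to a usable value in the current shell.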
Step 3: Create a custom embedding type
Minimum 1000 custom embedding vectors
You must upload at least 1000 feature vectors for similarity search to function in Catalog.
Create a custom embedding type
Use this command to create a custom embedding type:
advtool embeddings create <NAME> <N DIMENSIONS>
Field | Definition |
---|---|
<NAME> | This is the name of your custom embedding type. It can be any string. |
<N DIMENSIONS> | This indicates the dimensionality of your custom embedding type. It must be an integer between 8 and 2048. |
This will output the ID of the newly created custom embedding type.
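If you script this step, you may want to check the dimension limit before calling the tool. The sketch below builds the argument list for the command shown above. The function `build_create_command` is a hypothetical helper, not part of ADVLib, and it assumes the 8 to 2048 range is inclusive at both ends.

```python
from typing import List

# Limits taken from the table above; treating them as inclusive is an assumption.
MIN_DIMS = 8
MAX_DIMS = 2048


def build_create_command(name: str, n_dims: int) -> List[str]:
    """Build the argument list for `advtool embeddings create <NAME> <N DIMENSIONS>`."""
    if isinstance(n_dims, bool) or not isinstance(n_dims, int):
        raise ValueError("n_dims must be an integer")
    if not MIN_DIMS <= n_dims <= MAX_DIMS:
        raise ValueError(f"n_dims must be between {MIN_DIMS} and {MAX_DIMS}")
    return ["advtool", "embeddings", "create", name, str(n_dims)]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Passing arguments as a list rather than a single string avoids shell quoting problems when the name contains spaces.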
List existing custom embedding types
After you create your custom embedding type, use this command to check whether it exists.
advtool embeddings list
Create the payload for custom embeddings
The payload must be an .ndjson (newline-delimited JSON) file in the following format. Each line holds one custom embedding vector for one data row.
{"id": <DATA ROW ID>, "vector": [some floats]}
Field | Description |
---|---|
<DATA ROW ID> | ID of the data row. |
[some floats] | The custom embedding vector. It must have the number of dimensions specified in the custom embedding type (between 8 and 2048). |
Here is an example .ndjson file.
{"id": "clabk7ly90gmg076ag72l44c9", "vector": [2.58, -7.05, -4.01, -20.93, 11.36, -13.46, -0.055, 13.8]}
{"id": "clabk7lzg0ifs07b50zqs0btq", "vector": [0.05, 16.29, -16.11, -8.05, -2.67, -11.53, -4.52, -0.60]}
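A payload in this shape can be generated with a few lines of Python. The sketch below writes one JSON object per line using only the standard library. The function name `write_ndjson` and its input shape (pairs of data row ID and vector) are illustrative assumptions, and how you compute the vectors is up to you.

```python
import json
from typing import Iterable, Sequence, Tuple


def write_ndjson(path: str, rows: Iterable[Tuple[str, Sequence[float]]]) -> int:
    """Write (data_row_id, vector) pairs as one JSON object per line.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for data_row_id, vector in rows:
            # Cast to plain floats so NumPy scalars and ints serialize cleanly.
            record = {"id": data_row_id, "vector": [float(x) for x in vector]}
            fh.write(json.dumps(record) + "\n")
            count += 1
    return count
```

The return value gives a quick way to confirm how many vectors the file contains before you upload it.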
Upload the payload to Labelbox
advtool embeddings import <EMB ID> <NDJSON FILE>
Field | Description |
---|---|
<EMB ID> | Embedding ID |
<NDJSON FILE> | The .ndjson file that contains the payload |
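Checking the payload locally before you import it can save a failed upload. The following is a rough sketch of such a check, based only on the constraints described in this guide: each line is a JSON object with an "id" and a "vector", the vector length matches the embedding type, and at least 1000 vectors are needed for similarity search. The function `validate_payload` is hypothetical, and the checks that ADVLib itself performs may differ.

```python
import json
from typing import List


def validate_payload(path: str, n_dims: int, min_vectors: int = 1000) -> List[str]:
    """Return a list of human-readable problems found in an .ndjson payload."""
    problems: List[str] = []
    valid = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict) or not isinstance(record.get("id"), str):
                problems.append(f"line {line_no}: missing or non-string id")
                continue
            vector = record.get("vector")
            is_numeric_list = isinstance(vector, list) and all(
                isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
            )
            if not is_numeric_list or len(vector) != n_dims:
                problems.append(f"line {line_no}: vector must be {n_dims} numbers")
                continue
            valid += 1
    if valid < min_vectors:
        problems.append(f"only {valid} valid vectors; at least {min_vectors} are needed")
    return problems
```

An empty list means the file passed every check in this sketch. It does not confirm that the data row IDs exist in your Labelbox workspace.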
Count the number of vectors uploaded
You can count the vectors uploaded for a specific custom embedding, where <EMB ID> is the embedding ID.
advtool embeddings count <EMB ID>
Field | Description |
---|---|
<EMB ID> | Embedding ID |
Delete a custom embedding type
You can delete a custom embedding type, where <EMB ID> is the embedding ID.
advtool embeddings delete <EMB ID>
Steps 1-3: End-to-end Python tutorial
Check out this end-to-end Python tutorial to see how to upload custom embeddings to Labelbox (Steps 1-7).