> ## Documentation Index > Fetch the complete documentation index at: https://docs.labelbox.com/llms.txt > Use this file to discover all available pages before exploring further. # Custom embeddings > Shows how to upload custom embeddings to improve similarity search. ## How to upload custom embeddings Custom embeddings improve data exploration by improving [similarity search](/docs/similarity). You can upload up to ten (10) custom embedding types per workspace on any data type. Use this to experiment with different embeddings to improve data selection. ## Before you start This example requires the following libraries: ```python Install theme={null} # Starting from SDK version 3.69, custom embeddings are now supported. import labelbox as lb import numpy as np import json import uuid import random ``` ## Replace API key ```python Python theme={null} API_KEY = "" client = lb.Client(API_KEY) ``` ## Select data rows First, we need to fetch data rows from a Labelbox dataset. To improve similarity search, you need to upload custom embeddings to at least 1,000 data rows. ```python Python theme={null} dataset = client.get_dataset("") export_task = dataset.export() export_task.wait_till_done() data_rows = [] # Stream the export using a callback function def json_stream_handler(output: labelbox.BufferedJsonConverterOutput): print(output.json) export_task.get_buffered_stream(stream_type=labelbox.StreamType.RESULT).start(stream_handler=json_stream_handler) # Collect all exported data into a list export_json = [data_row.json for data_row in export_task.get_buffered_stream()] ``` Extract the data row ID and the row data (asset URL): ```python Python theme={null} data_row_dict = [{"data_row_id": dr["data_row"]["id"]} for dr in data_rows] data_row_dict = data_row_dict[:1000] # keep the first 1000 examples for the sake of this demo ``` ## Create custom embedding payload To prepare the data: Generate random vectors for embeddings (max: `2048` dimensions) ```python Python theme={null} nb_data_rows = len(data_row_dict) print("Number of data rows: ", nb_data_rows) # Labelbox supports custom embedding vectors of up to 2048 dimensions custom_embeddings = [list(np.random.random(2048)) for _ in range(nb_data_rows)] ``` List custom embeddings in your Labelbox workspace: ```python Python theme={null} embeddings = client.get_embeddings() ``` Choose an existing embedding type or create a new one A unique custom embedding name is required as an argument for this method. ```python Python theme={null} # Name of the custom embedding must be unique embedding = client.create_embedding("my_custom_embedding_2048_dimensions", 2048) ``` Create payload * The payload should encompass the `key` (data row id or global key) and the new embedding vector data. Note that the `dataset.upsert_data_rows()` operation will only update the values you pass in the payload; all other existing row data will not be modified. ```python Python theme={null} payload = [] for data_row_dict, custom_embedding in zip(data_row_dict,custom_embeddings): payload.append({"key": lb.UniqueId(data_row_dict['data_row_id']), "embeddings": [{"embedding_id": embedding.id, "vector": custom_embedding}]}) print('payload', len(payload),payload[:1]) ``` ## Upload payload Upsert data rows with custom embeddings ```python Python theme={null} task = dataset.upsert_data_rows(payload) task.wait_till_done() print(task.errors) print(task.status) ``` Get the count of imported vectors for a custom embedding type An updated count can take a few minutes, depending on the number of data rows associated with the embedding type. ```python Python theme={null} count = embedding.get_imported_vector_count() ``` Delete custom embedding type. ```python Python theme={null} embedding.delete() ``` ## Upload custom embeddings during data row creation Create a dataset ```python Python theme={null} # Create a dataset dataset_new = client.create_dataset(name="data_rows_with_embeddings") ``` Fetch an embedding type and create dummy vector data. ```python Python theme={null} embedding = client.get_embedding_by_name("my_custom_embedding_2048_dimensions") vector = [random.uniform(1.0, 2.0) for _ in range(embedding.dims)] ``` Upload data rows with embeddings. ```python Python theme={null} uploads = [] # Generate data rows for i in range(1,9): uploads.append({ "row_data": f"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg", "global_key": "TEST-ID-%id" % uuid.uuid1(), "embeddings": [{ "embedding_id": embedding.id, "vector": vector }] }) task1 = dataset_new.create_data_rows(uploads) task1.wait_till_done() print("ERRORS: " , task1.errors) print("RESULTS:" , task1.result) ```