# Natural language search

> A guide for using natural language search in Catalog.

You can use Labelbox's *natural language* search to surface data rows that match any expression you provide. This natural language search engine gives your team an edge by helping you find high-impact data rows in an ocean of data (e.g., rare data or edge cases).

We recommend using the native natural language search engine within our Catalog product.

<Frame caption="Natural language search for images">
  <img src="https://mintcdn.com/labelbox-1db23ff4/2Od9VBnnAA3wl0Qw/images/docs/9e239fb-Screenshot_2023-03-07_at_15.39.51.jpeg?fit=max&auto=format&n=2Od9VBnnAA3wl0Qw&q=85&s=189bfa49d89223705cda8d0e686c2d66" alt="" width="3416" height="1920" data-path="images/docs/9e239fb-Screenshot_2023-03-07_at_15.39.51.jpeg" />
</Frame>

<Frame caption="Natural language search for text">
  <img src="https://mintcdn.com/labelbox-1db23ff4/0KsmrO7icq8Jx_0f/images/docs/33d4cf8-small-Screenshot_2023-05-16_at_11.47.12.png?fit=max&auto=format&n=0KsmrO7icq8Jx_0f&q=85&s=49e682b6c912540231ed8f44b84e98ef" alt="" width="1769" height="1024" data-path="images/docs/33d4cf8-small-Screenshot_2023-05-16_at_11.47.12.png" />
</Frame>

<Frame caption="Natural language search for documents">
  <img src="https://mintcdn.com/labelbox-1db23ff4/60Lzt2PDK3fJCHcg/images/docs/f2385ed-small-Screenshot_2023-05-16_at_11.56.45.png?fit=max&auto=format&n=60Lzt2PDK3fJCHcg&q=85&s=092ef50b0debb97478222b440e2cd0c3" alt="" width="1777" height="1024" data-path="images/docs/f2385ed-small-Screenshot_2023-05-16_at_11.56.45.png" />
</Frame>

## How natural language search works

Natural language search is powered by *vector embeddings*. A vector embedding is a numerical representation of a piece of data (e.g., an image, text, document, or video) that translates the raw data into a lower-dimensional space.

Recent advances in machine learning enable some neural networks (e.g., the [CLIP vision model by OpenAI](https://openai.com/research/clip) or the [all-mpnet-base-v2 text model](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)) to recognize a wide variety of concepts in images, text, or documents and associate them with natural language descriptions.

You can now surface images in Catalog by describing them in natural language. For example, type in *"a photo of birds in the sunset"* to surface images of birds in the sunset. You can also surface text, conversational text, or documents by describing them in natural language. For example, type in *"disappointed movie reviews"* to surface data rows that likely contain negative movie reviews.
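As a rough illustration of how embedding-based search can rank data rows, the sketch below compares a query vector to each data row vector using cosine similarity and sorts the rows by score. The three-dimensional vectors, the row IDs, and the function names are all made up for this example. Real embeddings come from the models listed on this page and have hundreds of dimensions, and this is not Labelbox's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, row_vecs):
    """Return (row_id, score) pairs, most similar first."""
    scored = [
        (row_id, cosine_similarity(query_vec, vec))
        for row_id, vec in row_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
rows = {
    "row-birds-sunset": [0.9, 0.1, 0.8],
    "row-city-street": [0.0, 1.0, 0.1],
}

ranking = rank_by_similarity(query, rows)
for row_id, score in ranking:
    print(f"{row_id}: {score:.3f}")
```

The row whose vector points in nearly the same direction as the query vector ranks first, which is the intuition behind surfacing data rows that "match" a description.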

<Info>
  ### Character/word limit for natural language search

  To view the character or word limit for the natural language filter, visit our [limits](/docs/limits) page.
</Info>

## Supported media types

Labelbox supports natural language search for several data modalities. For each media type, the same neural network embeds both the user query and the data row. Below is the list of neural networks used for each media type.

| Asset type              | Supported             | Embedding                                                                                                                                                                                                                                |
| ----------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Image**               | <Icon icon="check" /> | [CLIP-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) (512 dimensions)                                                                                                                                             |
| **Video**               | <Icon icon="check" /> | Google [Gemini Pro Vision](https://ai.google.dev/models/gemini). Only the first two (2) minutes of content are embedded. The audio signal is not currently used. This is a paid add-on feature available upon request.                   |
| **Text**                | <Icon icon="check" /> | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) (768 dimensions), based on the first 64K characters                                                                                                  |
| **HTML**                | <Icon icon="check" /> | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) (768 dimensions), based on the first 64K characters                                                                                                  |
| **Document**            | <Icon icon="check" /> | [CLIP-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) (512 dimensions) and [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) (768 dimensions), based on the first 64K characters |
| **Tiled imagery**       | <Icon icon="check" /> | [CLIP-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) (512 dimensions)                                                                                                                                             |
| **Audio**               | <Icon icon="check" /> | Audio is transcribed to text. [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) (768 dimensions)                                                                                                       |
| **Conversational text** | <Icon icon="check" /> | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) (768 dimensions), based on the first 64K characters                                                                                                  |

## How to search data using natural language

In the gallery view of Catalog, select the **Natural language** filter. Then, choose whether to run a **Visual** search or a **Text** search. Finally, enter a description of the data you are looking for. The description must have at least 3 characters and at most 10 words.

<Frame caption="Natural language search for images">
  <img src="https://mintcdn.com/labelbox-1db23ff4/0KsmrO7icq8Jx_0f/images/docs/2b0d7a0-small-Screenshot_2023-05-16_at_12.20.50.png?fit=max&auto=format&n=0KsmrO7icq8Jx_0f&q=85&s=6db0677ee47a1f8486e3f24ec1480b5e" alt="" width="1771" height="1024" data-path="images/docs/2b0d7a0-small-Screenshot_2023-05-16_at_12.20.50.png" />
</Frame>
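If you generate prompts programmatically before pasting them into Catalog, a small pre-check can catch descriptions that fall outside the limits stated above. The following is a minimal sketch: `validate_prompt` is a hypothetical helper, and counting words by splitting on whitespace is an assumption, since the product may count words or separators differently.

```python
def validate_prompt(prompt, min_chars=3, max_words=10):
    """Return a list of problems; an empty list means the prompt fits.

    Assumption: words are counted by splitting on whitespace.
    """
    text = prompt.strip()
    problems = []
    if len(text) < min_chars:
        problems.append(f"needs at least {min_chars} characters")
    if len(text.split()) > max_words:
        problems.append(f"allows at most {max_words} words")
    return problems


print(validate_prompt("a photo of birds in the sunset"))
print(validate_prompt("hi"))
```

An empty list means the description is within both limits; otherwise each entry names the limit that was exceeded.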

### Prompt engineering

Prompt engineering means iterating on your natural language description (or *prompt*) until the search surfaces the data you are looking for. Labelbox recommends trying several prompts; users have reported that small tweaks to the prompt can help return more relevant data.

<Tip>
  **Pro tip: Refine your prompt with positive biases and negative biases.**
</Tip>

You can refine your prompt by adding positive biases and negative biases, using this prompt structure: `[my prompt] / [more of this positive bias] / [less of this negative bias]`.

For example, you can refine a search for `a photo of birds in the sunset` by asking for `purple` sunsets and not `red` sunsets, using the prompt `a photo of birds in the sunset / purple / red`. This returns only images of birds in a purple sunset, and not in a red sunset.

<Frame caption="Advanced prompt to keep only purple sunsets and remove red sunsets: a photo of birds in the sunset / purple / red">
  <img src="https://mintcdn.com/labelbox-1db23ff4/DfR8IhNEVbXdqxoB/images/docs/112065c-small-Screenshot_2023-05-16_at_12.28.47.png?fit=max&auto=format&n=DfR8IhNEVbXdqxoB&q=85&s=ca5e8eafacd829b8f3d431cd16b61ddf" alt="" width="1769" height="1024" data-path="images/docs/112065c-small-Screenshot_2023-05-16_at_12.28.47.png" />
</Frame>
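When you try many variations of a refined prompt, it can help to assemble the three-part structure in code. The sketch below only builds and splits the prompt string and says nothing about how Labelbox applies the biases internally. The function names are hypothetical.

```python
def build_refined_prompt(prompt, more_of, less_of):
    """Join the three parts as: prompt / positive bias / negative bias."""
    parts = [prompt.strip(), more_of.strip(), less_of.strip()]
    if any(not part for part in parts):
        raise ValueError("all three parts must be non-empty")
    return " / ".join(parts)


def split_refined_prompt(text):
    """Split a refined prompt back into its three parts."""
    parts = [part.strip() for part in text.split("/")]
    if len(parts) != 3:
        raise ValueError("expected: prompt / positive bias / negative bias")
    return tuple(parts)


refined = build_refined_prompt("a photo of birds in the sunset", "purple", "red")
print(refined)
print(split_refined_prompt(refined))
```

Keeping the base prompt fixed and varying only the two bias terms makes it easier to compare which refinement surfaces the most relevant data rows.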

### Set the score range

Natural language search surfaces the data rows whose embeddings are closest to the embedding of the prompt. Closeness is measured using cosine similarity and reported as a natural language score between 0 and 1. The more similar the embeddings, the higher the natural language score.

By default, Labelbox returns data rows with a natural language score between 0.5 and 1. You can customize this range by setting the minimum and maximum values of the natural language search slider.

<Frame caption="Customize the results of the natural language search by specifying the range of scores.">
  <img src="https://mintcdn.com/labelbox-1db23ff4/Y2wXEmSLwSvn6HCD/images/docs/409bd79-Screenshot_2023-03-07_at_15.56.29.png?fit=max&auto=format&n=Y2wXEmSLwSvn6HCD&q=85&s=60500f042468c92268958010867b4fc7" alt="" width="1668" height="338" data-path="images/docs/409bd79-Screenshot_2023-03-07_at_15.56.29.png" />
</Frame>
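Conceptually, the slider acts as a range filter over the scores. The sketch below shows that idea on a handful of invented scores. `filter_by_score`, the row IDs, and the score values are illustrative only. The default range mirrors the 0.5 to 1 default described above.

```python
def filter_by_score(scored_rows, min_score=0.5, max_score=1.0):
    """Keep (row_id, score) pairs whose score lies in [min_score, max_score]."""
    if not 0.0 <= min_score <= max_score <= 1.0:
        raise ValueError("expected 0 <= min_score <= max_score <= 1")
    return [
        (row_id, score)
        for row_id, score in scored_rows
        if min_score <= score <= max_score
    ]


# Hypothetical scores for four data rows.
scored = [("row-a", 0.92), ("row-b", 0.61), ("row-c", 0.48), ("row-d", 0.75)]

default_hits = filter_by_score(scored)            # default range: 0.5 to 1
strict_hits = filter_by_score(scored, 0.7, 1.0)   # narrower range
print(default_hits)
print(strict_hits)
```

Raising the minimum narrows the results to closer matches, while lowering it trades precision for recall.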

### Combine natural language search & other searches

You can combine natural language search with other filters in Catalog. Some filters are best used for targeting *unstructured* data, and others for targeting *structured* data.

Combine natural language search with the following filters to target data rows by structured data:

<CardGroup>
  <Card title="Metadata" icon="square-1" horizontal href="/reference/metadata" />

  <Card title="Annotations" icon="square-2" horizontal href="/docs/annotate-overview" />

  <Card title="Datasets" icon="square-3" horizontal href="/reference/dataset" />

  <Card title="Projects" icon="square-4" horizontal href="/reference/project" />
</CardGroup>

Combine natural language search with the following filters to search unstructured data:

<CardGroup>
  <Card title="Similarity search" icon="square-1" horizontal href="/docs/similarity" />

  <Card title="Text search" icon="square-2" horizontal href="/docs/find-text" />
</CardGroup>

<Frame caption="Use natural language search with other filters to surface high-impact data.">
  <img src="https://mintcdn.com/labelbox-1db23ff4/cAZSqWb47Qd3ouPH/images/docs/888aa1a-Screenshot_2023-03-07_at_16.00.25.jpeg?fit=max&auto=format&n=cAZSqWb47Qd3ouPH&q=85&s=95e466677c35f13ddea11e77d539ba3a" alt="" width="2620" height="1952" data-path="images/docs/888aa1a-Screenshot_2023-03-07_at_16.00.25.jpeg" />
</Frame>

## Automate data curation with slices

After populating filters in Catalog, you can save these filters as a [slice](/docs/slices) of data. Once the filters are saved as a slice, you do not need to populate the same filters repeatedly. Slices are also dynamic: any new data row in Catalog that matches the saved filters appears in the relevant slices.
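One way to picture why slices are dynamic is to think of a slice as a saved filter that is re-evaluated against the current contents of Catalog, rather than a fixed list of data rows. The sketch below models that idea with plain Python objects. The `Slice` class, the `catalog` list, and the `score` field are all hypothetical and are not part of the Labelbox SDK or a description of its internals.

```python
class Slice:
    """A named, saved filter that is evaluated on demand (conceptual model)."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def rows(self, catalog):
        # Re-run the saved filter against whatever the catalog holds now.
        return [row for row in catalog if self.predicate(row)]


# Hypothetical catalog of data rows with precomputed scores.
catalog = [
    {"id": "row-a", "score": 0.92},
    {"id": "row-b", "score": 0.31},
]

birds_at_sunset = Slice("birds at sunset", lambda row: row["score"] >= 0.5)
before = [row["id"] for row in birds_at_sunset.rows(catalog)]

# A new data row arrives; the slice picks it up without being redefined.
catalog.append({"id": "row-c", "score": 0.77})
after = [row["id"] for row in birds_at_sunset.rows(catalog)]

print(before)
print(after)
```

Because the filter is stored rather than its results, the new matching row shows up the next time the slice is read.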

Read the following resources to learn how to take action on the filtered data.

<CardGroup>
  <Card title="Refine the similarity search" icon="square-1" horizontal href="/docs/similarity#adding-anchors" />

  <Card title="Send filtered data rows to a labeling project as a batch" icon="square-2" horizontal href="/docs/batches" />

  <Card title="Add metadata to the filtered data rows" icon="square-3" horizontal href="/docs/datarow-metadata" />
</CardGroup>
