Custom model Integration

If you are on an enterprise plan, you can integrate custom models with Foundry to use them to predict labels, enrich data, and generate responses for evaluation purposes. To upgrade to the enterprise plan, please contact sales.

Host custom models

Before integrating your custom model, you need to deploy it on an HTTP endpoint accessible via the Internet that accepts HTTP POST calls with a JSON payload. You can host it either on your own infrastructure or through any model hosting vendor, such as Vertex AI, Databricks, Huggingface, Replicate, OpenAI.

Create model integrations

Once you have a public HTTP endpoint for your custom model, you can create the integration:

On the Models page, click Create and select Custom Model.
Select the data type for the model.
Add custom model information, including:

Name: A unique identifier for the model.
HTTP endpoint: The URL of the HTTP endpoint hosting your model.
Secret (optional): The authentication token for secret-secured endpoints only.
Description (optional): The descriptive context of the model.

Click Create model.

On the Settings tab, you can review and edit the model information. You can add a rate limit and a Readme. To send data to your model for label prediction, click + Model run. From there, you can define and preview your model run, view prediction and details, and send predictions to Annotate.

🚧
Bounding box and mask tasks not supported
Currently, this model integration flow doesn't support tasks involving bounding box and mask annotations. To integrate a custom model for these tasks, see Create model integrations for bounding box and mask tasks.

Create model integrations for bounding box and mask tasks

For a custom model predicting bounding box and mask labels, you need to create a model manifest file and contact customer solutions to manually establish the integration. The Labelbox solutions team can help you manage the job queuing, track status, and process predictions using the Labelbox platform.

Create manifest files

To integrate your model into the Foundry workflow, you need to specify and provide a model.yaml manifest file. This file stores metadata about the model, including its name, description, inference parameters, model output ontology, API endpoint, and other details. You need to create the model.yaml file in the following format:

name: My custom model 
inference_endpoint: my_inference_endpoint # Deploy your service to an API endpoint that can be accessed 
secrets: my_secret # Your secret, API keys to be authenticated with your endpoint 
requests_per_second: 0.1 # Your estimate of requests per second  
description: My awesome custom model for object recognition 
readme: | # optional readme in markdown format
  ### Intended Use
  Object recognition model on my custom classes.
  ### Limitations
  My custom model has limitations, such as ...
  ### Citation
  ... 

allowed_asset_types: [image] # list of allowed asset types, one or more of [image, "text", "video", "html", "conversational"]
allowed_feature_kinds: [text, radio, checklist] # list of allowed feature kinds. One or more of [text, radio, checklist, rectangle, raster-segmentation, named-entity, polygon, point, edge]

# Only needed if your model has a predefined set of classes for classification or object detection. If your model is an LLM or takes any text input, you can remove this section.
ontology:
  media_type: IMAGE # This example ontology has two classification classes and two object detection classes. 
  classifications: 
    - instructions: label
        name: label
        type: radio
        options:
        - label: tench
          value: tench
          position: 0
        - label: goldfish
          value: goldfish
          position: 1
  tools:
    - name: person
      tool: rectangle
    - name: bicycle
      tool: rectangle

inference_params_json_schema: # hyperparmeters configured in the app and passed to your API endpoint. 
  properties: # Examples follow, each with different types and defaults.
    prompt: 
      description: "Prompt to use for text generation"
      type: string
      default: ""
    confidence:
      description: object confidence threshold for detection
      type: number
      default: 0.25
      minimum: 0.0
      maximum: 1.0
    max_new_tokens:
      description: Maximum number of tokens to generate. Each word is generally 2-3 tokens.
      type: integer
      default: 1024
      minimum: 100
      maximum: 4096
    use_image_attachments:
      description: Set to true if model should also process datarow attachments.
      type: boolean
      default: False
  required: # Use to specify hyperparameters that must have values for each model run.
    - prompt

max_tokens: 1024 # only relevant for LLM to control maximum token size

Endpoint requests for model tasks

Every time you use your integrated custom model to predict labels or run other tasks, it sends a JSON request to your model's endpoint. The request payload provides the data row for prediction and includes the ontology and inference parameter values you selected. Here's an example request body:

{
 "prompt": [
   {
     "role": "system",
     "parts": [
       {
         "text": "Start each sentence with three equal signs ==="
       }
     ]
   },
   {
     "role": "user",
     "parts": [
       {
         "text": "what is in this text and image?"
       },
       {
         "text": "Hello. This is a user-provided txt file content."
       },
       {
         "image": "base64_encoded_image_string"
       }
     ]
   }
 ]
}

Here're descriptions of fields in the request body:

prompt: Contains the current conversation with the model. For single-turn queries, it’s a single instance. For multi-turn queries, it includes conversation history and the latest request. Each prompt has a message structure with two properties: role and parts.
role: A string indicating the individual producing the message content. Possible values include:
- system: Instructions to the model.
- user: User-generated message sent by a real person.
- assistant: Model-generated message, used to insert responses from the model during multi-turn conversations.
parts: A list of ordered parts that make up a multi-part message content. It can contain the following segments of data:
- text: Text prompt or code snippet.
- image: Base64 encoded image.

Response

Responses are expected to match the format of the labels predicted by the custom model, such as a string containing the raw model response or a JSON object for NER and classifications. Here's an example JSON response with keys corresponding to feature names in the ontology:

// Object Detection
{
  "cat": {
	// coordinate order: left, top, width, height
      "boxes": [[0, 0, 10, 10], [40, 40, 8, 10]], 
      "scores": [0.9, 0.7],
  },
  "dog": {
      "boxes": [[20, 20, 5, 5]],
      "scores": [0.8],
  },
}
// Classification
{
  "summary": "Tom and Bob are happy to work at IBM", // Free Text
  "sentiment": "positive",  // Radio classification
  "emotion": ["joy", "fear"], // Checklist classification
}
// Segmentation
{
  "cat": {
	// Can use pycocotools.mask.encode for RLE encoding
	"masks": [
		{
			"size": [<height>, <width>],
			"counts": "<run-length-encoded-boolean-mask>"
}
]
  }
}
// Named Entity
{
  "person": [
    {"start": 0, "end": 3, "text": "Tom"},
    {"start": 5, "end": 8, "text": "Bob"},
  ]
}

Host custom models

Create model integrations

🚧Bounding box and mask tasks not supported

Create model integrations for bounding box and mask tasks

Create manifest files

Endpoint requests for model tasks

Response

🚧
Bounding box and mask tasks not supported