Custom model integration

Describes how to set up and integrate a custom model so that it can be used with Foundry.

Enterprise customers can integrate custom models with Foundry to predict labels, enrich data, or generate responses for evaluation purposes.

To begin, host your custom model on your own infrastructure or your preferred model hosting vendor.
Endpoints can be hosted by any appropriate service, including Vertex AI, Databricks, Hugging Face, Replicate, OpenAI, and more.

The custom model should be deployed to an HTTP endpoint available on the Internet.

Once the endpoint is known, create a model manifest file and contact Labelbox customer solutions.
The Labelbox solutions team will help you manage job queuing, track status, and process predictions on the Labelbox platform.

The model manifest file

To integrate your model into the Foundry workflow, you need to provide a model.yaml manifest file.

The model.yaml manifest file stores metadata about the model; this includes the name and description of your model, inference parameters, model output ontology, API endpoint, and other details. This information is required in order to integrate your custom model into the Foundry service.

Here is an example model.yaml file:

name: My custom model
inference_endpoint: my_inference_endpoint # deploy your service to an API endpoint that can be reached
secrets: my_secret # your secret or API key used to authenticate with your endpoint
requests_per_second: 0.1 # your estimate of requests per second
description: My awesome custom model for object recognition
readme: | # optional readme in markdown format
  ### Intended Use
  Object recognition model on my custom classes.
  ### Limitations
  My custom model has limitations, such as ...
  ### Citation

allowed_asset_types: [image] # list of allowed asset types; one or more of [image, text, video, html, conversational]
allowed_feature_kinds: [text, radio, checklist] # list of allowed feature kinds; one or more of [text, radio, checklist, rectangle, raster-segmentation, named-entity, polygon, point, edge]

# Only needed if your model has a predefined set of classes for classification or
# object detection. If your model is an LLM or takes any text input, you can
# remove this section.
ontology: # This example ontology has two classification classes and two object detection classes.
  media_type: IMAGE
  classifications:
    - instructions: label
      name: label
      type: radio
      options:
        - label: tench
          value: tench
          position: 0
        - label: goldfish
          value: goldfish
          position: 1
  tools:
    - name: person
      tool: rectangle
    - name: bicycle
      tool: rectangle

inference_params_json_schema: # hyperparameters configured in the app and passed to your API endpoint
  properties: # Examples follow, each with a different type and default. Property names other than prompt are illustrative.
    prompt:
      description: Prompt to use for text generation
      type: string
      default: ""
    confidence:
      description: Object confidence threshold for detection
      type: number
      default: 0.25
      minimum: 0.0
      maximum: 1.0
    max_new_tokens:
      description: Maximum number of tokens to generate. Each word is generally 2-3 tokens.
      type: integer
      default: 1024
      minimum: 100
      maximum: 4096
    process_attachments:
      description: Set to true if the model should also process data row attachments.
      type: boolean
      default: false
  required: # Use to specify hyperparameters that must have a value for each model run.
    - prompt

max_tokens: 1024 # only relevant for LLMs; controls the maximum token count
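At request time, your endpoint receives the user's chosen values for these hyperparameters. The following is a minimal sketch of resolving them against the schema defaults; the schema fragment mirrors the manifest example above, and only `prompt` is taken from the manifest (the other names are assumptions, not a fixed Foundry contract).

```python
# Illustrative schema fragment; only "prompt" comes from the manifest example,
# the other property names are assumptions for this sketch.
SCHEMA = {
    "properties": {
        "prompt": {"type": "string", "default": ""},
        "confidence": {"type": "number", "default": 0.25},
        "max_new_tokens": {"type": "integer", "default": 1024},
    },
    "required": ["prompt"],
}

def resolve_params(user_params):
    """Fill schema defaults, then overlay the user's choices; reject runs
    that omit a required hyperparameter."""
    missing = [k for k in SCHEMA["required"] if k not in user_params]
    if missing:
        raise ValueError(f"missing required inference params: {missing}")
    resolved = {k: spec["default"] for k, spec in SCHEMA["properties"].items()}
    resolved.update(user_params)
    return resolved
```

A call such as `resolve_params({"prompt": "describe the image"})` then yields the full parameter set your model code can rely on.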

API Endpoint

Your endpoint should accept HTTP POST requests with a JSON payload and be reachable over the Internet. If the endpoint is secured with an authentication token, you'll need to provide the token to the Labelbox team.

Request Payload

The request payload provides the data row for the prediction, along with the ontology and the inference parameter values selected by the user.

{
  "data_row": {
    "row_data": "https://path/to/datarow.png",
    "global_key": "<global-key-for-data-row>",
    "id": "<data_row_id>"
  },
  "ontology": "<ontology>",
  "inference_params": "<inference-params>"
}
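As a minimal sketch of the receiving side, the stdlib-only handler below accepts this payload and returns a JSON prediction. The token value, port, and placeholder prediction are assumptions for illustration, not part of the Foundry contract; the auth scheme is whatever you agree on with the Labelbox team.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

API_TOKEN = "my-secret-token"  # placeholder; share the real token with Labelbox

def authorized(headers):
    # One common choice: expect "Authorization: Bearer <token>".
    return headers.get("Authorization") == f"Bearer {API_TOKEN}"

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not authorized(self.headers):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        data_row = payload["data_row"]             # asset to run inference on
        params = payload.get("inference_params", {})
        # ... fetch data_row["row_data"] and run your model with params here ...
        predictions = {"sentiment": "positive"}    # placeholder model output
        body = json.dumps(predictions).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```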


Response Payload

The JSON response should be similar to this example:

{
  // Object Detection
  "cat": {
    // coordinate order: left, top, width, height
    "boxes": [[0, 0, 10, 10], [40, 40, 8, 10]],
    "scores": [0.9, 0.7]
  },
  "dog": {
    "boxes": [[20, 20, 5, 5]],
    "scores": [0.8]
  },
  // Classification
  "summary": "Tom and Bob are happy to work at IBM", // Free text
  "sentiment": "positive", // Radio classification
  "emotion": ["joy", "fear"], // Checklist classification
  // Segmentation
  "cat": {
    // Can use pycocotools.mask.encode for RLE encoding
    "masks": [
      {
        "size": [<height>, <width>],
        "counts": "<run-length-encoded-boolean-mask>"
      }
    ]
  },
  // Named Entity
  "person": [
    {"start": 0, "end": 3, "text": "Tom"},
    {"start": 5, "end": 8, "text": "Bob"}
  ]
}
The keys are the feature names in the ontology.
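For segmentation masks, the `counts` string is typically produced with `pycocotools.mask.encode`. As a rough, dependency-free illustration of the underlying scheme, this sketch computes uncompressed COCO-style run-length counts (column-major order, starting with the run of zeros); pycocotools additionally compresses these counts into the string form shown above.

```python
def rle_counts(mask):
    """Run lengths of a binary mask flattened in column-major (Fortran)
    order, starting with the count of zeros -- the uncompressed form of
    COCO RLE. pycocotools.mask.encode compresses this into a string."""
    height, width = len(mask), len(mask[0])
    # Flatten column by column, matching COCO's Fortran-order convention.
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts, prev, run = [], 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    return counts
```

For example, a 2x2 mask whose second column is all ones yields `[2, 2]`: two background pixels, then two foreground pixels.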

If your model endpoint has a different output format, work with our solutions team to set up the necessary post-processing.

For complex ontologies, you can use the export annotation format, which varies by data type. For details, review our ontology example page.

Multimodal chat evaluation inferencing specifics

The multimodal chat evaluation editor lets you invoke inferencing directly from the editor. To do so, there is an additional step:

  • Direct run mode

In this mode, the asset is JSON built from the user's input in a "conversational" format (not to be confused with labelbox.conversational). The model's conversational format can be enriched with additional assets of types such as image, video, audio, and PDF.

Implement a run_direct() method that converts the conversational messages into your model's direct input format and returns the raw response from the model.
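The shape of a run_direct() hook can be sketched as follows. The `{"messages": [{"role": ..., "content": ...}]}` payload shape and the `call_model()` helper are assumptions for illustration; your actual conversational input and model invocation will differ.

```python
def call_model(prompt):
    # Placeholder for your actual model call (hypothetical helper).
    return {"text": f"echo: {prompt.splitlines()[-1]}"}

def run_direct(conversational_input):
    """Convert the editor's conversational messages into the model's
    direct input (here: one prompt string) and return the raw response."""
    prompt = "\n".join(
        f'{m["role"]}: {m["content"]}' for m in conversational_input["messages"]
    )
    raw_response = call_model(prompt)
    return raw_response  # return the model's response unmodified
```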