Census integration

Shows how to set up a Census integration that connects remote data to Labelbox, including relational databases (RDBMS), data warehouses, data lakes, and more.

You can use Census to integrate data from more than twenty (20) different data sources into Labelbox, including Databricks, Snowflake, Google BigQuery, MySQL, PostgreSQL, SQL Server, and more.

To do this, you use Census to create a sync between your remote data (source) and a Labelbox Catalog dataset (destination).

Here, we show how to set up a Census integration that syncs remote data with Labelbox.

How it works

The following diagram shows how things work:

[Diagram: A Census integration lets you integrate remote data into Labelbox.]

Census integrations support different types of data operations (called sync behaviors) that control how data rows are imported into Labelbox:

Data operation | Description
Update or Create (also known as upsert) | New global keys create new data rows; existing global keys replace the earlier data rows.
Update only | Rows with existing global keys replace the earlier data rows. No new data rows are created.
Create only | Adds data rows for new global keys; rows with duplicate global keys are ignored.

The specific options available to a given sync depend on the source of the data. (Not every data source supports every operation.) To learn more, see Sync behaviors.
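The difference between the three operations can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Census or Labelbox code: the `apply_sync` function, the behavior names, and the sample keys and URLs are all hypothetical.

```python
def apply_sync(existing, incoming, behavior):
    """Return the data rows (keyed by global key) that would exist after a sync.

    `existing` and `incoming` are dicts mapping a global key to row data.
    `behavior` is one of "upsert", "update_only", or "create_only"
    (hypothetical names used only in this sketch).
    """
    result = dict(existing)  # copy so the caller's dict is not modified
    for key, row in incoming.items():
        already_there = key in result
        if behavior == "upsert":
            result[key] = row            # create new rows, replace existing ones
        elif behavior == "update_only":
            if already_there:
                result[key] = row        # replace existing rows only
        elif behavior == "create_only":
            if not already_there:
                result[key] = row        # add new rows, ignore duplicates
        else:
            raise ValueError(f"unknown behavior: {behavior}")
    return result


existing = {"img-001": "https://storage.example.com/samples/old-1.jpeg"}
incoming = {
    "img-001": "https://storage.example.com/samples/new-1.jpeg",
    "img-002": "https://storage.example.com/samples/new-2.jpeg",
}

for behavior in ("upsert", "update_only", "create_only"):
    print(behavior, sorted(apply_sync(existing, incoming, behavior).items()))
```

Running the loop shows that only "upsert" both replaces `img-001` and adds `img-002`.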

Before you begin

You need a few things to set up a Census integration:

  1. An active Census account (free).
  2. Credentials for accessing your source data.
  3. A destination Catalog dataset.
  4. A Labelbox API key.

For best results, take time to collect these details before starting the process.

Set up Census integration

Census integrations require three basic elements:

  • A Source that points to your data.
  • A Destination that points to Labelbox.
  • A Sync that synchronizes the source to the destination.

The following sections walk through each part of the process.

These steps provide high-level guidance. Census is a separate company and may change its service and app at any time. Use the Census docs for detailed help.

Source set up

To set up a Census Source:

  1. Sign in to Census.
  2. From the Connect section of the Census main menu, choose Sources and then select New Source.
  3. Follow the remaining prompts until you return to the Sources list.
  4. Locate your new Source and then select Test to verify the connection.

Each Source provides a link to detailed setup instructions. Use these and the test results to troubleshoot any problems.

Destination set up

Once your Census Source is set up and tested, you can set up your Labelbox destination.

You will need a Labelbox API key. For best results, obtain this before setting up the Destination.

To set up a Labelbox Destination:

  1. Sign in to Census.

  2. From the Connect section of the Census main menu, choose Destinations and then select New Destination.

  3. When prompted for the destination type, select Labelbox.

  4. When prompted to configure the connection, provide your Labelbox API key.

    Optionally, you can also change the name of your Destination (example: My custom Labelbox connection).

    When you finish configuring the Destination, select Connect.

    This tests your connection. If the connection fails, select Back and make sure that you're adding the correct API key and that it is valid.

  5. When the connection succeeds, select Finish to complete the process.

Sync set up

Once you have a Source and a Destination, it's time to create a Sync between them:

  1. Sign in to Census.

  2. From the Activate section of the Census main menu, choose Syncs and then select New Sync.

  3. From the Select a Source panel, choose your Source connection.

    If you're not sure which to choose, start with Select a Warehouse Table and then select the connection you added as your Source.

    If additional prompts appear, select the options appropriate for your setup.

  4. From the Select a Destination panel, select the connection you set up as your Destination.

    When prompted for the object, select DataRow.

  5. From the Select a Sync Behavior panel, choose a data operation.
    Available options depend on your data source. For help, see Sync Behaviors.

  6. Use the Select a Sync Key panel to map a column or field in your source to the Labelbox global key. Choose a value that uniquely identifies each data row, such as a primary key, a globally unique identifier (GUID), or another value that is unique to your organization.

  7. Use the Set up Labelbox Field Mappings panel to map individual Source fields to required Labelbox fields.

    When importing data into a single Labelbox dataset, it's not necessary to specify a dataset ID for every record in your source. Instead, use a Census constant value to define a dataset ID to be used for every row.

  8. In the Run a Test Sync panel, select Run Test to verify the sync. If problems occur, use error messages to troubleshoot the problem. You can also use the API Inspector to view underlying REST queries and responses.

    When satisfied with your sync setup, choose Next.

  9. The Summary page defines your sync's Run mode and Trigger. For help, see Triggering & Scheduling. Choose the options that best fit your needs and then select Create.

Your new sync should appear in the Syncs list. To learn more, see Running Syncs.

Once your sync has run successfully, you can sign in to Labelbox and manage and use your dataset like any other Labelbox dataset.
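Because the sync key must uniquely identify each data row, it can help to check a candidate column for repeated values before you run a sync. The following is a minimal sketch that assumes you have exported your source rows as a list of dictionaries; the `find_duplicate_keys` helper and the `asset_id` column name are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key_field):
    """Return the sorted values of `key_field` that appear in more than one row."""
    counts = Counter(row[key_field] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical source rows; "asset_id" is the candidate sync key column.
rows = [
    {"asset_id": "a-1", "url": "https://storage.example.com/samples/1.jpeg"},
    {"asset_id": "a-2", "url": "https://storage.example.com/samples/2.jpeg"},
    {"asset_id": "a-2", "url": "https://storage.example.com/samples/2-copy.jpeg"},
]

duplicates = find_duplicate_keys(rows, "asset_id")
if duplicates:
    print("Not safe to use as a sync key; repeated values:", duplicates)
else:
    print("Every value is unique.")
```

An empty result suggests the column is a reasonable sync key candidate within this source. It does not check for keys that already exist elsewhere in your Labelbox workspace.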

Structure data for ingestion

To successfully set up a Census integration with Labelbox, your data must have the following fields:

  • A global key that uniquely identifies individual data rows. Global keys (also called sync keys) help Labelbox distinguish between new records, changed records, or duplicate records. (Duplicate keys fail to load.)
    Global keys must be unique within your Labelbox workspace, which generally corresponds to an organization subscription.
  • A dataset ID to identify the data row destination.
    You can specify the dataset ID in the source data or as a constant value when setting up the sync.
    To find a dataset ID, use Catalog to open the dataset and then use the Dataset menu to copy the dataset ID to the Clipboard.
  • Row data specifying the location of a data asset.

An example Google Sheet shows how to structure these values.
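As a rough pre-flight check, you could scan an export of your source data for rows that lack any of these values. The sketch below uses only the Python standard library. The column names `global_key`, `dataset_id`, and `row_data` are assumptions made for illustration (your source can use any names, because you map them during sync setup), and `DATASET_ID_PLACEHOLDER` is not a real dataset ID.

```python
import csv
import io

# Hypothetical column names; adjust to match your own source.
REQUIRED_FIELDS = ("global_key", "dataset_id", "row_data")


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return (line_number, [missing field names]) for each incomplete row."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is on line 2.
    for line_number, row in enumerate(reader, start=2):
        missing = [name for name in required if not (row.get(name) or "").strip()]
        if missing:
            problems.append((line_number, missing))
    return problems


sample = """global_key,dataset_id,row_data
img-001,DATASET_ID_PLACEHOLDER,https://storage.example.com/samples/one.jpeg
img-002,,https://storage.example.com/samples/two.jpeg
,DATASET_ID_PLACEHOLDER,
"""

for line_number, missing in missing_fields(sample):
    print(f"line {line_number}: missing {', '.join(missing)}")
```

If you plan to supply the dataset ID as a constant value in the sync, pass `required=("global_key", "row_data")` so that an empty dataset ID column is not reported.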

You can also include additional fields:

  • Metadata is a JSON value (a list of name/value objects, as in the example) specifying custom metadata to be added to the data row. (Metadata is limited according to your subscription.)
    Example: [{ "name": "dog", "value": 123 }, { "name": "fox", "value": 123 }]
  • Attachments is a JSON object specifying attachments to be associated with the data row.
    Example: [{ "type": "RAW_TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]" }, { "type": "IMAGE", "value": "https://storage.example.com/samples/attachment.jpeg" }]
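Hand-written JSON is easy to get wrong, so one option is to generate the metadata and attachments values with a script before loading them into your source. The following sketch assumes that each value is stored as a JSON string in a single source column and that attachments, like metadata, are written as a list of objects. The `to_json_cell` helper is hypothetical.

```python
import json


def to_json_cell(items):
    """Serialize a list of dictionaries to a JSON string for one source cell."""
    text = json.dumps(items)
    # Round-trip check: the string must parse back to the same structure.
    if json.loads(text) != items:
        raise ValueError("value does not survive a JSON round trip")
    return text


# Values taken from the examples in this section.
metadata = [{"name": "dog", "value": 123}, {"name": "fox", "value": 123}]
attachments = [
    {"type": "RAW_TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]"},
    {"type": "IMAGE", "value": "https://storage.example.com/samples/attachment.jpeg"},
]

print(to_json_cell(metadata))
print(to_json_cell(attachments))
```

Generating the strings this way keeps quoting and bracket placement consistent across rows.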