Census integration
Shows how to set up a Census integration to connect remote data to Labelbox, including relational databases (RDBMS), data warehouses, data lakes, and more.
You can set up a Census integration to load remote data into your Labelbox datasets from various sources, including:
- Data warehouses such as Databricks, BigQuery, and Snowflake
- Cloud storage buckets on GCP, AWS, and Azure
- Databases such as MySQL and Postgres
Census availability
The Census integration feature is available exclusively for paid customers. Free users need to upgrade their subscription to access this feature.
For more information on upgrading your subscription, please see Plans & pricing.
Data requirements
Each entry in your data must have the following values for a successful integration:
- A global key that uniquely identifies each data row. Global keys (also called sync keys) help Labelbox distinguish between new, changed, and duplicate records. Global keys must be unique within your Labelbox workspace (generally, an organization subscription).
- A dataset ID that identifies the data row destination. You can specify the dataset ID in the source data or as a constant value when setting up the sync.
To find a dataset ID, use Catalog to open the dataset and then use the Dataset menu to copy the dataset ID to the clipboard.
- A row data value that specifies the location of a data asset.
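Before setting up a sync, it can help to check source records against the requirements above. The following is a minimal sketch, not part of Labelbox or Census: the field names `global_key`, `dataset_id`, and `row_data` and the helper `find_problems` are hypothetical, so substitute the column names your source actually uses. It only checks uniqueness within the batch you pass in, not across your whole workspace.

```python
from collections import Counter

# Hypothetical column names; map them to whatever your source table uses.
REQUIRED_FIELDS = ("global_key", "dataset_id", "row_data")


def find_problems(records, constant_dataset_id=None):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # The dataset ID may instead be supplied as a constant during sync setup.
            if field == "dataset_id" and not value:
                value = constant_dataset_id
            if not value:
                problems.append(f"record {index}: missing {field}")
    # Global keys must be unique, so report any key that appears more than once.
    counts = Counter(r.get("global_key") for r in records if r.get("global_key"))
    for key, count in counts.items():
        if count > 1:
            problems.append(f"global key {key!r} appears {count} times")
    return problems
```

Calling `find_problems(rows, constant_dataset_id="...")` with your own dataset ID mirrors the case where the dataset ID is set as a constant value rather than stored in the source data.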
You can also include the following additional fields:
- A Metadata JSON object (dictionary) that specifies custom metadata to be added to the data row. A size limit applies based on your subscription.
Example: [{ "name": "dog", "value": 123 }, { "name": "fox", "value": 123 }]
- An Attachments JSON object that specifies attachments to be associated with the data row.
Example: [{ "type": "RAW_TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]" }, { "type": "IMAGE", "value": "https://storage.example.com/samples/attachment.jpeg" }]
See this Google Sheet for an example of how to structure required and optional values.
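If you generate the optional fields programmatically, a small helper can produce JSON in the same shape as the metadata and attachments examples above. This is a sketch under assumptions: the function names `build_metadata` and `build_attachments` are hypothetical, and it performs no validation of attachment types or subscription size limits.

```python
import json


def build_metadata(pairs):
    """Serialize a {name: value} mapping as a list of name/value objects."""
    return json.dumps([{"name": name, "value": value} for name, value in pairs.items()])


def build_attachments(items):
    """Serialize (type, value) tuples as a list of attachment objects."""
    return json.dumps([{"type": kind, "value": value} for kind, value in items])


metadata_json = build_metadata({"dog": 123, "fox": 123})
attachments_json = build_attachments([
    ("RAW_TEXT", "IOWA, Zone 2232, June 2022 [Text string]"),
    ("IMAGE", "https://storage.example.com/samples/attachment.jpeg"),
])
```

Using `json.dumps` rather than hand-built strings keeps quoting and escaping correct when values contain quotes or non-ASCII characters.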
Add integration
To add an integration between a data source and Labelbox:
- Open your integrations in one of two ways:
  - From the Add data section on the app Home page.
  - Under Workspace settings, select Integrations.
- To add a new integration, select Sync from a source.
- Select the data source that you want to connect to Labelbox.
- Follow the instructions to configure your integration settings.
You can check the connection status of your configured integrations under Manage integrations and start to sync integrated data to your datasets.
Sync integration to dataset
To sync data on an integrated data source to a dataset:
- On the Workspace settings page, select Integrations.
- Under Manage integrations, select Sync Integration.
- Select an existing dataset or create a new dataset to load your data into.
- Under Select a Source, select Select a Warehouse Table. Then, under Connection, select your integration source.
- Under Select a Destination, keep the default settings.
- Under Select a Sync Behavior, select the data operation type that controls how data rows are imported into Labelbox. The options are Create only, Update only, Update or Create, and Delete.
- Map your data according to the data requirements.
- (Optional) Run a test sync to verify the sync behavior.
- Select a Run Mode to manually trigger the sync or set up a pattern that automates the data sync between your source and Labelbox.
Once your sync has run successfully, you can manage and use your dataset with synced data like any other Labelbox dataset.
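To reason about which sync behavior to choose, it may help to model the four options in terms of global keys. The sketch below is an illustration only and reflects one plausible reading of the option names, not a confirmed description of how Census or Labelbox implements them; `plan_sync` is a hypothetical helper. In particular, it assumes Delete removes destination rows whose keys appear in the source.

```python
def plan_sync(behavior, source_keys, destination_keys):
    """Return the sets of global keys to create, update, and delete."""
    source, destination = set(source_keys), set(destination_keys)
    new = source - destination       # keys not yet in the dataset
    existing = source & destination  # keys already in the dataset
    plan = {"create": set(), "update": set(), "delete": set()}
    if behavior == "Create only":
        plan["create"] = new
    elif behavior == "Update only":
        plan["update"] = existing
    elif behavior == "Update or Create":
        plan["create"], plan["update"] = new, existing
    elif behavior == "Delete":
        # Assumption: matching destination rows are removed.
        plan["delete"] = existing
    else:
        raise ValueError(f"unknown sync behavior: {behavior}")
    return plan
```

Running a test sync, as described in the optional step above, remains the reliable way to confirm the actual behavior before scheduling a sync.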