Specifications
Format: PDF
Recommended size: 100 pages or fewer
Import methods:
- IAM Delegated Access
- Signed URLs (https URLs only)
Text layer limit
Previously generated PDF documents without text layers can’t be retroactively filled with the text layer generated by Labelbox.
- The document must have no more than 15 pages.
- The file size should not exceed 20 MB.
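The two limits above can be checked locally before uploading. The following is a minimal sketch, not part of any Labelbox tooling: the function name `text_layer_limit_problems` is hypothetical, the page count is assumed to be known already (for example from a PDF library), and interpreting 20 MB as 20 * 1024 * 1024 bytes is an assumption.

```python
MAX_TEXT_LAYER_PAGES = 15
# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
MAX_TEXT_LAYER_BYTES = 20 * 1024 * 1024


def text_layer_limit_problems(page_count, size_bytes):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if page_count > MAX_TEXT_LAYER_PAGES:
        problems.append(
            f"{page_count} pages exceeds the {MAX_TEXT_LAYER_PAGES}-page limit"
        )
    if size_bytes > MAX_TEXT_LAYER_BYTES:
        problems.append(
            f"{size_bytes} bytes exceeds the {MAX_TEXT_LAYER_BYTES}-byte limit"
        )
    return problems


# Example: a 12-page, 3 MB file passes both checks.
print(text_layer_limit_problems(12, 3 * 1024 * 1024))
```

An empty list means the file is within both limits; otherwise each entry names the limit that was exceeded.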
Text Layer Validation Schema
If you want to upload your own text layer, the textLayer JSON file must adhere to the following JSON schema.
Parameters
| Parameter | Required | Description |
|---|---|---|
| row_data | Yes | A dictionary of { "pdf_url": str, "text_layer_url": str }. For IAM Delegated Access, this URL must be in virtual-hosted-style format. |
| row_data['pdf_url'] | Yes | https path to a cloud-hosted PDF. It must be specified within the row_data dictionary. |
| row_data['text_layer_url'] | No | https path to a cloud-hosted JSON extract of the PDF. |
| global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row. |
| media_type | No | "PDF" (optional media type to provide better validation and error messaging) |
| metadata_fields | No | See Metadata. |
| attachments | No | See Attachments and Asset overlays. |
Import format
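As an illustration of the parameters table, the sketch below builds one data row and prints it as JSON. The helper name `build_pdf_data_row` and the example.com URLs are hypothetical placeholders, and wrapping the row in a list is an assumption about how rows are submitted; only the field names come from the table.

```python
import json
from urllib.parse import urlparse


def build_pdf_data_row(pdf_url, text_layer_url=None, global_key=None):
    """Build one data row dict from the fields in the parameters table."""
    # Both URLs must be https paths, per the table.
    for url in (pdf_url, text_layer_url):
        if url is not None and urlparse(url).scheme != "https":
            raise ValueError(f"expected an https URL, got: {url}")

    row_data = {"pdf_url": pdf_url}          # required
    if text_layer_url is not None:
        row_data["text_layer_url"] = text_layer_url  # optional

    row = {"row_data": row_data, "media_type": "PDF"}
    if global_key is not None:
        row["global_key"] = global_key       # optional, must be unique
    return row


# Placeholder URLs and key, for illustration only.
row = build_pdf_data_row(
    "https://example.com/docs/sample.pdf",
    text_layer_url="https://example.com/docs/sample-text-layer.json",
    global_key="sample-pdf-001",
)
print(json.dumps([row], indent=2))
```

Optional fields are left out of the dictionary when they are not supplied, so the output contains only what the caller provided.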
Python example
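A minimal sketch of an import is shown here, assuming the Labelbox Python SDK's `Dataset.create_data_rows` method and the task object it returns (`wait_till_done()`, `errors`). The helper name `import_pdf_rows`, the dataset name, and the example.com URLs are hypothetical placeholders; check the SDK reference for the exact behavior in your version.

```python
def import_pdf_rows(dataset, rows):
    """Submit PDF data rows and return any reported errors.

    `dataset` is assumed to behave like a Labelbox SDK Dataset object.
    """
    task = dataset.create_data_rows(rows)  # starts the import task
    task.wait_till_done()                  # block until the task finishes
    return task.errors                     # assumed empty or None on success


# Placeholder data row using the fields from the parameters table.
rows = [
    {
        "row_data": {
            "pdf_url": "https://example.com/docs/sample.pdf",
            "text_layer_url": "https://example.com/docs/sample-text-layer.json",
        },
        "global_key": "sample-pdf-001",
        "media_type": "PDF",
    }
]

# With the SDK installed and an API key available, usage might look like:
#   import labelbox as lb
#   client = lb.Client(api_key="<YOUR_API_KEY>")
#   dataset = client.create_dataset(name="pdf-import-demo")
#   errors = import_pdf_rows(dataset, rows)
#   print(errors)
```

Passing the dataset in as an argument keeps the helper free of credentials and makes it easy to exercise with a stand-in object.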
Verify files are processed
By checking the Media Attributes section, you can verify whether a file conversion using a custom or Labelbox-generated text layer is complete.
- If Is text layer valid = true, the file was successfully processed.
