Import document data

How to import document data and sample import formats.

Specifications

Format: PDF
Recommended size: 100 pages or fewer
Import methods:

  • IAM Delegated Access
  • Signed URLs (https URLs only)

When importing document data to Labelbox, you are no longer required to provide an OCR extract as a JSON file. If the data row doesn't include a text layer, Labelbox generates one automatically during PDF import using Google Document AI. The generated JSON file becomes the text layer, which is rendered on top of the PDF in the Document Editor.

Note: Previously generated PDF documents without text layers will not be retroactively filled with the text layer generated by Labelbox.

Google Document AI has the following limitations:

  • The document must have no more than 15 pages.
  • The file size must not exceed 20 MB.
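The two limits above can be checked before import to predict whether automatic text layer generation will apply to a given file. The following is a minimal sketch, not part of any Labelbox or Google API: the function name is hypothetical, the page count is assumed to come from whatever PDF tooling you already use, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
# Limits quoted in the list above.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_PAGES = 15
MAX_BYTES = 20 * 1024 * 1024


def within_ocr_limits(page_count: int, size_bytes: int) -> bool:
    """Return True when a PDF fits both limits listed above."""
    return page_count <= MAX_PAGES and size_bytes <= MAX_BYTES


# Hypothetical values for a 12-page, 3 MB document.
print(within_ocr_limits(page_count=12, size_bytes=3 * 1024 * 1024))
```

Files that fail this check probably need a text layer you supply yourself through text_layer_url.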

Additionally, Google Document AI optimizes documents before OCR processing. This optimization might include rotating images or pages to ensure text appears horizontally. Consequently, token coordinates are calculated based on the rotated/optimized images, resulting in potential discrepancies with the original PDF document.

For example, a landscape-oriented PDF is rotated 90 degrees before processing. As a result, all tokens in the text layer are also rotated by 90 degrees.

Text Layer Validation Schema

If you want to upload your own text layer, the textLayer JSON file must adhere to the following JSON schema.

{
  "type": "array",
  "items": {
    "$ref": "#/$defs/page"
  },
  "$defs": {
    "page": {
      "type": "object",
      "properties": {
        "width": {
          "type": "number"
        },
        "height": {
          "type": "number"
        },
        "number": {
          "type": "number"
        },
        "units": {
          "enum": ["POINTS", "PERCENT"]
        },
        "groups": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/group"
          }
        }
      },
      "required": ["number", "units", "groups"]
    },
    "group": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string"
        },
        "content": {
          "type": "string"
        },
        "geometry": {
          "$ref": "#/$defs/geometry"
        },
        "tokens": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/token"
          }
        }
      },
      "required": ["id", "content", "geometry", "tokens"]
    },
    "geometry": {
      "type": "object",
      "properties": {
        "left": {
          "type": "number"
        },
        "top": {
          "type": "number"
        },
        "width": {
          "type": "number"
        },
        "height": {
          "type": "number"
        }
      },
      "required": ["left", "top", "width", "height"]
    },
    "token": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string"
        },
        "content": {
          "type": "string"
        },
        "geometry": {
          "$ref": "#/$defs/geometry"
        }
      },
      "required": ["id", "geometry", "content"]
    }
  }
}

The following sample text layer conforms to this schema:

[
    {
        "width": 1601,
        "height": 2498,
        "number": 1,
        "units": "PERCENT",
        "groups": [
            {
                "id": "b4f4e1da-4088-44b3-a578-22f88ce9e166",
                "content": "ΑΝ",
                "geometry": {
                    "left": 0.4846970736980438,
                    "top": 0.17333866655826569,
                    "width": 0.028107434511184692,
                    "height": 0.007205769419670105
                },
                "tokens": [
                    {
                        "id": "0791adf8-80e9-4d3d-9d37-b3ad42dd061e",
                        "content": "ΑΝ",
                        "geometry": {
                            "left": 0.4846970736980438,
                            "top": 0.17333866655826569,
                            "width": 0.028107434511184692,
                            "height": 0.007205769419670105
                        }
                    }
                ]
            },
            {
                "id": "6f7c7e45-b9e8-4845-8c4e-63915e4a2e3d",
                "content": "ESSAY",
                "geometry": {
                    "left": 0.43410369753837585,
                    "top": 0.22377902269363403,
                    "width": 0.1274203360080719,
                    "height": 0.014811843633651733
                },
                "tokens": [
                    {
                        "id": "5adfb30a-235a-41c0-9902-a930cad660fc",
                        "content": "ESSAY",
                        "geometry": {
                            "left": 0.43410369753837585,
                            "top": 0.22377902269363403,
                            "width": 0.1274203360080719,
                            "height": 0.014811843633651733
                        }
                    }
                ]
            },
            {
                "id": "beec4d6e-cc9f-45e8-858c-c6c3f7c3ae49",
                "content": "ON THE",
                "geometry": {
                    "left": 0.46283572912216187,
                    "top": 0.2810248136520386,
                    "width": 0.07307934761047363,
                    "height": 0.006805449724197388
                },
                "tokens": [
                    {
                        "id": "a9456ab4-bfd0-49f2-a10b-c0ad2fa2fbb4",
                        "content": "ON",
                        "geometry": {
                            "left": 0.46283572912216187,
                            "top": 0.2810248136520386,
                            "width": 0.0237351655960083,
                            "height": 0.006405144929885864
                        }
                    },
                    {
                        "id": "fbdb497f-8241-406c-a06e-87027fb9b0b2",
                        "content": "THE",
                        "geometry": {
                            "left": 0.4971892535686493,
                            "top": 0.2810248136520386,
                            "width": 0.038725823163986206,
                            "height": 0.006405144929885864
                        }
                    }
                ]
            },
            {
                "id": "4caeda09-c92e-4c47-bb7a-3aea91a7dc29",
                "content": "PRINCIPLE OF POPULATION,",
                "geometry": {
                    "left": 0.2841973900794983,
                    "top": 0.31545236706733704,
                    "width": 0.425983726978302,
                    "height": 0.013210564851760864
                },
                "tokens": [
                    {
                        "id": "493ef02d-8c10-45cb-a889-10abc907a30c",
                        "content": "PRINCIPLE",
                        "geometry": {
                            "left": 0.2841973900794983,
                            "top": 0.31545236706733704,
                            "width": 0.15990003943443298,
                            "height": 0.013210564851760864
                        }
                    },
                    {
                        "id": "23ae78f7-9c04-43cb-b7a0-180806ec4472",
                        "content": "OF",
                        "geometry": {
                            "left": 0.4609619081020355,
                            "top": 0.31545236706733704,
                            "width": 0.036851972341537476,
                            "height": 0.013210564851760864
                        }
                    },
                    {
                        "id": "33462d2b-af22-456e-bd3b-5ed03ca091f3",
                        "content": "POPULATION",
                        "geometry": {
                            "left": 0.5115552544593811,
                            "top": 0.31545236706733704,
                            "width": 0.19300436973571777,
                            "height": 0.013210564851760864
                        }
                    },
                    {
                        "id": "f989c150-075b-4ea8-ac30-3cb8bd5697a7",
                        "content": ",",
                        "geometry": {
                            "left": 0.7033104300498962,
                            "top": 0.31545236706733704,
                            "width": 0.006870687007904053,
                            "height": 0.013210564851760864
                        }
                    }
                ]
            },
            {
                "id": "be9c8936-10b8-4013-8259-b34d309b1ce9",
                "content": "AS IT AFFECTS",
                "geometry": {
                    "left": 0.42910680174827576,
                    "top": 0.3570856750011444,
                    "width": 0.13991257548332214,
                    "height": 0.006805449724197388
                },
                "tokens": [
                    {
                        "id": "3af460fb-30cf-4988-82a3-4bdd6814d37e",
                        "content": "AS",
                        "geometry": {
                            "left": 0.42910680174827576,
                            "top": 0.3570856750011444,
                            "width": 0.02123674750328064,
                            "height": 0.006805449724197388
                        }
                    },
                    {
                        "id": "56b8b804-d802-48ca-af12-9fca0f16fe22",
                        "content": "IT",
                        "geometry": {
                            "left": 0.4603372812271118,
                            "top": 0.3570856750011444,
                            "width": 0.018113672733306885,
                            "height": 0.006805449724197388
                        }
                    },
                    {
                        "id": "29c69737-a6ff-48c6-b09d-fa6b04739c91",
                        "content": "AFFECTS",
                        "geometry": {
                            "left": 0.4890693426132202,
                            "top": 0.3570856750011444,
                            "width": 0.07995003461837769,
                            "height": 0.006805449724197388
                        }
                    }
                ]
            },
            {
                "id": "e0334ded-e13b-4b56-a53e-e8671eae6871",
                "content": "THE FUTURE IMPROVEMENT OF SOCIETY.",
                "geometry": {
                    "left": 0.21549031138420105,
                    "top": 0.3995196223258972,
                    "width": 0.5671455562114716,
                    "height": 0.010008007287979126
                },
                "tokens": [
                    {
                        "id": "fbf57954-5e06-4a78-8d62-f012d18683d7",
                        "content": "THE",
                        "geometry": {
                            "left": 0.21549031138420105,
                            "top": 0.3995196223258972,
                            "width": 0.05496564507484436,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "43be97d3-1179-4ea9-b07d-aa0e369f770f",
                        "content": "FUTURE",
                        "geometry": {
                            "left": 0.2841973900794983,
                            "top": 0.3995196223258972,
                            "width": 0.1043097972869873,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "57d3e552-8fae-466a-a090-5c3f1245c803",
                        "content": "IMPROVEMENT",
                        "geometry": {
                            "left": 0.4022485911846161,
                            "top": 0.3995196223258972,
                            "width": 0.20112428069114685,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "b2f2323b-6e5b-41ab-80c8-4c4dc3d7d1d5",
                        "content": "OF",
                        "geometry": {
                            "left": 0.6171143054962158,
                            "top": 0.3995196223258972,
                            "width": 0.031230449676513672,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "1284a561-2359-4658-9a23-1957ea9c34e0",
                        "content": "SOCIETY",
                        "geometry": {
                            "left": 0.6627107858657837,
                            "top": 0.3995196223258972,
                            "width": 0.11242973804473877,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "1e91b8d6-668f-4d25-bf30-75c4478ae33e",
                        "content": ".",
                        "geometry": {
                            "left": 0.7763897776603699,
                            "top": 0.3995196223258972,
                            "width": 0.006246089935302734,
                            "height": 0.010008007287979126
                        }
                    }
                ]
            },
            {
                "id": "027c0a96-c979-47de-8fca-44d0b07c3ac3",
                "content": "WITH REMARKS",
                "geometry": {
                    "left": 0.42410993576049805,
                    "top": 0.44515612721443176,
                    "width": 0.14990627765655518,
                    "height": 0.0064051151275634766
                },
                "tokens": [
                    {
                        "id": "60143ecf-ae4f-4635-a1e6-01fca29d286c",
                        "content": "WITH",
                        "geometry": {
                            "left": 0.42410993576049805,
                            "top": 0.44515612721443176,
                            "width": 0.05059337615966797,
                            "height": 0.0064051151275634766
                        }
                    },
                    {
                        "id": "daa24689-add8-4de3-9b58-36d6ea8b871f",
                        "content": "REMARKS",
                        "geometry": {
                            "left": 0.48657089471817017,
                            "top": 0.44515612721443176,
                            "width": 0.08744531869888306,
                            "height": 0.0064051151275634766
                        }
                    }
                ]
            },
            {
                "id": "6f9dac4a-4f7e-4409-bbaa-205817a45880",
                "content": "ON THE SPECULATIONS OF MR. GODWIN,",
                "geometry": {
                    "left": 0.2804497182369232,
                    "top": 0.47718173265457153,
                    "width": 0.43847593665122986,
                    "height": 0.011208981275558472
                },
                "tokens": [
                    {
                        "id": "b4828d24-46a6-4530-ae0f-dc6290f5d34d",
                        "content": "ON",
                        "geometry": {
                            "left": 0.2804497182369232,
                            "top": 0.47758206725120544,
                            "width": 0.027482837438583374,
                            "height": 0.009207367897033691
                        }
                    },
                    {
                        "id": "472a19ce-d3b9-4fd2-8d2f-645a1b1ce6da",
                        "content": "THE",
                        "geometry": {
                            "left": 0.31980013847351074,
                            "top": 0.47758206725120544,
                            "width": 0.042473435401916504,
                            "height": 0.009207367897033691
                        }
                    },
                    {
                        "id": "ce03974a-0d54-43b0-8ded-78f67732e070",
                        "content": "SPECULATIONS",
                        "geometry": {
                            "left": 0.37289193272590637,
                            "top": 0.47758206725120544,
                            "width": 0.15427860617637634,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "0d9584f4-c1cd-4acf-b8b1-2bb2a812f344",
                        "content": "OF",
                        "geometry": {
                            "left": 0.5377888679504395,
                            "top": 0.4783827066421509,
                            "width": 0.02560901641845703,
                            "height": 0.009207367897033691
                        }
                    },
                    {
                        "id": "6d44ba8a-f037-4762-83e7-864d3d91293b",
                        "content": "MR",
                        "geometry": {
                            "left": 0.5740162134170532,
                            "top": 0.4783827066421509,
                            "width": 0.031855106353759766,
                            "height": 0.009207367897033691
                        }
                    },
                    {
                        "id": "6aa31fc2-ca8f-49b8-b640-75d1ba9c14b5",
                        "content": ".",
                        "geometry": {
                            "left": 0.6071205735206604,
                            "top": 0.4787830412387848,
                            "width": 0.004372239112854004,
                            "height": 0.009207338094711304
                        }
                    },
                    {
                        "id": "90d91952-9233-47f2-aefd-ec4a446a51f0",
                        "content": "GODWIN",
                        "geometry": {
                            "left": 0.6233603954315186,
                            "top": 0.4783827066421509,
                            "width": 0.0855715274810791,
                            "height": 0.010008007287979126
                        }
                    },
                    {
                        "id": "a5719dea-8694-4ed6-97ad-eea86a92c0ea",
                        "content": ",",
                        "geometry": {
                            "left": 0.712054967880249,
                            "top": 0.4791833460330963,
                            "width": 0.006870687007904053,
                            "height": 0.009207367897033691
                        }
                    }
                ]
            },
            {
                "id": "8b5a88c6-c39e-4f8d-88f2-30eaa7c3f935",
                "content": "M. CONDORCET,",
                "geometry": {
                    "left": 0.41474080085754395,
                    "top": 0.5148118734359741,
                    "width": 0.1698938012123108,
                    "height": 0.01080864667892456
                },
                "tokens": [
                    {
                        "id": "63b99375-0adc-461b-9539-0d907eebac3b",
                        "content": "M.",
                        "geometry": {
                            "left": 0.41474080085754395,
                            "top": 0.5152121782302856,
                            "width": 0.02435976266860962,
                            "height": 0.009207367897033691
                        }
                    },
                    {
                        "id": "2a2f1ef8-6018-41f7-9b06-9f957313251b",
                        "content": "CONDORCET",
                        "geometry": {
                            "left": 0.4509681463241577,
                            "top": 0.5152121782302856,
                            "width": 0.12679576873779297,
                            "height": 0.010408341884613037
                        }
                    },
                    {
                        "id": "0623b165-4bb6-4ac7-b115-5989dab0d188",
                        "content": ",",
                        "geometry": {
                            "left": 0.5771392583847046,
                            "top": 0.516413152217865,
                            "width": 0.0074953436851501465,
                            "height": 0.008807003498077393
                        }
                    }
                ]
            },
            {
                "id": "4f5337fe-082a-4937-a08f-8c3304b84c7a",
                "content": "AND OTHER WRITERS.",
                "geometry": {
                    "left": 0.3791380524635315,
                    "top": 0.5536429286003113,
                    "width": 0.23860085010528564,
                    "height": 0.0072057247161865234
                },
                "tokens": [
                    {
                        "id": "ba603d0c-b146-4d01-9e1f-ad47ce600c29",
                        "content": "AND",
                        "geometry": {
                            "left": 0.3791380524635315,
                            "top": 0.5536429286003113,
                            "width": 0.04434725642204285,
                            "height": 0.0072057247161865234
                        }
                    },
                    {
                        "id": "3bdbfa39-0418-4758-a855-1104a8d8fb02",
                        "content": "OTHER",
                        "geometry": {
                            "left": 0.4347282946109772,
                            "top": 0.5536429286003113,
                            "width": 0.07245472073554993,
                            "height": 0.0072057247161865234
                        }
                    },
                    {
                        "id": "2a729df2-5199-45dd-bdf6-4787b95efe65",
                        "content": "WRITERS",
                        "geometry": {
                            "left": 0.5165521502494812,
                            "top": 0.5536429286003113,
                            "width": 0.09369146823883057,
                            "height": 0.0072057247161865234
                        }
                    },
                    {
                        "id": "40924cb4-fc4f-45a2-b4f7-1c31368f243b",
                        "content": ".",
                        "geometry": {
                            "left": 0.612742006778717,
                            "top": 0.5536429286003113,
                            "width": 0.004996895790100098,
                            "height": 0.0072057247161865234
                        }
                    }
                ]
            },
            {
                "id": "75a050e5-46d5-4773-9909-0bc2309892da",
                "content": "LONDON:",
                "geometry": {
                    "left": 0.45284196734428406,
                    "top": 0.6016813516616821,
                    "width": 0.09306684136390686,
                    "height": 0.00840669870376587
                },
                "tokens": [
                    {
                        "id": "cea98dbb-45d8-4562-8dee-a965cee86db7",
                        "content": "LONDON",
                        "geometry": {
                            "left": 0.45284196734428406,
                            "top": 0.6016813516616821,
                            "width": 0.08432230353355408,
                            "height": 0.00840669870376587
                        }
                    },
                    {
                        "id": "92b73d06-bae1-4a32-9970-e6cbc99ee74f",
                        "content": ":",
                        "geometry": {
                            "left": 0.5402873158454895,
                            "top": 0.6024819612503052,
                            "width": 0.005621492862701416,
                            "height": 0.007606089115142822
                        }
                    }
                ]
            },
            {
                "id": "c31fd2a2-f11e-4b8e-8b84-8592bd65b235",
                "content": "PRINTED FOR J. JOHNSON, IN ST. PAUL'S",
                "geometry": {
                    "left": 0.30605870485305786,
                    "top": 0.6381104588508606,
                    "width": 0.38538414239883423,
                    "height": 0.008406758308410645
                },
                "tokens": [
                    {
                        "id": "920b4d6d-150a-4e53-b380-315190fc3c96",
                        "content": "PRINTED",
                        "geometry": {
                            "left": 0.30605870485305786,
                            "top": 0.6381104588508606,
                            "width": 0.08119925856590271,
                            "height": 0.00960773229598999
                        }
                    },
                    {
                        "id": "356a78e0-ebd7-44a0-8f04-015d3fe3527b",
                        "content": "FOR",
                        "geometry": {
                            "left": 0.3978763222694397,
                            "top": 0.6377101540565491,
                            "width": 0.03560274839401245,
                            "height": 0.010008037090301514
                        }
                    },
                    {
                        "id": "b2ddf115-27c1-4078-b96c-a80464d54c2a",
                        "content": "J.",
                        "geometry": {
                            "left": 0.4409743845462799,
                            "top": 0.6377101540565491,
                            "width": 0.012492209672927856,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "d25272ac-5ae1-4510-9778-1f8c72613960",
                        "content": "JOHNSON",
                        "geometry": {
                            "left": 0.46283572912216187,
                            "top": 0.6377101540565491,
                            "width": 0.08432227373123169,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "79866cbf-139e-4b14-805b-6e8f8ea1c8cb",
                        "content": ",",
                        "geometry": {
                            "left": 0.5496564507484436,
                            "top": 0.6373098492622375,
                            "width": 0.004372298717498779,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "dcddbbd2-b954-48ff-91b2-a323310724a2",
                        "content": "IN",
                        "geometry": {
                            "left": 0.5652716755867004,
                            "top": 0.6373098492622375,
                            "width": 0.01873832941055298,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "46dbb8a6-dc00-4a7b-8703-aec10245d262",
                        "content": "ST",
                        "geometry": {
                            "left": 0.5946283340454102,
                            "top": 0.6373098492622375,
                            "width": 0.019362926483154297,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "e6ef7add-c8c1-4f18-8bfc-55cc3a810449",
                        "content": ".",
                        "geometry": {
                            "left": 0.6164897084236145,
                            "top": 0.6373098492622375,
                            "width": 0.004372239112854004,
                            "height": 0.009607672691345215
                        }
                    },
                    {
                        "id": "4a743990-3683-4017-9c62-4e6cae7513a2",
                        "content": "PAUL'S",
                        "geometry": {
                            "left": 0.6308557391166687,
                            "top": 0.636909544467926,
                            "width": 0.06058710813522339,
                            "height": 0.009607672691345215
                        }
                    }
                ]
            },
            {
                "id": "47f38d95-c06c-47e3-8c4e-e7db108ad5c8",
                "content": "CHURCH-YARD",
                "geometry": {
                    "left": 0.4303560256958008,
                    "top": 0.6717373728752136,
                    "width": 0.13616490364074707,
                    "height": 0.006004810333251953
                },
                "tokens": [
                    {
                        "id": "9805d04e-d122-48cc-8ed3-ea49ec5fd231",
                        "content": "CHURCH",
                        "geometry": {
                            "left": 0.4303560256958008,
                            "top": 0.6717373728752136,
                            "width": 0.076202392578125,
                            "height": 0.006004810333251953
                        }
                    },
                    {
                        "id": "6949f56c-30db-484c-99ad-8c3f15d4c7f2",
                        "content": "-",
                        "geometry": {
                            "left": 0.5103060603141785,
                            "top": 0.6717373728752136,
                            "width": 0.008119940757751465,
                            "height": 0.006004810333251953
                        }
                    },
                    {
                        "id": "b17eee41-1697-4065-ae24-757f35a79178",
                        "content": "YARD",
                        "geometry": {
                            "left": 0.5190505981445312,
                            "top": 0.6717373728752136,
                            "width": 0.0474703311920166,
                            "height": 0.006004810333251953
                        }
                    }
                ]
            },
            {
                "id": "87f29527-2a25-4ec9-b946-e2b3005758c2",
                "content": "1798.",
                "geometry": {
                    "left": 0.46221113204956055,
                    "top": 0.7233787178993225,
                    "width": 0.07432854175567627,
                    "height": 0.014011204242706299
                },
                "tokens": [
                    {
                        "id": "e6ee28b1-65a5-4926-938d-6548045bb9b2",
                        "content": "1798",
                        "geometry": {
                            "left": 0.46221113204956055,
                            "top": 0.723779022693634,
                            "width": 0.06433475017547607,
                            "height": 0.013610899448394775
                        }
                    },
                    {
                        "id": "e575c04a-33d5-4332-bfb0-9f208632108b",
                        "content": ".",
                        "geometry": {
                            "left": 0.5302935838699341,
                            "top": 0.723779022693634,
                            "width": 0.006246089935302734,
                            "height": 0.013210594654083252
                        }
                    }
                ]
            }
        ]
    }
]
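A custom text layer can be checked against the schema above before upload. The sketch below re-implements the schema's type and required-field rules with the Python standard library only. It is an illustration, not Labelbox's validator: the function names are hypothetical and the error messages are invented for this example.

```python
import json

GEOMETRY_KEYS = ("left", "top", "width", "height")


def _is_number(value):
    # JSON numbers map to int or float; bool is excluded because it subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _check_item(item, where, errors):
    """Check the fields shared by groups and tokens: id, content, geometry."""
    if not isinstance(item, dict):
        errors.append(f"{where}: must be an object")
        return False
    for key in ("id", "content"):
        if not isinstance(item.get(key), str):
            errors.append(f"{where}: {key} must be a string")
    geometry = item.get("geometry")
    if not isinstance(geometry, dict):
        errors.append(f"{where}: geometry must be an object")
    else:
        for key in GEOMETRY_KEYS:
            if not _is_number(geometry.get(key)):
                errors.append(f"{where}: geometry.{key} must be a number")
    return True


def find_text_layer_errors(text_layer):
    """Return a list of problems; an empty list means the structure looks valid."""
    if not isinstance(text_layer, list):
        return ["text layer must be an array of pages"]
    errors = []
    for p, page in enumerate(text_layer):
        where = f"page[{p}]"
        if not isinstance(page, dict):
            errors.append(f"{where}: must be an object")
            continue
        if not _is_number(page.get("number")):
            errors.append(f"{where}: number must be a number")
        if page.get("units") not in ("POINTS", "PERCENT"):
            errors.append(f"{where}: units must be POINTS or PERCENT")
        for key in ("width", "height"):  # optional, but typed when present
            if key in page and not _is_number(page[key]):
                errors.append(f"{where}: {key} must be a number")
        groups = page.get("groups")
        if not isinstance(groups, list):
            errors.append(f"{where}: groups must be an array")
            continue
        for g, group in enumerate(groups):
            group_where = f"{where}.group[{g}]"
            if not _check_item(group, group_where, errors):
                continue
            tokens = group.get("tokens")
            if not isinstance(tokens, list):
                errors.append(f"{group_where}: tokens must be an array")
                continue
            for t, token in enumerate(tokens):
                _check_item(token, f"{group_where}.token[{t}]", errors)
    return errors


# A shortened, hypothetical text layer with one group and no tokens.
sample = json.loads(
    '[{"number": 1, "units": "PERCENT", "groups": [{"id": "g1", "content": "AN",'
    ' "geometry": {"left": 0.48, "top": 0.17, "width": 0.03, "height": 0.01},'
    ' "tokens": []}]}]'
)
print(find_text_layer_errors(sample))
```

Collecting every problem in a list, instead of raising on the first one, makes it easier to fix a machine-generated text layer in a single pass.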

Parameters

Parameter | Required | Description
row_data | Yes | A dictionary of the form { "pdf_url": str, "text_layer_url": str }. For IAM Delegated Access, URLs must be in virtual-hosted-style format.
row_data['pdf_url'] | Yes | HTTPS URL of a cloud-hosted PDF. It must be specified within the row_data dictionary.
row_data['text_layer_url'] | No | HTTPS URL of a cloud-hosted JSON extract of the PDF.
global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row.
media_type | No | "PDF". Optional; providing the media type enables better validation and error messaging.
metadata_fields | No | See Metadata.
attachments | No | See Attachments and Asset overlays.
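
To make the required and optional parameters concrete, here is a minimal sketch of a helper that assembles one data row dictionary. The function name `build_pdf_row` and the `example.com` URLs are hypothetical and not part of the Labelbox SDK; the HTTPS check only mirrors the requirement stated above and does not replace Labelbox's own validation.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_pdf_row(
    pdf_url: str,
    text_layer_url: Optional[str] = None,
    global_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a data row dict for a PDF, omitting optional keys that were not given."""
    for label, url in (("pdf_url", pdf_url), ("text_layer_url", text_layer_url)):
        if url is not None and urlparse(url).scheme != "https":
            raise ValueError(f"{label} must be an https URL, got: {url!r}")

    # pdf_url is required; text_layer_url is only added when supplied.
    row_data = {"pdf_url": pdf_url}
    if text_layer_url is not None:
        row_data["text_layer_url"] = text_layer_url

    row: Dict[str, Any] = {"row_data": row_data, "media_type": "PDF"}
    if global_key is not None:
        row["global_key"] = global_key
    return row


# Placeholder URL for illustration only.
row = build_pdf_row("https://example.com/docs/report.pdf", global_key="report-001")
print(row)
```

Leaving out `text_layer_url` matches the case where Labelbox generates the text layer during import.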

Import format

[
  {
    "row_data": {
      "pdf_url": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0801.3483.pdf",
      // You don't need to provide a text_layer_url. Labelbox automatically generates a text layer when importing an asset without one.
      "text_layer_url": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0801.3483-lb-textlayer.json"
    },
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0801.3483.pdf",
    "media_type": "PDF",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://www.wikipedia.org/" }]
  },
  {
    "row_data": {
      "pdf_url": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0803.1972.pdf",
      // You don't need to provide a text_layer_url. Labelbox automatically generates a text layer when importing an asset without one.
      "text_layer_url": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0803.1972-lb-textlayer.json"
    },
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples/0803.1972.pdf",
    "media_type": "PDF",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]

[
  {
    "row_data": {
      "pdf_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
      // You don't need to provide a text_layer_url. Labelbox automatically generates a text layer when importing an asset without one.
      "text_layer_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483-lb-textlayer.json"
    },
    "global_key": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
    "media_type": "PDF",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://www.wikipedia.org/" }]
  }
]
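
The `//` comments in the samples above are annotations; standard JSON does not allow comments, so they need to be left out of an actual `.json` file. The sketch below, which assumes the rows are assembled in Python first, serializes a payload with `json.dumps` and checks for repeated global keys within that payload. The helper `duplicate_global_keys` is hypothetical, and it cannot detect clashes with data rows that already exist in your organization.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def duplicate_global_keys(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the global keys that appear more than once in the payload."""
    counts = Counter(row["global_key"] for row in rows if "global_key" in row)
    return sorted(key for key, count in counts.items() if count > 1)


base = "https://lb-test-data.s3.us-west-1.amazonaws.com/document-samples"
rows = [
    {
        "row_data": {"pdf_url": f"{base}/0801.3483.pdf"},
        "global_key": f"{base}/0801.3483.pdf",
        "media_type": "PDF",
    },
    {
        "row_data": {"pdf_url": f"{base}/0803.1972.pdf"},
        "global_key": f"{base}/0803.1972.pdf",
        "media_type": "PDF",
    },
]

duplicates = duplicate_global_keys(rows)
if duplicates:
    raise ValueError(f"Duplicate global keys in payload: {duplicates}")

# Comment-free JSON that could be written to an import file.
payload = json.dumps(rows, indent=2)
print(payload)
```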

Python example

from labelbox import Client

client = Client(api_key="<YOUR_API_KEY>")

dataset = client.create_dataset(name="Bulk import example - Documents")

assets = [
  {
    "row_data": {
      "pdf_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
      # You don't need to provide a text_layer_url. Labelbox automatically generates a text layer when importing an asset without one.
      "text_layer_url": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483-lb-textlayer.json"
    },
    "global_key": "https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf",
    "media_type": "PDF",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://www.wikipedia.org/" }]
  }
]

task = dataset.create_data_rows(assets)
task.wait_till_done()
print(task.errors)

# Alternatively, create data rows from local files
local_file_paths = ['path/to/local/file1', 'path/to/local/file2'] # limit: 15k files


new_dataset = client.create_dataset(name = "Local files upload")

try:
    task = new_dataset.create_data_rows(local_file_paths)
    task.wait_till_done()
except Exception as err:
    print(f'Error while creating Labelbox dataset: {err}')

Verify files are processed

🚧

File processing can take up to 20 minutes

Because PDFs and OCR'ed files can be very large, processing an upload can sometimes take up to 20 minutes.
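
Given that processing can take this long, a script that depends on the result may want to poll with a timeout rather than assume the file is ready. The following is a generic sketch only: `wait_until` is a hypothetical helper, and `check` stands for any callable you supply that returns True once your file is processed (how to query that through the SDK is not shown here). The 20-minute default simply mirrors the upper bound mentioned above.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 20 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() repeatedly until it returns True or timeout_s has elapsed.

    Returns True if check() succeeded, False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with a stand-in check that succeeds on the third attempt.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


done = wait_until(fake_check, timeout_s=10, interval_s=0)
print("processed" if done else "timed out")
```

The `sleep` and `clock` parameters are injectable so the helper can be exercised in tests without real waiting.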

By checking the Media Attributes section, you can verify whether a file conversion using a custom or Labelbox-generated text layer is complete.

  • If Is text layer valid = true, the file was successfully processed.