Documents

Guide for labeling document (PDF) data.

The Documents editor allows you to annotate and review PDF documents for use cases such as document review, text extraction, and data labeling for machine learning. You can navigate multi-page documents, manage annotation layers, and track progress efficiently.

Set up document annotation projects

To set up a document annotation project:

  1. Create a document (PDF) dataset.
  2. On the Annotate projects page, click the + New project button.
  3. Select Documents. Provide a name and an optional description for your project.
  4. Click Save. The system then creates the project and redirects you to the project overview page.
  5. Click Add data, then select your document (PDF) dataset. Click Sample to sample your dataset, or manually select data rows and click Queue batch. To learn how to import documents using the SDK, see importing document data.
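
As a rough illustration of what preparing document data rows for an SDK import might look like, the sketch below builds the payloads in plain Python. The field names (`row_data`, `pdf_url`, `text_layer_url`, `global_key`), the helper `build_pdf_data_row`, and the example URLs are assumptions made for illustration; see importing document data for the exact schema.

```python
from typing import Optional


def build_pdf_data_row(
    pdf_url: str,
    global_key: str,
    text_layer_url: Optional[str] = None,
) -> dict:
    """Build one data row payload for a PDF (hypothetical schema)."""
    row_data = {"pdf_url": pdf_url}
    # The text layer is optional here; without one, only bounding boxes
    # would be available in the editor.
    if text_layer_url is not None:
        row_data["text_layer_url"] = text_layer_url
    return {"row_data": row_data, "global_key": global_key}


# Placeholder URLs and keys, not real assets.
rows = [
    build_pdf_data_row(
        "https://example.com/contracts/a.pdf",
        "contract-a",
        "https://example.com/contracts/a-text-layer.json",
    ),
    build_pdf_data_row("https://example.com/contracts/b.pdf", "contract-b"),
]
```

A list like `rows` would then be passed to whatever upload call the SDK provides for creating data rows in a dataset.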

📘

Data row size limit

To view the maximum size allowed for a data row, see limits.

🚧

Image encoding

If your PDF files contain images, use JPEG encoding and RGB colorspace for color images.

Supported annotation types

Below are the annotation types you may include in your ontology for labeling document data. Classification-type annotations can be applied globally or nested within a bounding box or entity annotation.

Feature | Import annotation | Export annotation
Bounding box | See payload | See payload
Entity | See payload | See payload
Relationship | See payload | See payload
Radio classification | See payload | See payload
Checklist classification | See payload | See payload
Free-form text classification | See payload | See payload

All PDF documents support bounding box annotations. To create other annotation types, a PDF document must have a text layer before you upload it to Catalog. For best results, verify the text layers before uploading PDFs.

Bounding box

To create a bounding box, use your cursor to draw the shape around a character, one or more words, or a section of the document. To reposition the bounding box, click and hold it, then drag it with your mouse or trackpad. To resize the bounding box, click and drag its corners.
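
The move and resize operations above can be sketched as simple geometry on a box, followed by a conversion to an import-style payload. This is a minimal sketch, not the editor's implementation: the `BoundingBox` class, the `to_payload` helper, and the payload keys (`bbox`, `page`, `unit`, `dataRow`, `globalKey`) are assumptions for illustration; the payload references linked in the table above are the authority on the real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    top: float
    left: float
    height: float
    width: float

    def moved(self, dx: float, dy: float) -> "BoundingBox":
        """Reposition the box without changing its size."""
        return BoundingBox(self.top + dy, self.left + dx, self.height, self.width)

    def resized_from_bottom_right(self, dx: float, dy: float) -> "BoundingBox":
        """Drag the bottom-right corner; size is clamped at zero."""
        return BoundingBox(
            self.top,
            self.left,
            max(0.0, self.height + dy),
            max(0.0, self.width + dx),
        )


def to_payload(box: BoundingBox, name: str, page: int, global_key: str) -> dict:
    """Convert a box to an import-style dict (hypothetical key names)."""
    return {
        "name": name,
        "bbox": {
            "top": box.top,
            "left": box.left,
            "height": box.height,
            "width": box.width,
        },
        "page": page,
        "unit": "POINTS",  # assumed unit label
        "dataRow": {"globalKey": global_key},
    }


box = BoundingBox(top=100.0, left=50.0, height=20.0, width=120.0)
box = box.moved(dx=10.0, dy=-5.0).resized_from_bottom_right(dx=30.0, dy=0.0)
payload = to_payload(box, name="signature", page=0, global_key="contract-a")
```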

Entity

To create an entity annotation, click the desired starting character and drag to select a sequence of characters in the text. Characters are not restricted to a single class; entity annotations may overlap completely or partially. Entities may also span multiple pages. To edit an entity's class, right-click the entity and select Change class.

Shortcut: In the Tools panel, you will see a numerical hotkey next to the name of the annotation. Use the specified number hotkey (e.g., 1, 2, 3) to activate the entity tool.

To create another entity, press the number hotkey again to activate the tool, then create another entity. Once all entities have been created, press E to submit your label.

Token selection

The editor also supports tokenization, so you can create and highlight entities at either the word level or the character level. The level is determined by the data in your JSON upload.

Clicking on a specific word will highlight the entire word. This is helpful when labeling text, as it can be easy to accidentally miss certain characters or words when highlighting.
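
To illustrate the idea of word-level selection, the sketch below expands a character range so that it covers every token it touches. It assumes whitespace-delimited tokens purely for demonstration; in the editor, token boundaries come from the data in your JSON upload, and the function names here are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of whitespace-delimited tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Expand the half-open range [start, end) to whole-token boundaries.

    Returns None when the range touches no token (for example, only whitespace).
    """
    touched = [(s, e) for s, e in tokenize(text) if s < end and e > start]
    if not touched:
        return None
    return touched[0][0], touched[-1][1]


text = "Invoice total: 1,250.00 USD"
# A sloppy drag from the middle of "total:" into the middle of "1,250.00".
span = snap_to_tokens(text, 9, 17)
selected = text[span[0]:span[1]]
```

With snapping, a selection that starts or ends mid-word still yields complete words, which is the behavior that helps avoid missed characters.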

Relationships

To create a relationship between annotations, select a relationship tool and hover over the source annotation of the relationship to reveal the annotation's anchor points. Click an anchor point to create the starting point of the relationship, then bring your mouse over to the annotation you want to relate it to, hovering over it to reveal its anchor points. Finally, click one of the anchor points to complete the relationship.

Right-click a relationship to change its direction, make it bi-directional, or delete it altogether.
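
One way to picture these operations is as a small directed edge between two annotations. The sketch below is only a mental model: the `Relationship` class, its field names, and the annotation IDs are assumptions for illustration and do not reflect the editor's internal representation or export format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    bidirectional: bool = False

    def flipped(self) -> "Relationship":
        """Change direction: the target becomes the source and vice versa."""
        return replace(self, source_id=self.target_id, target_id=self.source_id)

    def as_bidirectional(self) -> "Relationship":
        """Mark the relationship as pointing both ways."""
        return replace(self, bidirectional=True)


# Hypothetical annotation IDs.
rel = Relationship(source_id="entity-1", target_id="bbox-7")
flipped = rel.flipped()
both_ways = rel.as_bidirectional()
```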

Relationships for annotations across pages

To create a relationship between annotations on different pages, follow a slightly different workflow:

  1. Select the relationship tool.
  2. Go to the annotation where you want to start the relationship, right-click, and click Select relationship start.
  3. Scroll to your destination annotation, right-click, and click Select relationship end.

Once you have selected both the start and end annotations, the relationship is created.

Radio classification

Create a radio classification by activating the classification question and entering the answer value. For example, pressing 8, K, and Esc completes the radio classification.

Once all classifications have been completed, press E to submit your label.

Checklist classification

Create a checklist classification by activating the classification question and entering the answer value(s). For example, pressing 7, then pressing Down + Enter on the answer values, completes the checklist classification.

Once all classifications have been completed, press E to submit your label.

Free text classification

Create a free text classification by activating the classification question and entering the answer value. For example, pressing 6, typing the answer value, and pressing Enter completes the free text classification.

Once all classifications have been completed, press E to submit your label.

Custom text layers

The document editor lets you view a document's text layer. Toggle the text layer on to display it whenever you want to highlight an entity.

Navigate the document

Use your mouse scroll wheel or trackpad to move forward and backward through the pages of the document. To jump to a specific page, highlight the current page number in the top navigation bar, type your desired page number, and press Enter.

To zoom in, press Z and click on the section of the page you want to zoom in on.
To zoom out, press Opt + Z and click on the page, or press Shift + Z to return the page to its original zoom level.

Document-specific hotkeys

Function | Hotkey | Description
Show Text Layer | Shift + T | Show or hide the text layer.