Documents

Guide for labeling document (PDF) data.

Overview

When you attach a document dataset to a project, Labelbox will automatically adjust the editor interface for document labeling.

Supported annotation types

Below are the annotation types that you may include in your ontology for labeling document data. Classification-type annotations can be applied at the global level and/or nested within a bounding box annotation.

Annotation type | Import annotation | Export annotation
Bounding box | See payload | See payload
Text Entity (Named Entity) | See payload | See payload
Relationship | Coming Soon | See payload
Radio classification | See payload | See payload
Checklist classification | See payload | See payload
Free-form text classification | See payload | See payload

Custom Text Layer

A unique aspect of our document editor is the ability to view your text layer. When the text layer is toggled on, it appears any time you want to highlight an entity.

Exporting raw text

To export raw text alongside the entity labels in a PDF project, toggle on the Save and export raw text option at the time of project creation. This option appears automatically for the Document data type. With it on, the named entities are exported together with your text file.
Note: This feature only works if you upload a text layer with your PDF.

Navigate the document

Use your mouse scroll wheel or trackpad to move forward and backward through the pages of the document. To jump to a specific page, highlight the current page number in the top navigation bar, type your desired page number, and press Enter.

To zoom in, press Z and click on the section of the page you want to zoom in on.
To zoom out, press Opt + Z and click on the page, or press Shift + Z to return the page to its original zoom level.

Bounding box

To create a bounding box, use your cursor to create the shape around a character, word(s), or section in the document. To reposition the bounding box on the document, simply click + hold then use your mouse or trackpad to reposition the annotation on the document. You can also click + drag the corners to resize the bounding box.

Shortcut: In the Tools panel, you will see a numerical hotkey next to the name of the annotation. Use the specified number hotkey (e.g., 1, 2, 3) to activate the bounding box tool.

To create another instance of the bounding box, press the number hotkey again to activate the tool, then create another bounding box. Once all instances have been created, press E to submit your label.


Entity

To create an entity annotation, click the desired starting character and drag to select a sequence of characters in the text. Characters are not restricted to a single class; entity annotations may overlap completely or partially. Entities may also span multiple pages. To edit an entity's class, right-click the entity and select Change class.
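The difference between complete and partial overlap can be made concrete with a small sketch. It assumes each entity is stored as a class name plus a half-open character range; `EntitySpan` and `overlap_kind` are hypothetical names used for illustration only, not part of the editor or of any export payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical entity: a class name plus a half-open character range."""
    label: str
    start: int  # index of the first selected character
    end: int    # index one past the last selected character


def overlap_kind(a: EntitySpan, b: EntitySpan) -> str:
    """Classify how two spans relate: 'none', 'partial', or 'complete'."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return "none"
    # Complete overlap: the shorter span lies entirely inside the other.
    if shared == min(a.end - a.start, b.end - b.start):
        return "complete"
    return "partial"


# Illustrative spans only; the offsets are made up.
person = EntitySpan("person", 0, 10)
title = EntitySpan("title", 0, 4)
organization = EntitySpan("organization", 8, 20)
```

Here `title` sits entirely inside `person`, `organization` shares only two characters with `person`, and `title` and `organization` do not touch.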

Shortcut: In the Tools panel, you will see a numerical hotkey next to the name of the annotation. Use the specified number hotkey (e.g., 1, 2, 3) to activate the entity tool.

To create another entity, press the number hotkey again to activate the tool, then create another entity. Once all entities have been created, press E to submit your label.

Token Selection

We also support tokenization, so you can create and highlight entities at both the word level and the character level. Which level applies is determined by the data contained in your JSON upload. With word-level tokens, clicking a word highlights the entire word. This is helpful when labeling text, because it is easy to miss characters or words when highlighting by hand.
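As a rough illustration of word-level selection, the sketch below snaps a clicked character index to the token that contains it. It assumes tokens are available as half-open character offsets; the whitespace `tokenize` helper is only a stand-in for whatever tokens your JSON upload defines, and both function names are hypothetical.

```python
import re
from bisect import bisect_right
from typing import List, Optional, Tuple

Token = Tuple[int, int]  # (start, end) half-open character offsets


def tokenize(text: str) -> List[Token]:
    """Stand-in tokenizer: one token per run of non-whitespace characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_token(tokens: List[Token], index: int) -> Optional[Token]:
    """Return the token containing the clicked index, or None if it hits a gap."""
    starts = [start for start, _ in tokens]
    pos = bisect_right(starts, index) - 1  # last token starting at or before index
    if pos >= 0 and tokens[pos][0] <= index < tokens[pos][1]:
        return tokens[pos]
    return None


text = "Invoice total: 42 USD"
tokens = tokenize(text)
clicked = snap_to_token(tokens, 10)  # a click in the middle of "total:"
```

With character-level tokens the same lookup would return a single-character range, which is why the granularity depends on the uploaded data rather than on the tool.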

Relationships

To create a relationship between annotations, select a Relationship tool and hover over the annotation where you want the relationship to start to reveal the annotation's anchor points. Click an anchor point to create the starting point of the relationship, then bring your mouse over to the annotation you want to relate it to, hovering over it to reveal its anchor points. Finally, click one of the anchor points to complete the relationship.

Right-click a relationship to change its direction, make it bi-directional, or delete it from the asset.
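The right-click actions map onto a simple data model. The sketch below is a minimal illustration, assuming a relationship is stored as a source annotation, a target annotation, and a bi-directional flag; the `Relationship` class, its fields, and the annotation IDs are hypothetical and do not reflect the export payload.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Relationship:
    """Hypothetical directed link between two annotations, identified by ID."""
    source: str
    target: str
    bidirectional: bool = False


def reverse(rel: Relationship) -> Relationship:
    """Change direction: swap the start and end annotations."""
    return replace(rel, source=rel.target, target=rel.source)


def make_bidirectional(rel: Relationship) -> Relationship:
    """Mark the relationship as applying in both directions."""
    return replace(rel, bidirectional=True)


# Made-up annotation IDs for illustration.
rel = Relationship(source="box-1", target="entity-7")
flipped = reverse(rel)
both_ways = make_bidirectional(rel)
```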

Relationships for annotations across pages

If you want to create an annotation relationship for annotations that exist on different pages, you will need to follow a slightly different workflow:

  1. Select the annotation relationship tool
  2. Go to the annotation where you want to start the relationship, right-click, and click Select relationship start
  3. Scroll to your destination annotation, right-click, and click Select relationship end

After you have selected both the starting and end point of the relationship, your relationship will be established.

Radio classification

Create a radio classification by activating the classification question and inputting the answer value. For example, pressing 8, k, and Esc completes the radio classification.

Once all classifications have been completed, press E to submit your label.


Checklist classification

Create a checklist classification by activating the classification question and inputting the answer value(s). For example, pressing 7 and then pressing Down + Enter on the answer values completes the checklist classification.

Once all classifications have been completed, press E to submit your label.


Free text classification

Create a free text classification by activating the classification question and inputting the answer value. For example, pressing 6, typing the answer value, and pressing Enter completes the free text classification.

Once all classifications have been completed, press E to submit your label.


Document-specific hotkeys

Function | Hotkey | Description
Show Text Layer | Shift + T | Show or hide the text layer.