Labelbox documentation

Text (NER) editor

Natural Language Processing (NLP) is an area of research and application that explores how to use computers to “understand” and manipulate natural language, such as text or speech. Most NLP techniques rely on machine learning to derive meaning from human languages. One common NLP task is text classification, which uses machine learning (often deep learning) to categorize sequences of unstructured text.

Named Entity Recognition (NER) is a subtask of information extraction in which entities in unstructured text (for example, people, organizations, or locations) are classified into predetermined categories.

You can use the Labelbox Editor to create NER training data for your ML model by labeling sequences of characters in your text file with the Entity annotation.

Below are all of the annotation types you may include in your ontology when you are labeling text data. Classification-type annotations can be applied at the global level and/or nested within an Object-type annotation.

Annotation type                                   Import format   Export format
Entity (NER)                                      See JSON        See JSON
Radio classification (global + nested)            -               See JSON
Checklist classification (global + nested)        -               See JSON
Dropdown classification (global + nested)         -               See JSON
Free-form text classification (global + nested)   -               See JSON

How the Entity annotation works

When you load a text file into the Editor, you can use the Entity annotation to label sequences of characters in the unstructured text. The characters in your text file are not restricted to a single Entity annotation, meaning Entity annotations can overlap.

When you export your NER annotations from Labelbox, each annotation in the export contains location.start and location.end values that indicate which characters in the unstructured text are included in that Entity annotation. See the Data model reference for an example.

  • The value for location.start indicates the index of the first character in the Entity annotation and it assumes start-index inclusion.

  • The value for location.end indicates where the Entity annotation ends and assumes end-index exclusion, so the character at that index is not part of the annotation.
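
The index convention described above (start inclusive, end exclusive) happens to match Python string slicing, so recovering the labeled characters can be sketched in a few lines. The example below is a minimal illustration, not the Labelbox export format: the sample text, the "title" field, and the helper names entity_text and spans_overlap are hypothetical, and only the location.start and location.end fields come from the description above. It is worth checking the convention against one known label in your own export before relying on it.

```python
from typing import Dict

Location = Dict[str, int]


def entity_text(text: str, location: Location) -> str:
    """Return the characters covered by one Entity annotation.

    Assumes start is inclusive and end is exclusive, as described above,
    which is the same convention Python slicing uses.
    """
    return text[location["start"]:location["end"]]


def spans_overlap(a: Location, b: Location) -> bool:
    """Return True if two half-open [start, end) spans share a character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


text = "Ada Lovelace worked with Charles Babbage in London."

# Hypothetical annotations for illustration only; the "title" key is an
# assumption, not a documented field of the export.
annotations = [
    {"title": "person", "location": {"start": 0, "end": 12}},
    {"title": "first_name", "location": {"start": 0, "end": 3}},
    {"title": "city", "location": {"start": 44, "end": 50}},
]

for annotation in annotations:
    print(annotation["title"], "->", entity_text(text, annotation["location"]))

# Entity annotations may overlap: "first_name" lies inside "person".
person, first_name, city = (a["location"] for a in annotations)
print(spans_overlap(person, first_name))
print(spans_overlap(person, city))
```

Because the end index is treated as exclusive, the length of an annotation is simply end minus start, and two annotations that merely touch (one ends where the next starts) are not counted as overlapping.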

You can also nest classification-type annotations within an Entity annotation. Nested classifications for text are supported for all classification types (see the table of annotation types above).