When you attach a text dataset to a project, the Labelbox Editor automatically adjusts its interface for text labeling.
Below are all of the annotation types you may include in your ontology when you are labeling text data. Classification-type annotations can be applied at the global level and/or nested within an Object-type annotation.
Natural Language Processing (NLP) is an area of research and application that explores how to use computers to “understand” and manipulate natural language, such as text or speech. Most NLP techniques rely on machine learning to derive meaning from human languages. One of NLP’s methodologies for processing natural language is text classification, a method that leverages deep learning to categorize sequences of unstructured text.
Named Entity Recognition (NER) is a subtask of information extraction whereby entities in the unstructured text are classified into pre-determined categories. You can use the Labelbox Editor to create NER training data for your ML model by labeling sequences of characters in your text file with the Entity annotation.
When you load a text file into the Editor, you can use the Entity annotation to label sequences of characters in the unstructured text. The characters in your text file are not restricted to a single Entity annotation, meaning Entity annotations can overlap.
When you export your NER annotations from Labelbox, each annotation in the export contains location.start and location.end values that indicate which characters in the unstructured text are included in each Entity annotation. See the Data model reference for an example.
The value for location.start indicates the index of the first character in the Entity annotation and assumes start-index inclusion.
The value for location.end indicates the index of the last character in the Entity annotation and assumes end-index exclusion.
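As a rough illustration of how these two indices map back to the labeled characters, the sketch below slices a text string using a location object. The helper name entity_text, the sample sentence, and the shape of the annotation dictionary beyond its location.start and location.end fields are assumptions made for this example, not the documented export format. The end_exclusive flag is also hypothetical: it is there so you can check the end-index convention against your own export rather than rely on this sketch.

```python
def entity_text(text: str, location: dict, end_exclusive: bool = True) -> str:
    """Return the characters covered by an Entity annotation.

    `location` is assumed to carry integer `start` and `end` keys.
    """
    start = location["start"]
    end = location["end"]
    # With an exclusive end, the slice stops just before `end`.
    # With an inclusive end, the character at `end` is part of the entity.
    stop = end if end_exclusive else end + 1
    if not 0 <= start <= stop <= len(text):
        raise ValueError(f"location {location} is out of range for this text")
    return text[start:stop]


# Hypothetical text file contents and annotation, for illustration only.
text = "Labelbox is based in San Francisco."
annotation = {"location": {"start": 21, "end": 34}}

print(entity_text(text, annotation["location"]))
```

If the extracted string is one character short or long compared with what was highlighted in the Editor, the export likely uses the other end-index convention, and toggling end_exclusive should correct it.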
You can also nest classification-type annotations within an Entity annotation. Nested classifications for text are supported for all classification types (see section above).