Import text data

How to import text data, with specifications and sample import formats.

Specifications

File format: TXT
Text encoding: UTF-8 (Note: The Editor does not process special character sequences such as HTML entities, Unicode escape sequences, or colon emoji aliases.)
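
Since the Editor does not process these sequences, one option is to decode them before import. The sketch below is a possible preprocessing step, not part of the Labelbox SDK. `normalize_text` is a hypothetical helper that decodes HTML entities with Python's standard `html.unescape` and converts literal `\uXXXX` escapes into characters. It does not handle colon emoji aliases or surrogate pairs.

```python
import html
import re

# Matches a literal backslash-u escape such as the six characters \u00e8.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def normalize_text(raw: str) -> str:
    """Hypothetical helper: decode HTML entities and literal \\uXXXX escapes."""
    text = html.unescape(raw)  # for example "&amp;" becomes "&"
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


cleaned = normalize_text("Caf&eacute; &amp; cr\\u00e8me")
print(cleaned)
```

The cleaned string can then be written to a UTF-8 TXT file or passed directly as row_data.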

Import methods:

  • IAM Delegated Access
  • Signed URLs (https URLs only)
  • Direct upload of local files (256 MB max file size)
    Note: Direct upload does not currently support adding metadata or attachments; see the Python example below.
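
Given the file size limit on direct upload, it may be useful to check local files before uploading them. The following is a minimal sketch, not part of the Labelbox SDK. `split_by_size` is a hypothetical helper, and reading "256 MB" as 256 * 1024 * 1024 bytes is an assumption.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: the documented "256 MB" limit is read as 256 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 256 * 1024 * 1024


def split_by_size(
    paths: Iterable[str], limit: int = MAX_UPLOAD_BYTES
) -> Tuple[List[str], List[str]]:
    """Return (uploadable, too_large) lists based on each file's size on disk."""
    uploadable, too_large = [], []
    for path in paths:
        if os.path.getsize(path) <= limit:
            uploadable.append(path)
        else:
            too_large.append(path)
    return uploadable, too_large
```

Files in the second list would need another import method, such as a signed URL.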

Parameters

Parameter | Required | Description
row_data | Yes | The text to import, provided in one of two ways: 1) an HTTPS URL to a cloud-hosted text file, which must be encoded as UTF-8; or 2) a text string (up to 16 MB).
global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row.
media_type | No | "TEXT" (optional; providing the media type enables better validation and error messaging).
metadata_fields | No | See Metadata.
attachments | No | See Attachments and Asset overlays.
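
As an illustration of how these parameters fit together, the sketch below builds a single data row dictionary. `make_text_row` is a hypothetical helper, not a Labelbox function. Measuring the 16 MB string limit in UTF-8 bytes and falling back to a random UUID for the global key are both assumptions.

```python
import uuid
from typing import Optional

# Assumption: the documented 16 MB string limit is measured in UTF-8 bytes.
MAX_TEXT_BYTES = 16 * 1024 * 1024


def make_text_row(row_data: str, global_key: Optional[str] = None) -> dict:
    """Hypothetical helper: build one data row dict for a text asset."""
    if not row_data:
        raise ValueError("row_data is required")
    is_url = row_data.startswith("https://")
    if not is_url and len(row_data.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("raw text strings are limited to 16 MB")
    return {
        "row_data": row_data,
        # Fall back to a random UUID when the caller supplies no key.
        "global_key": global_key or str(uuid.uuid4()),
        "media_type": "TEXT",
    }


row = make_text_row("A short raw text string", global_key="my-text-0001")
```

Optional metadata_fields and attachments entries can be added to the returned dictionary before import.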

Import format

Cloud-hosted text files, with metadata fields referenced by name:

[
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "media_type": "TEXT",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  },
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-2.txt",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-2.txt",
    "media_type": "TEXT",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]

Cloud-hosted text file, with a metadata field referenced by schema ID and an HTML attachment:

[
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "media_type": "TEXT",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://www.wikipedia.org/" }]
  }
]

Raw text strings as row_data:

[
  {
    "row_data": "👋 I am a raw text string with emojis  🙏 😃",
    "global_key": "483e64b0-c2fb-41bc-9f69-82c036f1ca5c",
    "media_type": "TEXT",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  },
  {
    "row_data": "Active learning is a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement.[1] Bonwell & Eison (1991) states that students participate [in active learning] when they are doing something besides passively listening. According to Hanson and Moser (2003) using active teaching techniques in the classroom create better academic outcomes for students. Scheyvens, Griffin, Jocoy, Liu, & Bradford (2008) further noted that “by utilizing learning strategies that can include small-group work, role-play and simulations, data collection and analysis, active learning is purported to increase student interest and motivation and to build students ‘critical thinking, problem-solving and social skills”. In a report from the Association for the Study of Higher Education (ASHE), authors discuss a variety of methodologies for promoting active learning. They cite literature that indicates students must do more than just listen in order to learn. They must read, write, discuss, and be engaged in solving problems. This process relates to the three learning domains referred to as knowledge, skills and attitudes (KSA). This taxonomy of learning behaviors can be thought of as the goals of the learning process.[2] In particular, students must engage in such higher-order thinking tasks as analysis, synthesis, and evaluation.[3]",
    "global_key": "6a12ff36-8a3e-4e7d-aff1-261840500c96",
    "media_type": "TEXT",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
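
Before importing a file in one of the formats above, a quick local check can catch missing row_data values and repeated global keys. This is a sketch only. `find_import_problems` is a hypothetical helper, and it detects duplicates only within the file itself, not against data rows that already exist in your organization.

```python
import json
from collections import Counter
from typing import List


def find_import_problems(payload: str) -> List[str]:
    """Hypothetical pre-flight check for the JSON text of an import file."""
    rows = json.loads(payload)
    problems = []
    for index, row in enumerate(rows):
        if not row.get("row_data"):
            problems.append(f"row {index}: missing row_data")
    # Count how often each global key occurs in this file.
    keys = Counter(row["global_key"] for row in rows if "global_key" in row)
    for key, count in keys.items():
        if count > 1:
            problems.append(f"global_key {key!r} appears {count} times")
    return problems


sample = (
    '[{"row_data": "first text", "global_key": "k1"},'
    ' {"row_data": "second text", "global_key": "k1"},'
    ' {"global_key": "k2"}]'
)
problems = find_import_problems(sample)
print(problems)
```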

Python example

from labelbox import Client

client = Client(api_key="<YOUR_API_KEY>")

dataset = client.create_dataset(name="Bulk import example - Text")

assets = [
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/text-samples/sample-text-1.txt",
    "media_type": "TEXT",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
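# Create the data rows and wait for the import task to finish.
task = dataset.create_data_rows(assets)
task.wait_till_done()
print(task.errors)

# Direct upload of local files (metadata and attachments are not supported).
local_file_paths = ['path/to/local/file1', 'path/to/local/file2'] # limit: 15k files

new_dataset = client.create_dataset(name="Local files upload")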

try:
    task = new_dataset.create_data_rows(local_file_paths)
    task.wait_till_done()
    print(task.errors)
except Exception as err:
    print(f'Error while creating data rows from local files: {err}')
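
When there are more local files than the limit noted in the example above, one approach is to upload them in batches. The sketch below uses only the calls shown in the example (`create_data_rows`, `wait_till_done`, and `errors`). `upload_in_batches` is a hypothetical helper, and treating "15k" as at most 15,000 paths per call is an assumption.

```python
from typing import Any, List, Sequence

# Assumption: "limit: 15k files" means at most 15,000 paths per call.
MAX_FILES_PER_CALL = 15_000


def upload_in_batches(
    dataset: Any, paths: Sequence[str], batch_size: int = MAX_FILES_PER_CALL
) -> List[Any]:
    """Hypothetical helper: upload local files in batches, collecting task errors."""
    errors = []
    for start in range(0, len(paths), batch_size):
        batch = list(paths[start:start + batch_size])
        task = dataset.create_data_rows(batch)
        task.wait_till_done()
        if task.errors:
            errors.append(task.errors)
    return errors
```

Here `dataset` is expected to be a dataset object such as `new_dataset` from the example, and an empty return value means no batch reported errors.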