How to import HTML data and sample import formats.
Supported file formats and import methods
Format: HTML
Import methods:
- IAM Delegated Access
- Signed URLs (https URLs only)
Parameters
Parameter | Required | Description |
---|---|---|
row_data | Yes | https path to an HTML file. For IAM Delegated Access, this URL must be in virtual-hosted-style format: https://<bucket-name>.s3.<region>.amazonaws.com/<key>. Buckets in older regions may produce object URLs in a different format; if yours do, convert them to virtual-hosted style before importing. |
global_key | No | Unique user-generated file name or ID for the file. Global keys must be unique within your organization. A data row is not imported if its global key duplicates that of an existing data row. |
media_type | No | "HTML". Optional media type that enables better validation and error messaging. |
metadata_fields | No | See Metadata. |
attachments | No | See Attachments and Asset overlays. |
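The virtual-hosted-style requirement for row_data can be checked before import. The following is a minimal sketch, not part of the Labelbox SDK: `to_virtual_hosted_style` is a hypothetical helper, and it assumes the non-conforming URLs are path-style S3 URLs of the form https://s3.<region>.amazonaws.com/<bucket-name>/<key>. Any other URL is returned unchanged.

```python
from urllib.parse import urlparse, urlunparse


def to_virtual_hosted_style(url: str) -> str:
    """Convert a path-style S3 URL to virtual-hosted style.

    Hypothetical helper: URLs whose host is not of the assumed
    path-style form s3.<region>.amazonaws.com are returned unchanged.
    """
    parts = urlparse(url)
    host = parts.netloc
    # Path-style hosts start with "s3." and carry no bucket prefix.
    if not (host.startswith("s3.") and host.endswith(".amazonaws.com")):
        return url
    # In path-style URLs the first path segment is the bucket name.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"Cannot find bucket and key in {url!r}")
    return urlunparse(parts._replace(netloc=f"{bucket}.{host}", path=f"/{key}"))


print(to_virtual_hosted_style(
    "https://s3.us-west-1.amazonaws.com/lb-test-data/sample_html_1.html"
))
# prints https://lb-test-data.s3.us-west-1.amazonaws.com/sample_html_1.html
```

The function only rewrites the host and path, so query strings such as signed URL parameters are carried over as-is.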
Import format
Metadata referenced by field name:
[
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/sample_html_1.html",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/sample_html_1.html",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"}]
  },
  {
    "row_data": "https://lb-test-data.s3.us-west-1.amazonaws.com/sample_html_2.html",
    "global_key": "https://lb-test-data.s3.us-west-1.amazonaws.com/sample_html_2.html",
    "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
Metadata referenced by schema ID:
[
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "HTML", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"}]
  },
  {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_2.html",
    "global_key": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_2.html",
    "metadata_fields": [{"schema_id": "cko8s9r5v0001h2dk9elqdidh", "value": "tag_string"}],
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
  }
]
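Before submitting an import file in the format above, it can help to check the rows locally. The sketch below is illustrative only: `check_rows` is a hypothetical helper, not a Labelbox function, and it covers just two rules from the Parameters table (row_data must be an https URL, and global keys must not repeat). It can only detect duplicates within the file itself, not collisions with data rows already in your organization.

```python
from collections import Counter


def check_rows(rows: list) -> list:
    """Return a list of human-readable problems found in the import rows."""
    problems = []
    for i, row in enumerate(rows):
        url = row.get("row_data", "")
        # row_data is required and must be an https path.
        if not isinstance(url, str) or not url.startswith("https://"):
            problems.append(f"row {i}: row_data must be an https URL")
    # global_key is optional, so only count the rows that set it.
    counts = Counter(row["global_key"] for row in rows if "global_key" in row)
    for key, count in counts.items():
        if count > 1:
            problems.append(f"global_key {key!r} is used {count} times")
    return problems


rows = [
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
        "global_key": "sample_html_1",
    },
    {
        # Deliberately invalid: plain http and a repeated global key.
        "row_data": "http://example.com/page.html",
        "global_key": "sample_html_1",
    },
]
for problem in check_rows(rows):
    print(problem)
```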
<html>
<head>
<title>HTML File Example</title>
</head>
<body bgcolor="ffffff">
<center><img src="https://labelbox.com/static/images/logo-v4.svg" align="bottom">
<hr>
<h1>Get to production AI faster</h1>
<p>Save time by creating and managing your training data, people, and processes in a single place — so you can focus on building the next big thing.</p>
<p><a href="https://labelbox.com/sales">Get a demo</a> or <a href="https://app.labelbox.com/">start for free.</a>
</center>
<hr>
</body>
</html>
Python example
from labelbox import Client
from uuid import uuid4 ## to generate unique IDs
import datetime
client = Client(api_key="<YOUR_API_KEY>")
dataset = client.create_dataset(name="Bulk import example - HTML")
assets = [
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
        "global_key": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
        "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
        "attachments": [{"type": "HTML", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"}]
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_2.html",
        "global_key": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_2.html",
        "metadata_fields": [{"name": "<metadata_field_name>", "value": "tag_string"}],
        "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
    }
]
# Start the bulk import and block until the task finishes.
task = dataset.create_data_rows(assets)
task.wait_till_done()
# Inspect any errors reported for the import.
print(task.errors)
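The sample above reuses each file URL as its global_key. Where a URL is not a suitable identifier, a generated ID is one alternative. The sketch below assumes random UUIDs are acceptable as global keys for your workflow; `build_assets` is a hypothetical helper, and the resulting list has the same shape as `assets` above.

```python
from uuid import uuid4


def build_assets(urls):
    """Return one data row dict per URL, each with a freshly generated global key."""
    return [
        {"row_data": url, "global_key": str(uuid4()), "media_type": "HTML"}
        for url in urls
    ]


urls = [
    "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
    "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_2.html",
]
assets = build_assets(urls)
for asset in assets:
    print(asset["global_key"], asset["row_data"])
```

One trade-off to keep in mind: because a random key changes on every run, importing the same file twice creates two data rows, whereas a key derived from the URL lets the uniqueness check reject the second import.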
# Alternatively, create data rows from local files by passing a list of paths.
local_file_paths = ['path/to/local/file1', 'path/to/local/file2']  # limit: 15k files
new_dataset = client.create_dataset(name="Local files upload")
try:
    task = new_dataset.create_data_rows(local_file_paths)
    task.wait_till_done()
except Exception as err:
    print(f'Error while creating labelbox dataset - Error: {err}')
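For larger local uploads, the 15k file limit noted in the comment above suggests splitting the path list into batches. This is a minimal sketch under the assumption that the limit applies per create_data_rows call; `batches` and `MAX_FILES_PER_CALL` are hypothetical names, and the file paths are placeholders.

```python
MAX_FILES_PER_CALL = 15_000  # assumed to be a per-call limit


def batches(paths, size=MAX_FILES_PER_CALL):
    """Yield consecutive slices of `paths`, each holding at most `size` items."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


# Placeholder paths, used here only to show how the list is split.
paths = [f"path/to/local/file{i}.html" for i in range(1, 32_001)]
print([len(batch) for batch in batches(paths)])
# prints [15000, 15000, 2000]
```

Each batch could then be passed to new_dataset.create_data_rows in turn, waiting for one task to finish before starting the next.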