See the IAM delegated access Integrations Guide page for instructions on setting up cloud storage integrations. Using IAM delegated access integrations is the recommended option for all cloud users.
List your organization's IAM integrations
import labelbox as lb
client = lb.Client("<YOUR_API_KEY>")
organization = client.get_organization()
iam_integrations = organization.get_iam_integrations()
for integration in iam_integrations:
    print(integration)
Get the default IAM integration
default_integration = organization.get_default_iam_integration()
Set IAM integration during dataset creation
Each dataset can have at most one IAM integration. Use the optional iam_integration parameter of client.create_dataset. If not set, the dataset uses your organization's default integration. You can then upload data rows using their cloud storage URLs.
iam_integration = organization.get_iam_integrations()[1]
dataset = client.create_dataset(name="IAM manual demo", iam_integration=iam_integration)
Override default integration during dataset creation
You can override the default integration when creating a dataset; for example, pass iam_integration=None to create the dataset with no integration at all.
dataset = client.create_dataset(name="IAM manual demo", iam_integration=None)
Update dataset integration
You can change the current integration of a dataset with add_iam_integration.
# Get all IAM integrations
iam_integrations = client.get_organization().get_iam_integrations()
# Get IAM integration id
iam_integration_id = [integration.uid for integration
in iam_integrations
if integration.name == "My S3 integration"][0]
# Set IAM integration for integration id
dataset.add_iam_integration(iam_integration_id)
# Get IAM integration object
iam_integration = [integration for integration
                   in iam_integrations
                   if integration.name == "My S3 integration"][0]
# Set IAM integration from IAMIntegration object
dataset.add_iam_integration(iam_integration)
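The list-comprehension lookup above raises an IndexError when no integration has the given name. If you prefer an explicit failure mode, a small helper like the following can be used instead (find_integration_by_name is a hypothetical utility, not part of the SDK; it only assumes each integration object has a name attribute):

```python
from typing import Optional, Sequence


def find_integration_by_name(integrations: Sequence, name: str) -> Optional[object]:
    """Return the first integration whose .name matches, or None if absent."""
    for integration in integrations:
        if integration.name == name:
            return integration
    return None
```

You can then check the result for None and report a clear error before calling dataset.add_iam_integration.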
Remove/Unselect dataset integration
dataset.remove_iam_integration()
Upload data rows with delegated access
Make sure the type of IAM integration matches your data rows' cloud storage, then use the URL as the row_data field to upload data rows.
# Some examples:
datarows = [{"row_data": "https://<bucket-name>.s3.<region>.amazonaws.com/<key>"}] # Amazon S3
datarows = [{"row_data": "gs://gcs-lb-demo-bucket/test.png"}] # Google Cloud Storage
datarows = [{"row_data": "https://labelboxdatasets.blob.core.windows.net/datasets/geospatial/001.jpg"}] # Microsoft Azure Blob Storage
task1 = dataset.create_data_rows(datarows)
task1.wait_till_done()
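Before uploading, it can help to sanity-check that each row_data URL points at the provider your integration covers. The helper below is a hypothetical heuristic (not part of the SDK) based only on the URL patterns shown in the examples above:

```python
def storage_provider(url: str) -> str:
    """Guess the cloud storage provider from a row_data URL (heuristic)."""
    if url.startswith("gs://"):
        return "GCS"
    if url.startswith("s3://") or ".amazonaws.com" in url:
        return "S3"
    if ".blob.core.windows.net" in url:
        return "AZURE"
    return "UNKNOWN"


# Example: warn about rows that do not match an S3-backed integration
datarows = [{"row_data": "https://my-bucket.s3.us-east-1.amazonaws.com/key.png"}]
mismatched = [r for r in datarows if storage_provider(r["row_data"]) != "S3"]
```

This does not replace server-side validation; a failed task will still report per-row errors, but catching obvious mismatches up front saves a round trip.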