Make URLs private
Signed URLs contain authentication information in their query strings and are used to limit the permissions and time window of each request. Importing your data via signed URLs is one way to protect your data from unauthorized access. When you import your data this way, Labelbox has no access to your data files.
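To make the idea concrete, here is a minimal Python sketch of a generic signed URL. The server signs the file path together with an expiry timestamp using HMAC-SHA256, appends both to the query string, and later rejects any URL whose signature doesn't match or whose timestamp has passed. This is not Amazon S3's signing scheme (S3 uses AWS Signature Version 4); the secret, the host, and the `expires` and `sig` parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret, known only to the server that serves the files.
SECRET = b"replace-with-a-long-random-secret"


def _signature(path: str, expires: int) -> str:
    # Sign the path and the expiry together so neither can be changed on its own.
    message = f"{path}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append a hypothetical `expires` timestamp and `sig` signature to base + path."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Return True only if the signature is valid and the URL has not expired."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    expected = _signature(parts.path, expires)
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expires
```

Because the expiry is covered by the signature, a user can't extend a URL's lifetime by editing the `expires` parameter; changing it invalidates `sig`.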
Amazon S3 has a maximum signature expiry of 7 days. To import signed URLs that don't expire into Labelbox, follow the instructions in Non-expiring signed URLs.
You can also whitelist the IP range of a wifi network. If you can't give an outsourced labeling firm VPN access, you can whitelist their network instead.
Signed URLs
To create an import file containing signed URLs, follow these steps:
Generate a signed URL for each asset in your dataset. Each signed URL carries a unique signature in its query string (here, the hash parameter) and should look like this:
http://example.com/filename?hash=DMF1ucDxtqgxwYQ==
For added security, you can also specify a range of IP addresses that are allowed to access your data.
Create a file containing the signed URLs.
Example CSV file:
Data_URLs,External_ID
http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660804/isu8zqke6xoopnemuvvc.jpg,ID1
http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660805/ldie4gmhaqfhw1df1wls.jpg,ID2
http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660805/dvbb5kv3dudxhibqpuni.jpg,ID3
http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660806/inm7ipr8h9ecx1fzcspm.jpg,ID4
Upload the import file to Labelbox. If you are importing a CSV file, you will need to specify which column contains the URL and which contains the optional external ID.
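If you have many assets, you can generate the import file instead of writing it by hand. The sketch below assumes you already have a list of (signed URL, external ID) pairs and builds CSV text with the same `Data_URLs,External_ID` header as the example CSV. The function name is hypothetical; using Python's `csv` module means any URL that contains a comma is quoted correctly.

```python
import csv
import io
from typing import Iterable, Tuple


def build_import_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Return CSV text with one (signed URL, external ID) pair per row."""
    buf = io.StringIO()
    # csv.writer quotes any field that contains a comma or a quote character.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Data_URLs", "External_ID"])  # header row
    for url, external_id in rows:
        writer.writerow([url, external_id])
    return buf.getvalue()
```

For example, `with open("labelbox-import.csv", "w", newline="") as f: f.write(build_import_csv(rows))` writes the file; when you upload it, choose `Data_URLs` as the URL column and `External_ID` as the external ID column.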
To create signed URLs with Boto3, the AWS SDK for Python, follow these steps:
Before using Boto3, set up authentication credentials in the IAM console. Create a new IAM user or use an existing one, then go to the user's access keys and generate a new set of keys.
If you have the AWS CLI installed, run
aws configure
to configure your credentials file. Alternatively, you can create the credentials file yourself. By default, it is located at ~/.aws/credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
Install Boto3 by running
pip install boto3
Use this Python script to list all objects in a selected bucket and generate a signed URL for each object.
import boto3

# Seconds each URL stays valid. boto3 defaults to 3600 (1 hour);
# S3 allows at most 604800 (7 days).
EXPIRES_IN = 604800

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

# Your bucket name
bucket = s3.Bucket('YOUR_BUCKET_NAME')

# Iterate over every object in the bucket (boto3 handles pagination)
for obj in bucket.objects.all():
    # Generate a signed GET URL for this object
    url = s3_client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': bucket.name, 'Key': obj.key},
        ExpiresIn=EXPIRES_IN,
    )
    print(url)
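Unless you pass `ExpiresIn`, `generate_presigned_url` signs URLs that are valid for 3600 seconds (one hour), so it can be worth spot-checking when a URL actually stops working before you import it. As a sketch, the function below reads the expiry from the query string: Signature Version 4 URLs carry the signing time in `X-Amz-Date` and the lifetime in seconds in `X-Amz-Expires`, while older Signature Version 2 URLs carry an absolute Unix timestamp in `Expires`. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which an S3 presigned URL stops working."""
    params = parse_qs(urlsplit(url).query)
    if "X-Amz-Expires" in params:
        # Signature Version 4: signing time plus a lifetime in seconds.
        signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
        return signed_at.replace(tzinfo=timezone.utc) + lifetime
    if "Expires" in params:
        # Legacy Signature Version 2: absolute Unix timestamp.
        return datetime.fromtimestamp(int(params["Expires"][0]), tz=timezone.utc)
    raise ValueError("no expiry parameter found in URL")
```

For example, `presigned_url_expiry(url) - datetime.now(timezone.utc)` gives the time a URL has left.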
Note
For more information and a detailed guide on Boto3, see the Boto3 installation guide. For more information on signed URLs in Amazon S3, see the AWS developer guide.
Non-expiring signed URLs
Amazon S3 has a maximum signature expiry of 7 days. In order to upload signed URLs to Labelbox that don’t expire, we recommend proxying URLs through an endpoint on your server.
Follow the steps below to deploy a proxy endpoint that accepts a signed URL and returns a new signed S3 URL for the asset. We provide a one-click deploy through Heroku; however, you can also build this simple handler into your existing web service.
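The core idea of such a proxy can be sketched in a few lines of Python. The proxy hands out long-lived URLs that point at itself and carry a token derived from a secret; when a request arrives, it checks the token and responds with a redirect to a freshly presigned S3 URL, so the short S3 expiry never matters. This sketch uses a plain HMAC token rather than the JWT used by Labelbox's example, and the `key` and `token` parameter names, the secret, and the function names are hypothetical; the injected `presign` callable stands in for a real signer.

```python
import hashlib
import hmac
from typing import Callable, Optional, Tuple
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical secret shared by the URL generator and the proxy
# (the role played by the secret Heroku generates for the example app).
SECRET = b"replace-with-the-proxy-secret"


def _token(key: str) -> str:
    # Token = HMAC-SHA256 of the object key; no expiry is encoded in it.
    return hmac.new(SECRET, key.encode(), hashlib.sha256).hexdigest()


def tokenized_url(host: str, key: str) -> str:
    """Build a long-lived URL that points at the proxy, not at S3."""
    return f"{host.rstrip('/')}/?key={quote(key, safe='')}&token={_token(key)}"


def handle_request(url: str, presign: Callable[[str], str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location) for a request to the proxy."""
    params = parse_qs(urlsplit(url).query)
    key = params.get("key", [""])[0]
    token = params.get("token", [""])[0]
    if not key or not hmac.compare_digest(token.encode(), _token(key).encode()):
        return 403, None  # missing or forged token
    # Sign a fresh, short-lived S3 URL on every request and redirect to it.
    return 302, presign(key)
```

In a real deployment, `presign` could wrap Boto3, for example `lambda key: s3_client.generate_presigned_url('get_object', Params={'Bucket': 'YOUR_BUCKET_NAME', 'Key': key}, ExpiresIn=300)`, and `handle_request` would sit behind your web framework's routing. Anyone who obtains a tokenized URL can fetch the asset indefinitely, so treat these URLs, and the secret, as credentials.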
Check out the example proxy we made at https://github.com/Labelbox/signed-url-example. You can deploy it with one click here: https://heroku.com/deploy?template=https://github.com/Labelbox/signed-url-example
Get the IAM information needed to create presigned URLs:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- a bucket name
Make sure this IAM user can list and read objects in the bucket (the s3:ListBucket and s3:GetObject permissions).
For each asset in the S3 bucket, generate a signed URL that points to your new proxy endpoint and is signed with the app's JWT secret.
From Heroku, get the host URL of your new app by clicking "Open app". Then get the generated secret under Settings > Reveal Config Vars.
git clone https://github.com/Labelbox/generate-tokenized-urls
cd generate-tokenized-urls/
# confirm you have Node.js installed
node --version
npm install
node cli.js --bucket <your-aws-bucket-name> --host https://<your-new-heroku-url>.herokuapp.com/ --secret <heroku-generated-config-secret> --output labelbox-import.json
Upload labelbox-import.json to Labelbox.
Whitelist IP range for AWS bucket
Another way to protect your data from unauthorized access is to whitelist an IP range. For example, you can put your source data behind a local network or VPN. Then, whitelist the IP range of your VPN so you can access and review data from anywhere and better manage access to your network. Or you can whitelist the IP range of your outsourced labeling team's wifi network. This way external team members can label and review the data without VPN access.
To whitelist your IP range, follow these steps:
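Find your IP range. S3 sees the public IP address your network uses to reach the internet, not a private LAN address such as 192.168.x.x. Visit whatsmyip.org to see your network's public IP address. For example, if it is 54.240.143.68 and your network uses the whole surrounding block, your IP range would be 54.240.143.0 - 54.240.143.255 (256 addresses, written 54.240.143.0/24 in CIDR notation). If you're on a company VPN, contact an administrator to get its static IP range.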
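Add a bucket policy in AWS that allows s3:GetObject only from your IP range, replacing examplebucket and 54.240.143.0/24 in the example policy with your bucket name and range. If set up correctly, you should be able to load the assets in the Editor while connected to your VPN or wifi network. Once you disconnect from your VPN or wifi network, the assets should no longer load.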
{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "54.240.143.0/24"}
      }
    }
  ]
}
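The range arithmetic and the policy can also be generated in Python, which avoids hand-editing JSON. The sketch below uses the standard `ipaddress` module to turn one address into the /24 network that contains it, then builds the same bucket policy as the example with your bucket name and range filled in. The function names are hypothetical, and /24 is only the right prefix if your network actually uses that whole block.

```python
import ipaddress
import json
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def network_for(ip: str, prefix: int = 24) -> Network:
    """Return the network of the given prefix length that contains `ip`."""
    # strict=False zeroes the host bits, e.g. 54.240.143.68/24 -> 54.240.143.0/24.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def ip_allow_policy(bucket: str, network: Network) -> str:
    """Build a bucket policy allowing s3:GetObject only from `network`."""
    policy = {
        "Version": "2012-10-17",
        "Id": "S3PolicyId1",
        "Statement": [
            {
                "Sid": "IPAllow",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": str(network)}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(ip_allow_policy("examplebucket", network_for("54.240.143.68")))
```

You can paste the printed JSON into the bucket policy editor on the bucket's Permissions tab in the S3 console.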