Labelbox documentation

Make URLs private

Signed URLs contain authentication information in their query strings and are used to limit the permission and timeframe for each request. Importing your data via signed URLs is one method for protecting your data from unauthorized access. When you import your data this way, Labelbox has no access to your data files.

Amazon S3 has a maximum signature expiry of 7 days. In order to upload signed URLs to Labelbox that don’t expire, follow the instructions in Non-expiring signed URLs.

You can also whitelist the IP range of a Wi-Fi network. For example, if you can’t give VPN access to an outsourced labeling firm, you can whitelist their network instead.

Signed URLs

To create an import file containing signed URLs, follow these steps:

  1. Generate signed URLs for each asset in your dataset. A signed URL has a unique key and should look like this: http://example.com/filename?hash=DMF1ucDxtqgxwYQ==.

    1. Google Cloud

    2. Amazon AWS

    3. Microsoft Azure

  2. For added security, you can also specify a range of IP addresses that are allowed to access your data.

  3. Create a file containing the signed URLs.

    1. JSON (recommended): Create JSON file

    2. CSV

      Data_URLs,External_ID
      http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660804/isu8zqke6xoopnemuvvc.jpg,ID1
      http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660805/ldie4gmhaqfhw1df1wls.jpg,ID2
      http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660805/dvbb5kv3dudxhibqpuni.jpg,ID3
      http://res.cloudinary.com/ddpai9fpa/image/upload/v1516660806/inm7ipr8h9ecx1fzcspm.jpg,ID4
  4. Upload the import file to Labelbox. If you are importing a CSV file, you will need to specify which column contains the URL and which contains the optional external ID.
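
As a minimal sketch of step 3, the CSV import file shown above could be written with Python's standard csv module. The column names come from the example in this section; the function name and the placeholder URLs are hypothetical and only serve the illustration.

```python
import csv
import io


def build_import_csv(rows):
    """Return CSV text with one (signed URL, external ID) pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row matches the column names used in the CSV example above.
    writer.writerow(["Data_URLs", "External_ID"])
    for url, external_id in rows:
        writer.writerow([url, external_id])
    return buffer.getvalue()


# Placeholder signed URLs; replace them with the URLs generated in step 1.
example_rows = [
    ("http://example.com/image1.jpg?hash=PLACEHOLDER1", "ID1"),
    ("http://example.com/image2.jpg?hash=PLACEHOLDER2", "ID2"),
]

csv_text = build_import_csv(example_rows)
print(csv_text)
```

Using csv.writer rather than string concatenation means a URL that happens to contain a comma is quoted automatically.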

To create signed URLs with the Boto3 SDK, follow these steps:

  1. Before using Boto3, set up authentication credentials in the IAM console. You can create or use an existing user. Go to manage access keys and generate a new set of keys.

  2. If you have AWS CLI installed, run aws configure to configure your credentials file. Alternatively, you can create the credentials file yourself. By default, its location is ~/.aws/credentials:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY

  3. Run pip install boto3.

  4. Use this Python script to get all objects in a selected bucket and generate signed URLs for each object.

    import boto3

    s3 = boto3.resource('s3')
    s3_client = boto3.client('s3')

    # Your bucket name
    bucket = s3.Bucket('YOUR_BUCKET_NAME')

    # Generate a signed URL for each object in the bucket.
    # ExpiresIn is in seconds; 3600 (one hour) is the Boto3 default.
    for obj in bucket.objects.all():
        url = s3_client.generate_presigned_url(
            ClientMethod='get_object',
            Params={'Bucket': bucket.name, 'Key': obj.key},
            ExpiresIn=3600,
        )
        print(url)

Note

For more information and a detailed guide on Boto, see the Boto3 installation guide. For more information on Signed URLs in Amazon S3, see the AWS developer guide.
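
Because signed S3 URLs expire, it can help to check when a URL stops working before you import it. The sketch below reads the expiry from the query string using only the standard library. It assumes the URL carries either the Signature Version 4 parameters (X-Amz-Date and X-Amz-Expires) or the older Expires timestamp; the function name and the example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the UTC expiry time encoded in a presigned S3 URL, or None."""
    query = parse_qs(urlparse(url).query)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # Signature Version 4: signing time plus a lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        signed_at = signed_at.replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    if "Expires" in query:
        # Signature Version 2: absolute expiry as a Unix timestamp.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    return None


# Placeholder URL signed for the 7-day maximum (604800 seconds).
example_url = (
    "https://examplebucket.s3.amazonaws.com/cat.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T000000Z"
    "&X-Amz-Expires=604800"
    "&X-Amz-Signature=PLACEHOLDER"
)
expiry = signed_url_expiry(example_url)
print(expiry.isoformat())
```

A URL without any of these parameters returns None, which usually means it is not a presigned S3 URL.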

Non-expiring signed URLs

Amazon S3 has a maximum signature expiry of 7 days. In order to upload signed URLs to Labelbox that don’t expire, we recommend proxying URLs through an endpoint on your server.

Follow the steps below to deploy a proxy endpoint that accepts a signed URL and returns a new signed S3 URL to an asset. We provide a one-click deploy through Heroku; however, you could also build this simple handler into your existing web service.

Check out the example proxy we made: https://github.com/Labelbox/signed-url-example. You can deploy it with one click here: https://heroku.com/deploy?template=https://github.com/Labelbox/signed-url-example

  1. Get the IAM information needed to create pre-signed URLs: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and a bucket name.

  2. Make sure this IAM user can LIST and GET files in the bucket.

  3. For each asset in the S3 bucket, generate a signed URL with your JWT secret that points to your server endpoint.

  4. From Heroku, get the host URL of your new app by clicking "open app". Then, get the generated secret (settings > reveal config vars).

    git clone https://github.com/Labelbox/generate-tokenized-urls
    cd generate-tokenized-urls/
    
    # confirm you have node.js installed
    node --version
    
    npm install
    node cli.js \
      --bucket <your-aws-bucket-name> \
      --host https://<your-new-heroku-url>.herokuapp.com/ \
      --secret <heroku-generated-config-secret> \
      --output labelbox-import.json
  5. Upload labelbox-import.json to Labelbox.
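
To illustrate what step 3 involves, here is a minimal sketch of signing a token with a shared JWT secret (HS256) and attaching it to a URL that points at the proxy host. This is not the implementation of the generate-tokenized-urls CLI: the claim name "key", the query parameter "token", and all function names are assumptions made for illustration, and the real tool may use a different layout.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qs, urlencode, urlparse


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_hs256(payload: dict, secret: str) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":"), sort_keys=True).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Return the payload if the signature matches, else raise ValueError."""
    if token.count(".") != 2:
        raise ValueError("malformed token")
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _signature(signing_input, secret)):
        raise ValueError("invalid token signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def tokenized_url(host: str, key: str, secret: str) -> str:
    # Hypothetical layout: the S3 object key travels in a "key" claim,
    # and the token is passed in a "token" query parameter.
    token = sign_hs256({"key": key}, secret)
    return f"{host.rstrip('/')}/?{urlencode({'token': token})}"


# Placeholder host and secret; use the values from your Heroku app.
example_url = tokenized_url("https://example-app.herokuapp.com/", "images/cat.jpg", "PLACEHOLDER_SECRET")
example_token = parse_qs(urlparse(example_url).query)["token"][0]
print(example_url)
print(verify_hs256(example_token, "PLACEHOLDER_SECRET"))
```

The token itself carries no expiry claim in this sketch, which is what lets the resulting URL stay valid; the proxy would verify the token and then issue a fresh, short-lived S3 signed URL on each request.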

Whitelist IP range for AWS bucket

Another way to protect your data from unauthorized access is to whitelist an IP range. For example, you can put your source data behind a local network or VPN. Then, whitelist the IP range of your VPN so you can access and review data from anywhere and better manage access to your network. Or you can whitelist the IP range of your outsourced labeling team's wifi network. This way external team members can label and review the data without VPN access.

To whitelist your IP range, follow these steps:

  1. Find your IP range. Typically, a router is configured for a block of 256 IP addresses. Visit whatsmyip.org to see your computer’s IP address. For example, if it is 192.168.1.68, your IP range is 192.168.1.0 - 192.168.1.255 (192.168.1.0/24 in CIDR notation, the form used in the bucket policy). If you’re on a company VPN, contact an administrator to get a static IP range.

  2. Add an IP address bucket policy in AWS. If set up correctly, you should be able to load the assets in the Editor while connected to your VPN or wifi network. Once you disconnect from your VPN or wifi network, the assets should no longer load.

    {
      "Version": "2012-10-17",
      "Id": "S3PolicyId1",
      "Statement": [
        {
          "Sid": "IPAllow",
          "Effect": "Allow",
          "Principal": "*",
          "Action": [ "s3:GetObject" ],
          "Resource": "arn:aws:s3:::examplebucket/*",
          "Condition": {
            "IpAddress": {"aws:SourceIp": "54.240.143.0/24"}
          }
        }
      ]
    }
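
The two steps above can be sketched in Python with the standard ipaddress and json modules: derive the range that contains a given address, then build the policy document shown above as a dictionary. The function names are hypothetical, and the /24 prefix length is an assumption that matches the 256-address example in step 1.

```python
import ipaddress
import json


def ip_range_for(address: str, prefix_length: int = 24):
    """Return the network that contains `address` for the given prefix length."""
    # strict=False lets us pass a host address instead of the network address.
    return ipaddress.ip_network(f"{address}/{prefix_length}", strict=False)


def ip_allow_policy(bucket: str, cidr: str) -> dict:
    """Build the bucket policy from this section for one bucket and one CIDR range."""
    return {
        "Version": "2012-10-17",
        "Id": "S3PolicyId1",
        "Statement": [
            {
                "Sid": "IPAllow",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


# The address from step 1 and the bucket and range from the policy above.
network = ip_range_for("192.168.1.68")
print(network, network[0], network[-1])

policy = ip_allow_policy("examplebucket", "54.240.143.0/24")
print(json.dumps(policy, indent=2))
```

Generating the policy with json.dumps guarantees valid JSON, which avoids syntax slips such as a trailing comma when the file is edited by hand.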