This guide assumes you have a basic understanding of Python data structures and of working with Labelbox exports.
Before you start
The imports below are needed to use the code examples in this section. Replace API_KEY with a valid API key to connect to the Labelbox client.
Create or select example project
The steps below set up a project that you can use for this demo. Feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our quickstart guide.
Create Project
Select project
CSV format overview
To convert our Labelbox JSON data to a more CSV-friendly format, we must first define the structure we want our JSON to take. A common format that works with both the built-in Python csv writer and pandas is a list of dictionaries, where each dictionary is one row and its keys are the column names.
Labelbox JSON format
The Labelbox JSON format is organized around the individual data rows of your export. This structure can be extended as the format evolves, and it provides a single, central view of fields such as metadata and data row details. The labels themselves are located inside the projects key, under each project's ID, and can be deeply nested, which makes them difficult to parse. For complete samples of our project export format, visit our export overview page. To get from the Labelbox export JSON format to our CSV format, we must do the following:
- Establish our base data row columns (project_id, data_row_id, global_key, etc.)
- Create our columns for label fields (label detail and annotations we care about)
- Define our functions and strategy used to parse through our data
- Set up our main data row handler function
- Export our data
- Convert to our desired format
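To make the nesting concrete, here is an abbreviated, hypothetical data row in roughly the shape of a project export. The IDs, feature names ("Vehicle", "Weather", "Intensity"), and values are made up for illustration, and real exports contain many more keys, so treat every path below as an assumption to check against your own export and the export overview samples. Note how the data row details are one or two keys deep, while a nested answer sits many levels below the data row.

```python
# Abbreviated, hypothetical data row shaped like a Labelbox project export.
# Real exports contain many more keys; confirm every path against your own export.
example_data_row = {
    "data_row": {
        "id": "dr-001",
        "global_key": "image-001.jpg",
        "row_data": "https://example.com/image-001.jpg",
    },
    "projects": {
        "proj-123": {
            "name": "CSV demo project",
            "labels": [
                {
                    "id": "label-abc",
                    "label_details": {"created_by": "annotator@example.com"},
                    "performance_details": {"skipped": False},
                    "annotations": {
                        "objects": [
                            {
                                "name": "Vehicle",
                                "bounding_box": {"top": 10.0, "left": 20.0, "height": 50.0, "width": 80.0},
                                "classifications": [],
                            }
                        ],
                        "classifications": [
                            {
                                "name": "Weather",
                                "radio_answer": {
                                    "name": "Rainy",
                                    "classifications": [
                                        {"name": "Intensity", "radio_answer": {"name": "Heavy", "classifications": []}}
                                    ],
                                },
                            }
                        ],
                        "relationships": [],
                    },
                }
            ],
        }
    },
}

# Data row details are only one or two keys deep...
global_key = example_data_row["data_row"]["global_key"]

# ...but labels live under projects -> <project_id> -> labels,
# and nested answers can sit several levels below each label.
labels = example_data_row["projects"]["proj-123"]["labels"]
weather = labels[0]["annotations"]["classifications"][0]
intensity = weather["radio_answer"]["classifications"][0]["radio_answer"]["name"]

print(global_key, labels[0]["id"], intensity)  # image-001.jpg label-abc Heavy
```

Hard-coding a path like the one used for intensity only works while the annotation order and depth stay fixed, which is the problem the rest of this guide works around.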
Step 1: Establish our base columns
We first establish the base columns that represent individual data row details. Typically, the information for these columns can be found within one or two levels of each data row in a Labelbox export. Modify the columns below if you want to include more, but note that you must also update the code later in this guide to pick up any additional columns.
Step 2: Create our columns for label fields
In this step, we define the base label detail columns we want to include in our CSV. In this case, we will use the following:
Step 3: Define our functions and strategy used to parse through our data
Now that our columns are defined, we must develop a strategy for navigating our export data. Review this sample export to follow along. When populating our columns, it is best to first check whether a key exists in your data row before reading it. This is especially important for optional fields. In this demo, we populate the value None for anything absent, which results in a blank cell in our CSV.
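A minimal sketch of Steps 1 through 3 together, assuming hypothetical column names: the column lists become the keys of each row dictionary, every column starts as None, and a small helper (get_nested, our own name, not part of the Labelbox SDK) checks each key before reading it, so a missing field simply stays None.

```python
from typing import Any

# Hypothetical column lists for Step 1 (data row details) and Step 2 (label details).
data_row_base_columns = ["Data Row ID", "Global Key", "External ID", "Project ID"]
label_base_columns = ["Label ID", "Created By", "Skipped"]


def get_nested(source: dict, *keys: str) -> Any:
    """Follow keys one level at a time; return None as soon as one is missing."""
    current: Any = source
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Every column starts as None, so anything we never find stays a blank cell.
row = dict.fromkeys(data_row_base_columns + label_base_columns)

sample = {"data_row": {"id": "dr-001", "global_key": "image-001.jpg"}}
row["Data Row ID"] = get_nested(sample, "data_row", "id")
row["Global Key"] = get_nested(sample, "data_row", "global_key")
row["External ID"] = get_nested(sample, "data_row", "external_id")  # absent, stays None

print(row)
```

Starting from dict.fromkeys guarantees that every row has exactly the same keys, which both the csv writer and pandas expect.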
Data row detail base columns
The data row details can be accessed within one or two keys of depth. Below is the function we will use to populate the columns we defined. Its parameters are the data row, the row dictionary that will become one entry in our list, and our base columns list.
Label detail base columns
The label details are similar to the data row details but live at the label level of our export. Later in the guide, we demonstrate how to reach this level of each exported data row. The function below obtains the details we defined above. Its parameters are the label, the row dictionary we will modify, and the label detail column list we created.
Label annotation columns
The label annotations are the final columns we need to obtain, and they are harder to extract than our detail columns. If we try to obtain these fields with conditional statements and hard-coded paths, we will run into problems, because each label can have its annotations in a different order, at different depths, or not at all. This quickly becomes messy, especially when we want our methods to work across multiple ontologies. A cleaner and more robust way to obtain these annotations from our export data is a recursive function.
Recursion
A recursive function is a routine that calls itself, directly or indirectly. It solves a problem by solving smaller instances of the same problem, so the technique suits problems that can be broken down into simpler, similar subproblems. Our subproblem, in this case, is obtaining each individual annotation. A recursive function has two components:
- Base case: The termination condition that prevents the function from calling itself indefinitely.
- Recursive case: The function calls itself with modified arguments, and each call should move closer to the base case.
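In our case, the base case is reaching the annotation we are looking for, or running out of classifications to search and returning None. Our recursive case is finding more nested classifications to parse.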
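To see the two components in isolation, here is a small recursive function (nesting_depth, a hypothetical helper rather than part of this guide's code) that measures how deeply classifications are nested. It assumes nested classifications sit under radio_answer or checklist_answers, as in the export samples.

```python
def _children(classification: dict) -> list:
    """Classifications nested under a radio answer or any checklist answer."""
    nested = list(classification.get("radio_answer", {}).get("classifications", []))
    for answer in classification.get("checklist_answers", []):
        nested.extend(answer.get("classifications", []))
    return nested


def nesting_depth(classifications: list) -> int:
    # Base case: nothing left to explore at this level.
    if not classifications:
        return 0
    # Recursive case: each classification counts as one level, plus the
    # deepest chain found beneath any of its answers.
    return 1 + max(nesting_depth(_children(c)) for c in classifications)


weather = [
    {
        "name": "Weather",
        "radio_answer": {
            "name": "Rainy",
            "classifications": [
                {"name": "Intensity", "radio_answer": {"name": "Heavy", "classifications": []}}
            ],
        },
    }
]
print(nesting_depth(weather))  # 2
```

Each call receives a strictly smaller, deeper slice of the tree, so the recursion always reaches the empty-list base case.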
In the code block below, we highlight a few important details about our function. In essence, the function navigates our JSON one classification key at a time until it either finds our annotation or, once everything has been searched, returns None, which produces a blank cell in our CSV table.
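A minimal sketch of such a recursive search, under a few assumptions: the function names (find_answer and its helpers) are hypothetical and are not the guide's get_feature_answers, and the answer keys radio_answer, checklist_answers, and text_answer are taken from the export samples, so verify them against your own data. It matches classifications by name for readability; matching on the feature schema ID would be more robust if names repeat across your ontology.

```python
from typing import Optional


def answer_value(classification: dict) -> Optional[str]:
    """Flatten one classification's answer into a single cell value."""
    if "radio_answer" in classification:
        return classification["radio_answer"].get("name")
    if "checklist_answers" in classification:
        return ", ".join(a.get("name", "") for a in classification["checklist_answers"])
    if "text_answer" in classification:
        return classification["text_answer"].get("content")
    return None


def nested_classifications(classification: dict) -> list:
    """Collect the classifications nested under this classification's answers."""
    nested = list(classification.get("radio_answer", {}).get("classifications", []))
    for answer in classification.get("checklist_answers", []):
        nested.extend(answer.get("classifications", []))
    return nested


def find_answer(classifications: list, target_name: str) -> Optional[str]:
    """Depth-first search for the classification named target_name."""
    for classification in classifications:
        # Base case 1: this is the classification we want.
        if classification.get("name") == target_name:
            return answer_value(classification)
        # Recursive case: search one level deeper under this classification.
        found = find_answer(nested_classifications(classification), target_name)
        if found is not None:
            return found
    # Base case 2: this level is exhausted, so the caller gets None (a blank cell).
    return None


classifications = [
    {
        "name": "Weather",
        "radio_answer": {
            "name": "Rainy",
            "classifications": [
                {"name": "Intensity", "radio_answer": {"name": "Heavy", "classifications": []}}
            ],
        },
    },
    {"name": "Tags", "checklist_answers": [{"name": "Outdoor"}, {"name": "Daytime"}]},
    {"name": "Notes", "text_answer": {"content": "slightly blurry"}},
]

print(find_answer(classifications, "Intensity"))  # Heavy
print(find_answer(classifications, "Tags"))       # Outdoor, Daytime
print(find_answer(classifications, "Missing"))    # None
```

Because the search never assumes a fixed position or depth, the same call works no matter where the classification appears in a given label.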
Tools
Tools are not nested themselves, but they can contain nested classifications, which we find with our get_feature_answers function below. Because tools sit at the base level of a label and each tool type stores its value under a differently named key, this tutorial searches only for bounding boxes. If you want to include other tools, consult the export guide for your data type and find the appropriate key to add.
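A sketch of the bounding-box lookup this section describes. To stay self-contained, it includes a compact, radio-only recursive search (find_radio_answer) instead of the guide's get_feature_answers; that helper, bounding_box_cells, and the feature names "Vehicle", "Color", and "Road" are all hypothetical. It assumes, as in the export samples, that tools sit in the label's annotations under objects and that a bounding box stores its value under the bounding_box key, while the second mock object stands in for another tool type whose value lives under a different key.

```python
import json
from typing import Optional


def find_radio_answer(classifications: list, target_name: str) -> Optional[str]:
    """Compact recursive search over radio answers only."""
    for classification in classifications:
        answer = classification.get("radio_answer", {})
        if classification.get("name") == target_name:
            return answer.get("name")
        found = find_radio_answer(answer.get("classifications", []), target_name)
        if found is not None:
            return found
    return None


def bounding_box_cells(label: dict, tool_name: str, nested_name: str) -> list:
    """Return one cell dict per bounding box named tool_name on this label."""
    cells = []
    for obj in label.get("annotations", {}).get("objects", []):
        # Skip other tools: only bounding boxes store their value under "bounding_box".
        if obj.get("name") != tool_name or "bounding_box" not in obj:
            continue
        cells.append({
            tool_name: json.dumps(obj["bounding_box"]),  # serialize the box for one CSV cell
            nested_name: find_radio_answer(obj.get("classifications", []), nested_name),
        })
    return cells


label = {
    "annotations": {
        "objects": [
            {
                "name": "Vehicle",
                "bounding_box": {"top": 10.0, "left": 20.0, "height": 50.0, "width": 80.0},
                "classifications": [
                    {"name": "Color", "radio_answer": {"name": "Red", "classifications": []}}
                ],
            },
            # Another tool type; its value sits under a different key.
            {"name": "Road", "line": [{"x": 0.0, "y": 5.0}, {"x": 9.0, "y": 5.0}]},
        ]
    }
}

print(bounding_box_cells(label, "Vehicle", "Color"))
```

Serializing the box with json.dumps keeps all four coordinates in one cell; splitting them into separate top, left, height, and width columns is an equally valid choice.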
Step 4: Setting up our main data row handler function
Before exporting, we need to set up our main data row handler. This function is passed directly to our export and ties all of the previous pieces together. We also define the global list of dictionaries that will be used to create our CSVs. The function's output parameter represents a single exported data row.
Step 5: Export our data
Now that we have defined our functions and strategy, we are ready to export. Below, we export directly from our project and feed in the main function we created above. Once the export finishes, you should see GLOBAL_CSV_LIST printed out below with all of your "rows" filled out.
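The live export needs a Labelbox client and project, so the sketch below simulates Steps 4 and 5 offline. A hypothetical main handler takes an already-decoded data row dict (in a real export you would first unwrap it from the SDK's stream output object, which depends on your SDK version), appends one row dictionary per label to GLOBAL_CSV_LIST, and the list is then written with Python's csv.DictWriter. The column names, feature names, and mock rows are all made up, and the guide's own handler may differ, for example in how it treats several boxes on one label.

```python
import csv
import io
import json
from typing import Optional

# Hypothetical column lists; the guide's own lists may differ.
DATA_ROW_COLUMNS = ["Data Row ID", "Global Key", "Project ID"]
LABEL_COLUMNS = ["Label ID", "Created By", "Skipped"]
ANNOTATION_COLUMNS = ["Vehicle", "Weather"]  # one bounding box tool, one radio classification

GLOBAL_CSV_LIST = []  # one dict per label; every dict has the same keys


def find_radio_answer(classifications: list, target_name: str) -> Optional[str]:
    """Compact recursive search over radio answers only."""
    for classification in classifications:
        answer = classification.get("radio_answer", {})
        if classification.get("name") == target_name:
            return answer.get("name")
        found = find_radio_answer(answer.get("classifications", []), target_name)
        if found is not None:
            return found
    return None


def main(data_row: dict) -> None:
    """Append one CSV row per label found on an already-decoded data row dict."""
    details = data_row.get("data_row", {})
    for project_id, project in data_row.get("projects", {}).items():
        for label in project.get("labels", []):
            row = dict.fromkeys(DATA_ROW_COLUMNS + LABEL_COLUMNS + ANNOTATION_COLUMNS)
            row["Data Row ID"] = details.get("id")
            row["Global Key"] = details.get("global_key")
            row["Project ID"] = project_id
            row["Label ID"] = label.get("id")
            row["Created By"] = label.get("label_details", {}).get("created_by")
            row["Skipped"] = label.get("performance_details", {}).get("skipped")
            annotations = label.get("annotations", {})
            # For simplicity, keep only the first "Vehicle" box on the label.
            boxes = [
                obj["bounding_box"]
                for obj in annotations.get("objects", [])
                if obj.get("name") == "Vehicle" and "bounding_box" in obj
            ]
            row["Vehicle"] = json.dumps(boxes[0]) if boxes else None
            row["Weather"] = find_radio_answer(annotations.get("classifications", []), "Weather")
            GLOBAL_CSV_LIST.append(row)


# Stand-ins for two exported data rows; the second label has no annotations.
box = {"top": 10.0, "left": 20.0, "height": 50.0, "width": 80.0}
mock_export = [
    {
        "data_row": {"id": "dr-001", "global_key": "image-001.jpg"},
        "projects": {"proj-123": {"labels": [{
            "id": "label-abc",
            "label_details": {"created_by": "annotator@example.com"},
            "performance_details": {"skipped": False},
            "annotations": {
                "objects": [{"name": "Vehicle", "bounding_box": box}],
                "classifications": [{"name": "Weather", "radio_answer": {"name": "Rainy", "classifications": []}}],
            },
        }]}},
    },
    {
        "data_row": {"id": "dr-002", "global_key": "image-002.jpg"},
        "projects": {"proj-123": {"labels": [{
            "id": "label-def",
            "label_details": {"created_by": "reviewer@example.com"},
            "performance_details": {"skipped": True},
            "annotations": {"objects": [], "classifications": []},
        }]}},
    },
]

for exported_row in mock_export:
    main(exported_row)

# csv.DictWriter writes None as an empty string, i.e. a blank cell.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(GLOBAL_CSV_LIST[0].keys()))
writer.writeheader()
writer.writerows(GLOBAL_CSV_LIST)
csv_text = buffer.getvalue()
print(csv_text)
```

Because GLOBAL_CSV_LIST is a list of dictionaries with identical keys, the same list can also be passed to pandas.DataFrame if you prefer to build the CSV with pandas.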