
# Batch

> Developer guide for creating and modifying batches via the Python SDK.

<CardGroup cols={2}>
  <Card title="Open in Colab" icon="infinity" iconType="solid" horizontal href="https://colab.research.google.com/github/Labelbox/labelbox-notebooks/blob/main/basics/batches.ipynb" />

  <Card title="GitHub" icon="github" iconType="solid" horizontal href="https://github.com/Labelbox/labelbox-notebooks/blob/main/basics/batches.ipynb" />
</CardGroup>

## Client

<CodeGroup>
  ```python Python theme={null}
  import labelbox as lb
  client = lb.Client(api_key="<YOUR_API_KEY>")
  ```
</CodeGroup>

## Fundamentals

### Create a batch

You create a batch by calling the `create_batch()` method of the `Project` class.

When creating a batch to send to a project, you must supply either `global_keys` or `data_rows` as an argument. The `data_rows` argument accepts either a list of data row IDs or a list of `DataRow` objects.

Optionally, you can supply a `priority` field to control the labeling priority of the batch. The `priority` field accepts 32-bit integer values, which means you can set the priority to any integer between `-2,147,483,648` and `2,147,483,647`. (For practical purposes, we recommend against using large priority values.)

The `priority` determines the order in which the batch's data rows appear in the labeling queue relative to other batches. If no value is provided, the batch is assigned the lowest priority.

Note: You can use the SDK to set the priority of individual data rows (see the Modify data row priority section in the [Project overview](/reference/project)).

```python theme={null}
project.create_batch(
  name="<unique_batch_name>",
  global_keys=["key1", "key2", "key3"],
  priority=5,
)

# if the project uses consensus, you can optionally supply a dictionary with consensus settings
# if provided, the batch will use consensus with the specified coverage and votes
project.create_batch(
  name="<unique_batch_name>",
  data_rows=["<data_row_id>", "<data_row_id>"],
  priority=1,
  consensus_settings={"number_of_labels": 3, "coverage_percentage": 0.1}
)
```
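The argument rules above can be checked before calling the SDK. The following is a minimal sketch, not part of the Labelbox SDK: `build_batch_kwargs` is a hypothetical helper that requires exactly one of `global_keys` or `data_rows` and keeps `priority` within the signed 32-bit integer range. Rejecting a call that passes both arguments is an assumption made here for clarity.

```python
from typing import Any, Dict, Optional, Sequence

# Bounds of a signed 32-bit integer, the range accepted by `priority`.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def build_batch_kwargs(
    name: str,
    global_keys: Optional[Sequence[str]] = None,
    data_rows: Optional[Sequence[Any]] = None,
    priority: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments for project.create_batch()."""
    # Assumption: passing both arguments is treated as an error, like passing neither.
    if (global_keys is None) == (data_rows is None):
        raise ValueError("Supply exactly one of global_keys or data_rows.")

    kwargs: Dict[str, Any] = {"name": name}
    if global_keys is not None:
        kwargs["global_keys"] = list(global_keys)
    else:
        kwargs["data_rows"] = list(data_rows)

    if priority is not None:
        if not INT32_MIN <= priority <= INT32_MAX:
            raise ValueError("priority must fit in a signed 32-bit integer.")
        kwargs["priority"] = priority
    return kwargs


kwargs = build_batch_kwargs("my-batch", global_keys=["key1", "key2"], priority=5)
print(kwargs)
# With a real project object: project.create_batch(**kwargs)
```

Validating locally this way surfaces argument mistakes before any request is sent.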

#### Create multiple batches

The `project.create_batches()` method accepts up to 1 million data rows. If necessary, the data rows are split across multiple batches of up to 100k data rows each, which is the maximum batch size.

This method takes either a list of data row IDs or `DataRow` objects in the `data_rows` argument, or a list of global keys in the `global_keys` argument. The two arguments cannot be combined in the same call. Batches are created with the specified `name_prefix` argument and a unique suffix to ensure unique batch names. The suffix is a 4-digit number starting at `0000`.

For example, if the name prefix is `demo-create-batches-` and three batches are created, the names will be `demo-create-batches-0000`, `demo-create-batches-0001`, and `demo-create-batches-0002`. This method will throw an error if a batch with the same name already exists.

```python theme={null}
task = project.create_batches(
  name_prefix="demo-create-batches-",
  global_keys=global_keys,
  priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())
```
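To anticipate name collisions before calling `create_batches()`, the naming and chunking rules described above can be sketched in plain Python. `expected_batch_names` is a hypothetical helper, not an SDK function, and it assumes the number of batches is simply the data row count divided by the 100k maximum, rounded up.

```python
from typing import List

MAX_BATCH_SIZE = 100_000  # maximum batch size stated above


def expected_batch_names(name_prefix: str, num_data_rows: int) -> List[str]:
    """Hypothetical helper: predict the batch names for a given data row count."""
    # Integer ceiling division: how many batches are needed to hold all rows.
    num_batches = (num_data_rows + MAX_BATCH_SIZE - 1) // MAX_BATCH_SIZE
    # The suffix is a zero-padded 4-digit counter starting at 0000.
    return [f"{name_prefix}{index:04d}" for index in range(num_batches)]


names = expected_batch_names("demo-create-batches-", 250_000)
print(names)
```

Comparing these names against the names of existing batches in the project shows whether a given prefix would conflict.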

#### Create batches from a dataset

If you wish to create batches in a project using all the data rows of a dataset, instead of gathering global keys or IDs and iterating over subsets of data rows, you can use the `project.create_batches_from_dataset()` method.

This method takes a dataset ID and creates a batch (or several batches if there are more than 100k data rows) containing all data rows that are not already in the project. The `name_prefix` argument and the naming of batches follow the same rules described in the section immediately above.

```python theme={null}
dataset = client.get_dataset("<dataset_id>")

task = project.create_batches_from_dataset(
    name_prefix="demo-dataset-",
    dataset_id=dataset.uid,
    priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())
```

### Get a batch

Batches are accessible via an object of the `Project` class.

```python theme={null}
# get a project
project = client.get_project("<project_id>")

# get the batches (returns a paginated collection of Batch objects)
batches = project.batches()

# get one batch (this consumes the first item of the collection)
batch = next(batches)

# inspect the remaining batches
for batch in batches:
  print(batch)

# for ease of use, you can convert a fresh paginated collection to a list
batch_list = list(project.batches())
```
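Because the collection supports `next()`, it is reasonable to assume it behaves like a one-pass iterator: items that have been read are not returned again. The sketch below illustrates that behavior with a plain generator; `fake_batches` is a hypothetical stand-in for `project.batches()` and yields names rather than `Batch` objects.

```python
from typing import Iterator


def fake_batches() -> Iterator[str]:
    """Hypothetical stand-in for project.batches()."""
    yield from ["batch-a", "batch-b", "batch-c"]


batches = fake_batches()
first = next(batches)       # reads "batch-a"
remaining = list(batches)   # reads everything that is left
leftover = list(batches)    # the iterator is now exhausted

print(first, remaining, leftover)

# Requesting a fresh collection gives access to every item again.
all_batches = list(fake_batches())
print(all_batches)
```

If you need to traverse the batches more than once, convert a fresh collection to a list and reuse the list.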

You can also fetch a single batch directly from the client, without a project object.

```python theme={null}
project_id = "<project_id>"
batch_id = "<batch_id>"

batch = client.get_batch(project_id=project_id, batch_id=batch_id)
```

## Methods

### Export the data rows

You can export the details of the data rows of one or multiple batches. By filtering the result, you can obtain a list of global keys or data row IDs.

```python theme={null}
# Define the parameters for the export
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "batch_ids": [batch.uid]  # Include batch ID(s)
}

filters = {}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream the export using a callback function
# (`lb` is the alias from `import labelbox as lb`)
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

# (Optional) Collect all the global keys
global_keys = [data_row["data_row"]["global_key"] for data_row in export_json]

# (Optional) Collect all the data row IDs
data_row_ids = [data_row["data_row"]["id"] for data_row in export_json]
```
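The filtering step can be tried offline on rows shaped like the exported JSON. The sketch below uses hand-written sample rows, which are hypothetical and not real export output, and a hypothetical helper `collect_identifiers`. Skipping rows whose global key is missing or `None` is an assumption about how you may want to handle data rows without a global key.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample rows that mimic the "data_row" section of exported JSON.
sample_rows: List[Dict[str, Any]] = [
    {"data_row": {"id": "row-id-1", "global_key": "key1"}},
    {"data_row": {"id": "row-id-2", "global_key": None}},
    {"data_row": {"id": "row-id-3", "global_key": "key3"}},
]


def collect_identifiers(rows: List[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (data_row_ids, global_keys) from exported rows."""
    data_row_ids = [row["data_row"]["id"] for row in rows]
    # Keep only rows that actually carry a global key.
    global_keys = [
        row["data_row"]["global_key"]
        for row in rows
        if row["data_row"].get("global_key")
    ]
    return data_row_ids, global_keys


ids, keys = collect_identifiers(sample_rows)
print(ids)
print(keys)
```

Either list can then be passed to `project.create_batch()` through the `data_rows` or `global_keys` argument.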

### Remove queued data rows

This method removes queued data rows from the batch and, consequently, from the project's labeling queue.

```python theme={null}
batch.remove_queued_data_rows()
```

### Delete the labels

This method deletes the labels made on data rows in the batch and re-queues the data rows for labeling.

```python theme={null}
batch.delete_labels()

# alternatively, you can re-queue the data with labels as templates
batch.delete_labels(set_labels_as_template=True)
```

### Delete a batch

A batch cannot be deleted while labels exist on any of its data rows. First delete the labels made on data rows in the batch (as shown above), then delete the batch itself.

```python theme={null}
batch.delete()
```
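The two-step order described above can be wrapped in a small function. This is a minimal sketch rather than an SDK feature: `delete_batch_with_labels` is a hypothetical helper, and `FakeBatch` is a stand-in that only records calls so the example runs without a Labelbox connection. It assumes label deletion has finished by the time `delete()` is called, which may not hold for every batch.

```python
from typing import List


class FakeBatch:
    """Hypothetical stand-in for a Batch object; records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def delete_labels(self) -> None:
        self.calls.append("delete_labels")

    def delete(self) -> None:
        self.calls.append("delete")


def delete_batch_with_labels(batch) -> None:
    """Hypothetical helper: delete the labels first, then the batch itself."""
    batch.delete_labels()
    batch.delete()


fake = FakeBatch()
delete_batch_with_labels(fake)
print(fake.calls)
```

With a real `Batch` object, pass it to the same function in place of `FakeBatch`.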

## Attributes

### Get the basics

```python theme={null}
# name (str)
batch.name

# created at (datetime)
batch.created_at

# updated at (datetime)
batch.updated_at

# size, the number of data rows in the batch (int)
batch.size

# project (relationship to Project object)
project = batch.project()
```
