Import prompt and response data

Import prompts and responses as pre-labels via Model-assisted labeling (MAL) or as complete labels via Ground truth import.

Open this Colab for an interactive tutorial on importing prompt and response data for the LLM data generation editor.

Supported annotations

Prompt

Classification: Free-form text

prompt_annotation_ndjson = {
  "name": "Follow the prompt and select answers",
  "answer": "This is an example of a prompt"
}
Python annotations not yet supported

Responses

Classification: Radio

response_radio_annotation_ndjson = {
  "name": "response_radio",
  "answer": {
      "name": "response_a"
    }
}
Python annotations not yet supported

Classification: Free-form text

response_text_annotation_ndjson = {
  "name": "Provide a reason for your choice",
  "answer": "This is an example of a response text"
}
Python annotations not yet supported

Classification: Checklist

response_checklist_annotation_ndjson = {
  "name": "response_checklist",
  "answer": [
    {
      "name": "response_a"
    },
    {
      "name": "response_c"
    }
  ]
}
Python annotations not yet supported
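The four NDJSON shapes above differ only in the form of `answer`: a string for free-form text, one object for a radio, and a list of objects for a checklist. As a minimal sketch, they could be produced by small helper functions. The function names below are hypothetical and not part of the Labelbox SDK, and they return plain dictionaries rather than SDK annotation objects.

```python
from typing import Dict, List


def text_annotation(name: str, text: str) -> Dict:
    """Free-form text payload (same shape for prompt and response text)."""
    return {"name": name, "answer": text}


def radio_annotation(name: str, option: str) -> Dict:
    """Radio payload: exactly one selected option."""
    return {"name": name, "answer": {"name": option}}


def checklist_annotation(name: str, options: List[str]) -> Dict:
    """Checklist payload: a list of selected options."""
    return {"name": name, "answer": [{"name": option} for option in options]}


prompt = text_annotation("Follow the prompt and select answers",
                         "This is an example of a prompt")
radio = radio_annotation("response_radio", "response_a")
checklist = checklist_annotation("response_checklist",
                                 ["response_a", "response_c"])
print(checklist)
```

Building the dictionaries through one helper per shape keeps the nesting of `answer` consistent when many annotations are generated in a loop.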

End-to-end example: Import prompt & responses

No SDK support

Creating a project or an ontology for LLM data generation is not yet supported through the SDK. Follow the steps below to create both in the UI.

Before you start

  1. Go to Annotate and select New project.
  2. Select LLM data generation and then select Humans generate prompts and responses.
  3. Name your project, select Create a new dataset, and name your dataset (data rows will be generated automatically in this step).

Step 1: Specify project ID and global keys

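import uuid
import labelbox as lb

# Enter your API key; the client is used in Step 4
api_key = ""
client = lb.Client(api_key=api_key)
# Enter the project ID
project_id = ""
# Select one of the global keys from the data rows generated
global_key = ""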
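Because the placeholders above start out as empty strings, it can help to fail early if one was left blank. The following is a minimal sketch; `require_setting` and the environment variable names are hypothetical and chosen only for illustration.

```python
import os


def require_setting(label, value):
    """Return the stripped value, or raise if it is empty."""
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError(f"{label} is empty; fill it in before importing")
    return cleaned


# LABELBOX_PROJECT_ID and LABELBOX_GLOBAL_KEY are hypothetical variable names;
# the fallback strings are placeholders so the sketch runs on its own.
example_project_id = os.environ.get("LABELBOX_PROJECT_ID") or "example-project-id"
example_global_key = os.environ.get("LABELBOX_GLOBAL_KEY") or "example-global-key"

print(require_setting("project_id", example_project_id))
print(require_setting("global_key", example_global_key))
```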

Step 2: Create/select an ontology in the UI

  1. In your project, navigate to Settings > Label editor.
  2. Select Edit.
  3. Select a new ontology and add the features used in this demo.
# For this demo, the following ontology was generated in the UI.
ontology_json = """
{
 "tools": [],
 "relationships": [],
 "classifications": [
  {
   "schemaNodeId": "clpvq9d0002yt07zy0khq42rp",
   "featureSchemaId": "clpvq9d0002ys07zyf2eo9p14",
   "type": "prompt",
   "name": "Follow the prompt and select answers",
   "archived": false,
   "required": true,
   "options": [],
   "instructions": "Follow the prompt and select answers",
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0002yz07zy0fjg28z7",
   "featureSchemaId": "clpvq9d0002yu07zy28ik5w3i",
   "type": "response-radio",
   "name": "response_radio",
   "instructions": "response_radio",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0002yw07zyci2q5adq",
     "featureSchemaId": "clpvq9d0002yv07zyevmz1yoj",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0002yy07zy8pe48zdj",
     "featureSchemaId": "clpvq9d0002yx07zy0jvmdxk8",
     "value": "response_b",
     "label": "response_b",
     "position": 1,
     "options": []
    }
   ]
  },
  {
   "schemaNodeId": "clpvq9d0002z107zygf8l62ys",
   "featureSchemaId": "clpvq9d0002z007zyg26115f9",
   "type": "response-text",
   "name": "provide_a_reason_for_your_choice",
   "instructions": "Provide a reason for your choice",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [],
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0102z907zy8b10hjcj",
   "featureSchemaId": "clpvq9d0002z207zy6xla7f82",
   "type": "response-checklist",
   "name": "response_checklist",
   "instructions": "response_checklist",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0102z407zy0adq0rfr",
     "featureSchemaId": "clpvq9d0002z307zy6dqb8xsw",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z607zych8b2z5d",
     "featureSchemaId": "clpvq9d0102z507zyfwfgacrn",
     "value": "response_c",
     "label": "response_c",
     "position": 1,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z807zy03y7gysp",
     "featureSchemaId": "clpvq9d0102z707zyh61y5o3u",
     "value": "response_d",
     "label": "response_d",
     "position": 2,
     "options": []
    }
   ]
  }
 ],
 "realTime": false
}

"""

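Since each annotation refers to an ontology feature and its options by name, a local check against the ontology JSON can catch typos before an upload. The sketch below is an illustration only: `check_annotation` is a hypothetical helper, the sample ontology is a trimmed stand-in for the one above, and comparing against the option `value` field is an assumption (in this demo `value` and `label` are identical). Whether the platform enforces `minCharacters` and `maxCharacters` on import is not stated here, so the length check is purely a local sanity check.

```python
import json

# Trimmed-down stand-in for ontology_json, keeping only the fields used here.
sample_ontology_json = """
{
  "classifications": [
    {"type": "prompt", "name": "Follow the prompt and select answers",
     "options": [], "minCharacters": 5, "maxCharacters": 100},
    {"type": "response-radio", "name": "response_radio",
     "options": [{"value": "response_a"}, {"value": "response_b"}]}
  ]
}
"""


def check_annotation(annotation, ontology):
    """Return a list of problems found; an empty list means none were found."""
    features = {c["name"]: c for c in ontology["classifications"]}
    feature = features.get(annotation["name"])
    if feature is None:
        return [f"unknown feature name: {annotation['name']!r}"]

    problems = []
    answer = annotation["answer"]
    if isinstance(answer, str):
        # Free-form text: compare length with the optional character limits.
        low = feature.get("minCharacters")
        high = feature.get("maxCharacters")
        if low is not None and len(answer) < low:
            problems.append(f"text shorter than {low} characters")
        if high is not None and len(answer) > high:
            problems.append(f"text longer than {high} characters")
    else:
        # Radio (single dict) or checklist (list of dicts): compare option names.
        allowed = {o["value"] for o in feature["options"]}
        chosen = [answer] if isinstance(answer, dict) else answer
        for item in chosen:
            if item["name"] not in allowed:
                problems.append(f"unknown option: {item['name']!r}")
    return problems


ontology = json.loads(sample_ontology_json)
ok = check_annotation(
    {"name": "response_radio", "answer": {"name": "response_a"}}, ontology)
bad = check_annotation(
    {"name": "response_radio", "answer": {"name": "response_z"}}, ontology)
print(ok, bad)
```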
Step 3: Create the annotations payload

label_ndjson = []
for annotation in [
    prompt_annotation_ndjson,
    response_radio_annotation_ndjson,
    response_text_annotation_ndjson,
    response_checklist_annotation_ndjson,
]:
    # Attach the target data row to each annotation
    annotation.update({"dataRow": {"globalKey": global_key}})
    label_ndjson.append(annotation)
Python annotations not yet supported
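The loop in Step 3 updates the annotation dictionaries in place, so reusing them for a second data row would overwrite the first `dataRow` reference. As a hedged alternative, the sketch below copies each annotation before tagging it. `attach_data_row` and the sample global key are hypothetical names used only for illustration.

```python
import copy
import json


def attach_data_row(annotations, global_key):
    """Return new annotation dicts, each tagged with the data row's global key."""
    payload = []
    for annotation in annotations:
        tagged = copy.deepcopy(annotation)  # leave the caller's dict unchanged
        tagged["dataRow"] = {"globalKey": global_key}
        payload.append(tagged)
    return payload


template = {"name": "response_radio", "answer": {"name": "response_a"}}
payload = attach_data_row([template], "hypothetical-global-key")

# One JSON object per line follows the NDJSON convention and is handy
# for inspecting the payload before upload.
print("\n".join(json.dumps(item) for item in payload))
```

With this approach the same templates can be passed again with a different global key to build annotations for several data rows.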

Step 4: Upload prompt and responses as pre-labels or complete labels

project = client.get_project(project_id=project_id)

Import as pre-labels via Model-assisted labeling

upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{uuid.uuid4()}",
    predictions=label_ndjson,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

Import as Ground truth

upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"label_import_job-{uuid.uuid4()}",
    labels=label_ndjson,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
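Printing the raw errors and statuses becomes hard to read once a job covers many annotations. The sketch below groups the entries by outcome. It assumes each status entry is a dictionary with a `status` key whose value is `"SUCCESS"` for accepted annotations, plus optional `uuid` and `errors` keys. That shape is an assumption made for illustration, so check it against the actual output of `upload_job.statuses` before relying on it. `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(statuses):
    """Count entries per status and collect the entries that did not succeed."""
    counts = Counter(entry.get("status", "UNKNOWN") for entry in statuses)
    failed = [entry for entry in statuses if entry.get("status") != "SUCCESS"]
    return counts, failed


# Hypothetical sample shaped like the assumed upload_job.statuses output.
sample_statuses = [
    {"uuid": "a", "status": "SUCCESS"},
    {"uuid": "b", "status": "FAILURE", "errors": [{"message": "example error"}]},
]

counts, failed = summarize(sample_statuses)
print(dict(counts))
for entry in failed:
    print(entry["uuid"], entry.get("errors"))
```

In a real run, `summarize(upload_job.statuses)` would replace the sample list, provided the entries match the assumed shape.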