Datasets store structured examples for prompt evaluation, regression coverage, and Improve evidence. A dataset row represents one case. Columns provide prompt variables, expected outputs, labels, evaluator inputs, or metadata.

Ways to create rows

Adaline tracks how dataset rows are created. Rows can come from:
Source | Use it when
Manual row | A reviewer knows the exact case to test.
Copy from playground | A prompt author finds a useful input or failure while iterating.
Copy log span | A production trace or span should become a reusable example.
Import CSV | A team already has examples in a spreadsheet or exported test set.
Generate synthetic | You want to broaden coverage from known variables or failure patterns.
Use the source as review context. A production-derived row usually deserves more attention than a synthetic exploratory row.

Create a dataset

1. Open Datasets
   Open a project and select Datasets from the project navigation.

2. Create or import
   Create a blank dataset for manual work, or import a CSV when you already have examples.

3. Define columns
   Add columns for prompt variables, expected outputs, evaluator inputs, and metadata.

4. Add rows
   Add rows manually, copy from playground, copy from traces, import CSV data, or generate synthetic cases.

5. Attach to evaluators
   Link the dataset to evaluators in the prompt evaluation workflow.

Import CSV data

CSV import is useful for existing regression cases, QA test sets, prompt migration work, and customer-provided examples. Before importing:
  • Use a header row with stable column names.
  • Match prompt variable names when possible.
  • Keep expected outputs in separate columns.
  • Avoid mixing unrelated workflows in one CSV.
  • Remove secrets, raw customer identifiers, and data that your policy does not allow in Adaline.
  • Normalize booleans, enums, and labels before import.
After import, run a small evaluation first. This catches column mapping issues before you rely on the imported dataset.
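Parts of the checklist above can be automated before upload. The following is a minimal sketch using only Python's standard library. The function name `check_csv`, its arguments, and the accepted boolean spellings are assumptions made for illustration and are not part of Adaline's API.

```python
import csv
import io

# Hypothetical mapping of common boolean spellings to one normalized form.
BOOLEAN_SPELLINGS = {"true": "true", "yes": "true", "1": "true",
                     "false": "false", "no": "false", "0": "false"}


def check_csv(text, prompt_variables, boolean_columns=()):
    """Return (normalized_rows, problems) for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []

    # Header names should be non-empty and unique.
    if any(not name.strip() for name in header):
        problems.append("empty column name in header")
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")

    # Report prompt variables that have no matching column.
    for variable in prompt_variables:
        if variable not in header:
            problems.append(f"no column for prompt variable '{variable}'")

    rows = []
    for row in reader:
        for column in boolean_columns:
            raw = (row.get(column) or "").strip().lower()
            if raw in BOOLEAN_SPELLINGS:
                row[column] = BOOLEAN_SPELLINGS[raw]
            else:
                problems.append(
                    f"line {reader.line_num}: unrecognized boolean '{raw}' in '{column}'"
                )
        rows.append(row)
    return rows, problems
```

A non-empty `problems` list is a signal to fix the file before importing it. The small evaluation after import is still worthwhile, because a local check like this cannot see how columns are mapped inside Adaline.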

Create rows from production traces

Use trace-derived rows when production exposes a real issue. A trace row is especially valuable when it includes:
  • The original user request.
  • The prompt or span that produced the output.
  • Relevant tool or retrieval context.
  • The assistant response.
  • Metadata such as route, release, tenant segment, or environment.
  • A label that explains why the example matters.
Trace-derived rows are the backbone of regression coverage. They ensure that failures already seen in production are retested before the next deployment.
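As an illustration of the fields listed above, here is a minimal sketch that turns a span into a row. The span dictionary shape, its keys (`input`, `output`, `prompt_name`, `context`, `attributes`), and the `TraceRow` class are all hypothetical. Adaline's actual trace schema is not described in this page, so treat this as a picture of the idea rather than an integration.

```python
from dataclasses import dataclass, field


@dataclass
class TraceRow:
    """One dataset row derived from a production span (hypothetical shape)."""
    user_request: str
    prompt_name: str
    context: str
    assistant_response: str
    label: str
    metadata: dict = field(default_factory=dict)
    source: str = "log_span"


# Metadata keys taken from the list above; anything else is dropped.
METADATA_KEYS = ("route", "release", "tenant_segment", "environment")


def row_from_span(span, label):
    """Build a TraceRow from a span dict. The span keys are assumptions."""
    if not label.strip():
        raise ValueError("a trace-derived row needs a label explaining why it matters")
    attributes = span.get("attributes", {})
    return TraceRow(
        user_request=span["input"],
        prompt_name=span.get("prompt_name", ""),
        context=span.get("context", ""),
        assistant_response=span["output"],
        label=label,
        # Copy only the allow-listed keys so stray identifiers are not carried over.
        metadata={key: attributes[key] for key in METADATA_KEYS if key in attributes},
    )
```

Requiring a label and copying only an allow-list of metadata keys are design choices of this sketch. They keep each saved row explainable and avoid carrying raw identifiers into the dataset.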

Generate synthetic rows

Synthetic rows can broaden coverage, but they are not a substitute for production evidence. Use synthetic rows to:
  • Cover edge cases that are rare in production.
  • Expand a known failure pattern.
  • Test combinations of variables.
  • Stress a schema or format requirement.
Review generated rows before treating them as release blockers. Archive rows that are unrealistic, duplicate, or impossible for your product.
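Testing combinations of variables can be sketched with a Cartesian product followed by a plausibility filter. This is a generic illustration in plain Python, not how Adaline generates synthetic rows. The function name, the `is_possible` callback, and the `source` column are assumptions.

```python
from itertools import product


def generate_combinations(variables, is_possible=lambda row: True):
    """Yield one candidate row per combination of variable values.

    `variables` maps a column name to its candidate values. `is_possible`
    is a reviewer-supplied filter for combinations the product cannot produce.
    """
    names = list(variables)
    for values in product(*(variables[name] for name in names)):
        row = dict(zip(names, values))
        if is_possible(row):
            # Mark the origin so reviewers can weigh the row accordingly.
            yield {**row, "source": "synthetic"}
```

For example, with two plans, two languages, and an attachment flag, a filter that rejects attachments on the free plan leaves six of the eight combinations. Rows produced this way still need human review before they gate a release.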

Dataset sizing

There is no single correct dataset size. Use the smallest dataset that gives useful confidence.
Dataset type | Practical size guidance
Golden set | Small, high-confidence, reviewed by the team.
Regression set | Grows as real failures are discovered. Keep it curated.
Exploratory set | Larger and noisier, useful for discovery.
Synthetic set | Review and prune before promoting rows to a gate.
Large datasets can hide failures inside aggregate scores. Keep critical release gates readable.
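A small worked example shows how an aggregate score can mask a concentrated failure. The labels and results below are made up for illustration, and `pass_rates` is a hypothetical helper, not an Adaline function.

```python
from collections import defaultdict


def pass_rates(results):
    """Return (overall_rate, rate_by_label) for a non-empty list of (label, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for label, passed in results:
        totals[label] += 1
        passes[label] += int(passed)
    overall = sum(passes.values()) / sum(totals.values())
    by_label = {label: passes[label] / totals[label] for label in totals}
    return overall, by_label


# Made-up results: all 95 general cases pass, but 4 of 5 refund cases fail.
results = [("general", True)] * 95 + [("refund", True)] + [("refund", False)] * 4
overall, by_label = pass_rates(results)
print(f"overall: {overall:.0%}")             # overall: 96%
print(f"refund:  {by_label['refund']:.0%}")  # refund:  20%
```

The overall score looks healthy while one category is almost entirely broken. Splitting results by label, or keeping the gating dataset small, makes that kind of failure visible.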

Next steps

  • Manage columns and rows: Design static, API, and prompt-backed columns.
  • Regression coverage: Turn incidents and behaviors into durable tests.
  • Create evaluators: Attach criteria to dataset rows.
  • Traces: Find production examples worth saving.