Create and import datasets

Datasets store structured examples for prompt evaluation, regression coverage, and Improve evidence. A dataset row represents one case. Columns provide prompt variables, expected outputs, labels, evaluator inputs, or metadata.

Ways to create rows

Adaline tracks how dataset rows are created. Rows can come from:

Source	Use it when
Manual row	A reviewer knows the exact case to test.
Copy from playground	A prompt author finds a useful input or failure while iterating.
Copy log span	A production trace or span should become a reusable example.
Import CSV	A team already has examples in a spreadsheet or exported test set.
Generate synthetic	You want to broaden coverage from known variables or failure patterns.

Use the source as review context. A production-derived row usually deserves more attention than a synthetic exploratory row.

Create a dataset

Open Datasets

Open a project and select Datasets from the project navigation.

Create or import

Create a blank dataset for manual work, or import a CSV when you already have examples.

Define columns

Add columns for prompt variables, expected outputs, evaluator inputs, and metadata.

Add rows

Add rows manually, copy them from the playground or from log spans, import CSV data, or generate synthetic cases.

Attach to evaluators

Link the dataset to evaluators in the prompt evaluation workflow.

Import CSV data

CSV import is useful for existing regression cases, QA test sets, prompt migration work, and customer-provided examples. Before importing:

Use a header row with stable column names.
Match prompt variable names when possible.
Keep expected outputs in separate columns.
Avoid mixing unrelated workflows in one CSV.
Remove secrets, raw customer identifiers, and data that your policy does not allow in Adaline.
Normalize booleans, enums, and labels before import.
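Several of the checks above can be scripted before the file ever reaches Adaline. The sketch below is a minimal, hypothetical pre-import cleaner built only on Python's standard `csv` module. It assumes example column names (`customer_id`, `email`, `api_key` as blocked columns and `is_escalation` as a boolean column) that you would replace with your own. It is not part of Adaline's importer.

```python
import csv
import io

# Hypothetical column policy for this sketch; replace with your own column names.
BLOCKED_COLUMNS = {"customer_id", "email", "api_key"}  # dropped before import
BOOLEAN_COLUMNS = {"is_escalation"}  # normalized to "true" / "false"

TRUE_VALUES = {"true", "yes", "y"}
FALSE_VALUES = {"false", "no", "n"}


def normalize_cell(column: str, value: str) -> str:
    """Trim whitespace; map boolean spellings in boolean columns to one form."""
    cleaned = value.strip()
    if column not in BOOLEAN_COLUMNS:
        return cleaned
    lowered = cleaned.lower()
    if lowered in TRUE_VALUES:
        return "true"
    if lowered in FALSE_VALUES:
        return "false"
    raise ValueError(f"unrecognized boolean {value!r} in column {column!r}")


def prepare_csv(raw: str) -> str:
    """Return a cleaned copy of `raw` to review before importing it."""
    rows = [row for row in csv.reader(io.StringIO(raw)) if row]  # skip blank lines
    if not rows:
        raise ValueError("CSV is empty; expected a header row")
    header = [name.strip() for name in rows[0]]
    if len(set(header)) != len(header):
        raise ValueError(f"duplicate column names in header: {header}")

    # Indices of the columns that survive the blocked-column policy.
    keep = [i for i, name in enumerate(header) if name.lower() not in BLOCKED_COLUMNS]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in rows[1:]:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row}")
        writer.writerow([normalize_cell(header[i], row[i]) for i in keep])
    return out.getvalue()


raw = "question,is_escalation,customer_id\nReset my password, Yes ,c-123\nCancel order,no,c-456\n"
print(prepare_csv(raw))
```

Blocking by column name only removes whole identifier columns. Secrets or identifiers embedded in free-text cells still need a separate review.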

After import, run a small evaluation first. This catches column mapping issues before you rely on the imported dataset.
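Alongside that first small evaluation, a quick header check can catch the most common mapping mistake: a prompt variable with no matching column. This sketch assumes that prompt variables are written as `{{variable_name}}` placeholders. The helper names `prompt_variables` and `check_column_mapping` are hypothetical and not part of any Adaline SDK.

```python
import re

# Assumption: prompt variables are written as {{variable_name}} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def prompt_variables(template: str) -> set[str]:
    """Collect the variable names referenced by a prompt template."""
    return set(PLACEHOLDER.findall(template))


def check_column_mapping(header: list[str], variables: set[str]) -> dict[str, list[str]]:
    """Compare CSV columns with prompt variables before running an evaluation."""
    columns = {name.strip() for name in header}
    return {
        # Variables the prompt needs but the CSV does not provide.
        "missing_variables": sorted(variables - columns),
        # Columns that feed no variable: fine for expected outputs or metadata,
        # suspicious if they look like a misspelled variable.
        "unmapped_columns": sorted(columns - variables),
    }


template = "Answer the {{question}} for a {{ plan_tier }} customer."
report = check_column_mapping(["question", "plan", "expected_answer"], prompt_variables(template))
print(report)
# {'missing_variables': ['plan_tier'], 'unmapped_columns': ['expected_answer', 'plan']}
```

Here the `plan` column was probably meant to fill `plan_tier`. Renaming the column before import is cheaper than discovering the gap after an evaluation run.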

Create rows from production traces

Use trace-derived rows when production exposes a real issue. A trace row is especially valuable when it includes:

The original user request.
The prompt or span that produced the output.
Relevant tool or retrieval context.
The assistant response.
Metadata such as route, release, tenant segment, or environment.
A label that explains why the example matters.

Trace-derived rows are the backbone of regression coverage. They help ensure that a failure seen in production is tested again before the next deployment.
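If you export spans and shape them into rows yourself, the fields listed above map naturally onto a fixed row structure. The sketch below assumes a hypothetical exported span format (`input`, `prompt_id`, `retrieved_documents`, `output`, `metadata`). It does not describe Adaline's actual log schema, and the built-in Copy log span action does not require any of this code. It refuses to build a row without a label, because the label records why the example matters.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceRow:
    """One dataset row built from a production span (hypothetical schema)."""
    user_request: str
    prompt_id: str
    context: str
    response: str
    environment: str
    release: str
    label: str


def row_from_span(span: dict, label: str) -> TraceRow:
    """Copy the fields worth keeping from an exported span into a row."""
    if not label.strip():
        raise ValueError("label is required: record why this example matters")
    metadata = span.get("metadata", {})
    return TraceRow(
        user_request=span["input"],
        prompt_id=span["prompt_id"],
        # Keep retrieval or tool context so the case can be replayed faithfully.
        context="\n".join(span.get("retrieved_documents", [])),
        response=span["output"],
        environment=metadata.get("environment", ""),
        release=metadata.get("release", ""),
        label=label.strip(),
    )


span = {
    "input": "How long do I have to return a jacket?",
    "prompt_id": "support-answer",
    "retrieved_documents": ["Returns are accepted within 30 days."],
    "output": "You have 60 days to return it.",
    "metadata": {"environment": "production", "release": "r42"},
}
row = row_from_span(span, "contradicts retrieved return policy")
print(asdict(row)["label"])  # contradicts retrieved return policy
```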

Generate synthetic rows

Synthetic rows can broaden coverage, but they are not a substitute for production evidence. Use synthetic rows to:

Cover edge cases that are rare in production.
Expand a known failure pattern.
Test combinations of variables.
Stress a schema or format requirement.

Review generated rows before treating them as release blockers. Archive rows that are unrealistic, duplicate, or impossible for your product.
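For the "test combinations of variables" case, a deterministic enumeration is a simple starting point, and it builds the review steps in: duplicates are dropped and impossible combinations are filtered out. This is a plain combinatorial sketch using `itertools.product`. It is not how Adaline's synthetic generation works, and the variables and the product rule below are hypothetical.

```python
from itertools import product
from typing import Callable


def combination_rows(
    variables: dict[str, list[str]],
    is_possible: Callable[[dict[str, str]], bool] = lambda row: True,
) -> list[dict[str, str]]:
    """Enumerate every combination of variable values as a candidate row.

    Duplicate combinations are dropped, and `is_possible` filters out
    combinations your product can never produce.
    """
    names = list(variables)
    seen: set[tuple[str, ...]] = set()
    rows: list[dict[str, str]] = []
    for values in product(*(variables[name] for name in names)):
        if values in seen:
            continue
        seen.add(values)
        row = dict(zip(names, values))
        if is_possible(row):
            rows.append(row)
    return rows


variables = {
    "plan": ["free", "pro"],
    "language": ["en", "de"],
    "channel": ["email", "chat"],
}
# Hypothetical product rule: chat support is only available on the pro plan.
rows = combination_rows(variables, lambda r: not (r["plan"] == "free" and r["channel"] == "chat"))
print(len(rows))  # 6
```

Full enumeration grows multiplicatively with each variable, so it suits a handful of variables with few values. Every generated row still needs a human check for realism before it becomes a release blocker.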

Dataset sizing

There is no single correct dataset size. Use the smallest dataset that gives useful confidence.

Dataset type	Practical size guidance
Golden set	Small, high-confidence, reviewed by the team.
Regression set	Grows as real failures are discovered. Keep it curated.
Exploratory set	Larger and noisier, useful for discovery.
Synthetic set	Review and prune before promoting rows to a gate.

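Large datasets can hide failures inside aggregate scores: a few failing critical rows barely move an overall pass rate. Keep critical release gates small enough that a reviewer can inspect every failing row.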
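A little arithmetic shows the effect. In this illustrative sketch, 498 general rows pass and 2 critical rows fail: the overall pass rate still looks excellent, while a per-label breakdown exposes the problem immediately. The labels and counts are made up, and the `(label, passed)` result format is an assumption for this sketch, not Adaline's evaluation output.

```python
from collections import defaultdict


def pass_rate_by_label(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Group evaluation results by row label and compute a pass rate per label."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for label, passed in results:
        totals[label] += 1
        passes[label] += int(passed)
    return {label: passes[label] / totals[label] for label in totals}


# Illustrative numbers: 498 passing general rows and 2 failing critical rows.
results = [("general", True)] * 498 + [("refund-policy", False)] * 2
overall = sum(passed for _, passed in results) / len(results)
print(overall)                      # 0.996
print(pass_rate_by_label(results))  # {'general': 1.0, 'refund-policy': 0.0}
```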

Next steps

Manage columns and rows

Design static, API, and prompt-backed columns.

Regression coverage

Turn incidents and behaviors into durable tests.

Create evaluators

Attach criteria to dataset rows.

Traces

Find production examples worth saving.
