Build datasets from logs

Production logs contain the cases your agent actually sees: real inputs, real model responses, tool context, evaluator results, cost, latency, and metadata. Add useful spans to Datasets when you want future prompt versions to be tested against those cases. This is the bridge from observability to regression coverage: a real span becomes a row, the row gets evaluators, and future prompt versions are checked against the same evidence.

Figure: Trace side sheet with a selected model span and the Add to Dataset action.

Find the right evidence

Start from the signal:

Signal	Where to look
A chart moved	Open Monitor charts, then drill into traces for the time window.
A customer report arrived	Filter Traces by timestamp, name, reference ID, or safe metadata.
A Behavior repeated	Open Behaviors, then inspect representative trace evidence.
An evaluator failed	Filter by evaluator result and inspect the model span.
An Improve cycle needs coverage	Add representative spans before or after review so the case is durable.

Do not add every trace. Good datasets are curated: each row has a reason to exist and a clear expectation.

Add a span to a dataset

1. Find the trace

Use Traces, filters, or Deep search to find representative production evidence.

2. Select the model span

Open the trace and select the model span that contains the useful input, variables, response, and evaluator results.

3. Choose Add to Dataset

Use Add to Dataset from the span details panel. Add the span to an existing compatible dataset, or use an empty dataset that Adaline can bootstrap.

4. Review the new row

Confirm variable values, response content, labels, and metadata before using the row as regression coverage.

5. Attach evaluators

Add or update evaluators so the row can pass or fail future prompt versions.

Dataset compatibility

Adaline can add selected model spans to a dataset when the dataset can accept the span variables. A dataset is valid for the selected spans when it belongs to the same project and one of the following is true:

It is empty, with no rows and no columns, so Adaline can bootstrap the columns.
It already contains columns for all variables present in the selected model spans.

When Adaline bootstraps an empty dataset, it can create columns from span variables and include a response column.
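The compatibility rule can be sketched in a few lines of Python. This is an illustration only, not Adaline's implementation: the `Dataset` class, the `can_accept` and `bootstrap_columns` helpers, and the `response` column name are all hypothetical. It also assumes the rule reads as "same project, and either empty or already containing every span variable as a column".

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset; not Adaline's data model.
    project_id: str
    columns: list[str] = field(default_factory=list)
    row_count: int = 0


def can_accept(dataset: Dataset, span_project_id: str, span_variables: set[str]) -> bool:
    """Return True when the dataset could accept a span under the rule above."""
    if dataset.project_id != span_project_id:
        return False
    # "Empty" means no rows and no columns, so columns can be bootstrapped.
    is_empty = not dataset.columns and dataset.row_count == 0
    # Otherwise every span variable needs an existing column.
    has_all_columns = span_variables <= set(dataset.columns)
    return is_empty or has_all_columns


def bootstrap_columns(span_variables: list[str], response_column: str = "response") -> list[str]:
    """Columns for an empty dataset: one per span variable, plus a response column."""
    return list(span_variables) + [response_column]
```

For example, a dataset that already has a `question` column would accept a span with only a `question` variable, but not one that also carries a `context` variable.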

What gets copied

For model spans, Adaline can copy:

Prompt variable values.
The model response, when a response column exists or the dataset is being bootstrapped.
Text values.
Image and PDF references when represented as URLs, hosted paths, or data payloads.
Complex values serialized as JSON when needed.

The resulting row records its source as trace-derived evidence. Review the row after adding it; production data often needs cleanup before it becomes a release gate.
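As a rough mental model of the copy step, the sketch below maps a span's variables and response onto row cells. The function names, the `response` column name, and the dictionary shapes are assumptions made for illustration; Adaline's actual copy logic is not shown in this page. Text values, including URL or hosted-path references to images and PDFs, pass through unchanged, and anything else is serialized as JSON.

```python
import json
from typing import Any


def to_cell(value: Any) -> str:
    """Keep text as-is; serialize any other value as JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True)


def span_to_row(
    variables: dict[str, Any],
    response: Any,
    columns: list[str],
    response_column: str = "response",  # hypothetical column name
) -> dict[str, str]:
    """Build a row from span variables; copy the response only if a column exists for it."""
    row = {name: to_cell(value) for name, value in variables.items()}
    if response_column in columns:
        row[response_column] = to_cell(response)
    return row
```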

Make the row useful

After adding a row:

Remove sensitive or unnecessary production content.
Add labels or notes that explain why the row matters.
Add expected output or pass criteria when needed.
Attach evaluators that can score the case.
Keep regression datasets focused by behavior, workflow, prompt, or release risk.
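If rows are exported and cleaned outside the UI, the first two steps might look like the sketch below. It is a minimal example under stated assumptions: rows are plain dictionaries of text, `clean_row` and the `note` column are hypothetical names, and masking email addresses with a simple pattern stands in for whatever redaction your data actually requires. It is not complete PII handling.

```python
import re

# Simple email pattern for illustration; real redaction usually needs more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def clean_row(row: dict[str, str], drop_columns: set[str], note: str) -> dict[str, str]:
    """Drop unnecessary columns, mask email addresses, and record why the row matters."""
    cleaned = {
        name: EMAIL.sub("[email]", value)
        for name, value in row.items()
        if name not in drop_columns
    }
    cleaned["note"] = note  # hypothetical column explaining the row's purpose
    return cleaned
```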

Use the dataset in the loop

Trace-derived rows are strongest when they become part of a repeatable release check:

1. Monitor, Traces, or Behaviors surface an issue.
2. Representative spans become dataset rows.
3. Evaluators define what good looks like.
4. Improve proposes candidate prompt changes.
5. Reviewers approve only when the candidate handles the production case without breaking coverage.
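The final check in this loop amounts to a regression gate: run the candidate prompt over every row and approve only when no evaluator fails. The sketch below shows that idea in plain Python. It does not use Adaline's API; `failing_rows`, the `generate` callable that stands in for running a candidate prompt, and the evaluator signature are all assumptions for illustration.

```python
from typing import Callable

Row = dict[str, str]
# An evaluator takes a row and the candidate's output and returns pass (True) or fail (False).
Evaluator = Callable[[Row, str], bool]


def failing_rows(
    rows: list[Row],
    generate: Callable[[Row], str],  # stand-in for running the candidate prompt on a row
    evaluators: list[Evaluator],
) -> list[int]:
    """Return the indexes of rows where any evaluator fails on the candidate's output."""
    failures = []
    for index, row in enumerate(rows):
        output = generate(row)
        if not all(evaluate(row, output) for evaluate in evaluators):
            failures.append(index)
    return failures
```

An empty result means the candidate handles every preserved production case; any returned index points at a row that still needs attention before approval.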

Analyze log spans

Inspect model, tool, and orchestration spans before choosing what to preserve.

Datasets overview

Organize rows used for evaluation and regression coverage.

Evaluators overview

Score the cases you preserve from production.

Improve overview

Use coverage and Behavior evidence in improvement cycles.
