When a production span reveals a useful case, add it to a dataset so the team can test future prompt versions against the same evidence. This is the bridge from observability to regression coverage: a real span becomes a row, the row gets evaluators, and future deployments must keep passing it.

When to add a span

Add a span to a dataset when:
  • The model answered incorrectly.
  • The model handled a difficult case well and you want a golden example.
  • A user request should become a regression case.
  • A tool call, retrieval result, or missing context changed the answer.
  • An evaluator failure needs a durable example.
  • A Behavior investigation needs representative cases.
  • An Improve cycle should preserve the evidence after review.
Do not add every trace. Curated rows are easier to trust than a noisy dump.

Dataset compatibility

Adaline can add selected model spans to a dataset when the dataset can accept the span variables. A dataset is valid for the selected spans when it belongs to the same project and meets one of these conditions:
  • It is empty, with no rows and no columns, so Adaline can bootstrap columns.
  • It already contains columns for all variables present in the selected model spans.
When Adaline bootstraps an empty dataset, it can create columns from span variables and include a response column.
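
The compatibility rule above can be sketched as a small check. This is only an illustration of the logic, not Adaline's API: the `Dataset` class, `is_compatible`, `bootstrap_columns`, and the `response` column name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset; not Adaline's actual data model."""
    project_id: str
    columns: list[str] = field(default_factory=list)
    row_count: int = 0


def is_compatible(dataset: Dataset, span_project_id: str, span_variables: set[str]) -> bool:
    """Return True if the dataset could accept a span with these variables."""
    # The dataset must belong to the same project as the span.
    if dataset.project_id != span_project_id:
        return False
    # An empty dataset (no rows, no columns) can be bootstrapped.
    if not dataset.columns and dataset.row_count == 0:
        return True
    # Otherwise every span variable needs an existing column.
    return span_variables <= set(dataset.columns)


def bootstrap_columns(span_variables: set[str], response_column: str = "response") -> list[str]:
    """Columns for an empty dataset: one per variable plus a response column."""
    # The column name "response" is an assumption made for this sketch.
    return sorted(span_variables) + [response_column]
```

Under these assumptions, a dataset with columns `question` and `response` accepts a span whose only variable is `question`, but rejects a span that also carries a `context` variable.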

What gets copied

For model spans, Adaline can copy:
  • Prompt variable values.
  • The model response, when a response column exists or the dataset is being bootstrapped.
  • Text values.
  • Image and PDF references when represented as URLs, hosted paths, or data payloads.
  • Complex values serialized as JSON when needed.
The resulting row records its source as trace-derived evidence. Review the row after adding it, because production data often needs cleanup before it becomes a release gate.
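
As a rough illustration of the copy rules above, the sketch below maps a span's variables and response to a row. It assumes a row is a plain dictionary of strings; `to_cell`, `span_to_row`, and the `response` column name are hypothetical and do not describe how Adaline stores rows internally.

```python
import json
from typing import Any


def to_cell(value: Any) -> str:
    """Keep text (including URLs and hosted paths) as-is; serialize anything else as JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True)


def span_to_row(
    variables: dict[str, Any],
    response: str,
    columns: list[str],
    response_column: str = "response",  # assumed column name
) -> dict[str, str]:
    """Build a row from prompt variable values, adding the response only if a column exists."""
    row = {name: to_cell(value) for name, value in variables.items()}
    if response_column in columns:
        row[response_column] = response
    return row
```

In this sketch a nested value such as `{"b": 1, "a": [2]}` becomes the JSON string `{"a": [2], "b": 1}`, while a string value such as an image URL is copied unchanged.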

Add spans to a dataset

  1. Find the span
     Use Traces, filters, or Deep Search to find representative production evidence.
  2. Open the trace
     Inspect the trace and select the model span that contains the useful input and response.
  3. Choose the dataset action
     Add the selected span to an existing compatible dataset, or use an empty dataset that Adaline can bootstrap.
  4. Review the new row
     Confirm variable values, response content, labels, and metadata.
  5. Attach evaluators
     Add or update evaluators so the row can pass or fail future prompt versions.

Clean up after copying

After adding production spans:
  • Remove secrets, private identifiers, or unnecessary customer data.
  • Add labels that explain why the row matters.
  • Add expected output or pass criteria.
  • Link the row to a Behavior or incident when helpful.
  • Move noisy exploratory rows out of release-gate datasets.
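
If rows are exported for review outside the product, part of this cleanup could be scripted. The sketch below is a minimal example under loud assumptions: rows are dictionaries of strings, the two regular expressions are illustrative only (the `sk-` key prefix is a guess at one common secret format), and the `label` column is hypothetical. Pattern-based redaction will miss many kinds of sensitive data, so it does not replace manual review.

```python
import re

# Illustrative patterns only; real data needs a broader, reviewed set.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")  # assumed secret format


def redact(text: str) -> str:
    """Replace email addresses and key-like tokens with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return API_KEY.sub("[SECRET]", text)


def clean_row(row: dict[str, str], reason: str) -> dict[str, str]:
    """Redact every cell and attach a label explaining why the row matters."""
    cleaned = {name: redact(value) for name, value in row.items()}
    cleaned["label"] = reason  # hypothetical label column
    return cleaned
```

For example, `clean_row({"question": "Email jane@example.com"}, "refund edge case")` would store `Email [EMAIL]` in the `question` cell and record the reason in `label`.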

Use rows in the improvement loop

Trace-derived rows make Improve reviews safer:
  1. A production issue appears in Monitor, Traces, or Behaviors.
  2. Representative spans become dataset rows.
  3. Evaluators define the expected behavior.
  4. Improve proposes candidate prompt changes.
  5. Reviewers approve only when the candidate fixes the issue without regressing the dataset.
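
The final step of the loop above amounts to a regression gate: a candidate is approved only if every row still passes every evaluator. The sketch below shows that logic in isolation. `Row`, `Evaluator`, `Generate`, `failing_rows`, and `can_approve` are hypothetical names, and `generate` stands in for whatever runs the candidate prompt; none of this is Adaline's API.

```python
from typing import Callable

Row = dict[str, str]
Evaluator = Callable[[Row, str], bool]  # (row, candidate output) -> passed?
Generate = Callable[[Row], str]         # runs the candidate prompt on a row


def failing_rows(rows: list[Row], generate: Generate, evaluators: list[Evaluator]) -> list[int]:
    """Return the indexes of rows where the candidate fails at least one evaluator."""
    failures = []
    for index, row in enumerate(rows):
        output = generate(row)
        if not all(evaluate(row, output) for evaluate in evaluators):
            failures.append(index)
    return failures


def can_approve(rows: list[Row], generate: Generate, evaluators: list[Evaluator]) -> bool:
    """Approve only when no row in the dataset regresses."""
    return not failing_rows(rows, generate, evaluators)
```

Returning the failing indexes, rather than only a boolean, lets a reviewer see which trace-derived rows a candidate broke before deciding whether the row or the prompt needs to change.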

Avoid common mistakes

| Mistake | Better approach |
| --- | --- |
| Adding a full noisy trace when one span matters | Add the representative model span and keep context concise. |
| Adding rows without expected behavior | Add labels, expected output, or evaluators before relying on the row. |
| Mixing every incident in one dataset | Split by behavior, workflow, or release risk. |
| Keeping sensitive production data | Sanitize the row or do not store it. |
| Treating copied rows as automatically correct | Review and curate each row before using it as a gate. |

The strongest regression datasets are small, named, and traceable: each row has a reason to exist and a clear rule that future prompts must satisfy.