Annotations turn copied production evidence into useful evaluation data. After a span becomes a dataset row, add the labels, expected behavior, notes, or review status your team needs to trust it.

Where annotation happens

Adaline keeps raw evidence in Traces, but review work belongs in Datasets. Use logs to find the case, then use dataset columns to capture what the team learned. Typical annotation fields include:
| Field | Purpose |
| --- | --- |
| Expected output | The response or behavior future prompt versions should produce. |
| Issue label | The failure mode, such as wrong tool, missing context, unsafe answer, or bad format. |
| Review note | Human explanation of what happened and why the row matters. |
| Priority | Whether the case should block release or remain advisory. |
| Status | Pending, reviewed, fixed, or accepted risk, based on your team’s workflow. |
Column names are flexible. Keep them consistent across datasets so evaluators and reviewers do not have to relearn each table.
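
As a rough illustration of the fields above, an annotated row could be modeled as follows. This is a minimal sketch in plain Python, not Adaline's API: the column names (`expected_output`, `issue_label`, `review_note`, `priority`, `status`), the allowed value sets, and the `validate_row` helper are all hypothetical and only show one way to keep columns consistent across datasets.

```python
from dataclasses import dataclass

# Hypothetical value sets; adapt them to your team's workflow.
ALLOWED_STATUS = {"pending", "reviewed", "fixed", "accepted_risk"}
ALLOWED_PRIORITY = {"blocking", "advisory"}


@dataclass
class AnnotatedRow:
    """One dataset row with annotation columns (names are illustrative)."""
    input_text: str
    expected_output: str = ""
    issue_label: str = ""
    review_note: str = ""
    priority: str = "advisory"
    status: str = "pending"


def validate_row(row: AnnotatedRow) -> list[str]:
    """Return a list of problems; an empty list means the row is consistent."""
    problems = []
    if row.status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {row.status!r}")
    if row.priority not in ALLOWED_PRIORITY:
        problems.append(f"unknown priority: {row.priority!r}")
    # A reviewed row should say what future prompt versions must produce.
    if row.status == "reviewed" and not row.expected_output.strip():
        problems.append("reviewed row is missing expected_output")
    return problems


# Example row; the content is made up for illustration.
row = AnnotatedRow(
    input_text="Cancel my order #1234",
    expected_output="Call the cancel_order tool, then confirm the cancellation.",
    issue_label="wrong tool",
    review_note="Model called refund_order; cancellations must not trigger refunds.",
    priority="blocking",
    status="reviewed",
)
print(validate_row(row))
```

Running a check like this before attaching evaluators is one way to catch rows that are marked reviewed but still lack the expected output an evaluator would compare against.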

Build a review queue

Use dataset filters to create a small queue, not a giant backlog:
  1. Add only representative spans from Monitor, Traces, Behaviors, or Improve review.
  2. Filter dataset rows where review fields are empty.
  3. Annotate the rows that will affect release decisions.
  4. Attach evaluators that use the expected output, label, or rubric.
  5. Move reviewed rows into regression coverage when they should gate future prompt versions.
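
The filtering in steps 2 and 5 can be sketched outside the product as well. The example below assumes dataset rows have been exported as plain Python dictionaries; the column names, the `limit` of 20, and the helper functions are hypothetical and do not reflect an Adaline API.

```python
# Hypothetical review columns; use whatever names your datasets share.
REVIEW_FIELDS = ("expected_output", "issue_label", "review_note")


def needs_review(row: dict) -> bool:
    """True when any review field is missing or blank."""
    return any(not str(row.get(field) or "").strip() for field in REVIEW_FIELDS)


def build_review_queue(rows: list[dict], limit: int = 20) -> list[dict]:
    """Keep the queue small: unannotated rows only, capped at `limit`."""
    return [row for row in rows if needs_review(row)][:limit]


def regression_candidates(rows: list[dict]) -> list[dict]:
    """Reviewed rows marked as blocking are candidates to gate releases."""
    return [
        row for row in rows
        if row.get("status") == "reviewed" and row.get("priority") == "blocking"
    ]


# Made-up rows for illustration.
rows = [
    {"id": 1, "expected_output": "", "issue_label": "", "review_note": "",
     "priority": "advisory", "status": "pending"},
    {"id": 2, "expected_output": "Ask for the order ID first.",
     "issue_label": "missing context", "review_note": "Model guessed the ID.",
     "priority": "blocking", "status": "reviewed"},
    {"id": 3, "expected_output": "Return valid JSON.", "issue_label": "bad format",
     "review_note": None, "priority": "advisory", "status": "pending"},
]

queue = build_review_queue(rows)
gating = regression_candidates(rows)
print([r["id"] for r in queue])   # ids still waiting for annotation
print([r["id"] for r in gating])  # ids ready for regression coverage
```

Capping the queue is a deliberate choice in this sketch: it keeps reviewers focused on the rows that will affect release decisions instead of accumulating a backlog.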

Keep annotations practical

Write enough for another teammate to understand the case, but do not turn every row into a long incident report. The best annotations explain what the model should do differently and why the row belongs in the dataset.

Build datasets from logs

Add useful production spans to datasets.

Set up a dataset

Structure dataset columns for review and evaluation.

Evaluators overview

Turn annotated expectations into checks.

Build regression coverage

Use production cases as a release safety net.