Where annotation happens
Adaline keeps raw evidence in Traces, but review work belongs in Datasets. Use logs to find the case, then use dataset columns to capture what the team learned. Typical annotation fields include:| Field | Purpose |
|---|---|
| Expected output | The response or behavior future prompt versions should produce. |
| Issue label | The failure mode, such as wrong tool, missing context, unsafe answer, or bad format. |
| Review note | Human explanation of what happened and why the row matters. |
| Priority | Whether the case should block release or remain advisory. |
| Status | Pending, reviewed, fixed, or accepted risk, based on your team’s workflow. |
Build a review queue
Use dataset filters to create a small queue, not a giant backlog:- Add only representative spans from Monitor, Traces, Behaviors, or Improve review.
- Filter dataset rows where review fields are empty.
- Annotate the rows that will affect release decisions.
- Attach evaluators that use the expected output, label, or rubric.
- Move reviewed rows into regression coverage when they should gate future prompt versions.
Keep annotations practical
Write enough for another teammate to understand the case, but do not turn every row into a long incident report. The best annotations explain what the model should do differently and why the row belongs in the dataset.Build datasets from logs
Add useful production spans to datasets.
Set up a dataset
Structure dataset columns for review and evaluation.
Evaluators overview
Turn annotated expectations into checks.
Build regression coverage
Use production cases as release safety.