Start from Behavior detail

Open Sessions

| Column | How to use it |
|---|---|
| Summarized at | Check whether the session summary is fresh enough for the investigation. |
| Intent | Understand the task the agent was trying to complete. |
| Outcome | Compare what the agent claims happened with the source spans after opening the session. |
| Traces / Spans | Estimate how much evidence exists before you inspect the journey. |

Read the journey summary

- Did the agent recover after the first failure?
- Did it run a meaningful verification step before finishing?
- Did it keep searching or editing without converging?
- Did the final summary match the inspected evidence?
- Is the pattern repeated across several tasks, or is it one unusual run?

Read a trajectory as phases

| Phase | What to inspect |
|---|---|
| Planning | Did the agent understand the task, constraints, and likely fix path? |
| Search / inspection | Did it look in the right files, tools, documents, or traces? |
| Action | What edits, tool calls, retrieval steps, or backend operations did it perform? |
| Verification | Did it run tests, checks, tool validation, or another meaningful confirmation? |
| Recovery | Did it adapt after failures, or repeat the same bad step? |
| Handoff | Did the final answer accurately describe what happened and what remains? |
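The verification and recovery questions above can be sketched as a small check over an exported list of steps. This is a minimal illustration, not Adaline's data model: the step dictionaries, the `phase`, `name`, and `ok` keys, and the `review_steps` helper are all hypothetical names chosen for this example.

```python
from collections import Counter


def review_steps(steps):
    """Summarize verification, convergence, and recovery for one trajectory.

    Each step is a dict with hypothetical keys: "phase", "name", and "ok".
    """
    phases = [step["phase"] for step in steps]

    # Verification only counts if it happens after the last action.
    action_indexes = [i for i, phase in enumerate(phases) if phase == "action"]
    last_action = action_indexes[-1] if action_indexes else -1
    verified = "verification" in phases[last_action + 1:]

    # A step name that fails more than once suggests the agent is not converging.
    failures = Counter(step["name"] for step in steps if not step["ok"])
    repeated = sorted(name for name, count in failures.items() if count > 1)

    # Recovery: at least one later step succeeds after the first failure.
    first_failure = next(
        (i for i, step in enumerate(steps) if not step["ok"]), None
    )
    recovered = first_failure is None or any(
        step["ok"] for step in steps[first_failure + 1:]
    )

    return {
        "verified_after_last_action": verified,
        "repeated_failures": repeated,
        "recovered_after_first_failure": recovered,
    }


example_steps = [
    {"phase": "planning", "name": "read_task", "ok": True},
    {"phase": "action", "name": "apply_patch", "ok": False},
    {"phase": "action", "name": "apply_patch", "ok": False},
    {"phase": "verification", "name": "run_tests", "ok": True},
    {"phase": "action", "name": "edit_config", "ok": True},
]
print(review_steps(example_steps))
```

In this example the agent verifies before its final edit, so the last action is unverified. A summary alone can hide that kind of ordering detail, so confirm it against the source spans.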

Move from story to source evidence

Trajectories summarize the journey; source spans prove the details. After reading a trajectory, open the linked spans for the exact model call, tool call, command, retrieval step, or backend operation that matters. This keeps the review grounded. A Behavior may look like a prompt problem from the title, but the trajectory might show a broken tool, missing context, runtime setup issue, bad retrieval result, or incomplete logging.

When trajectory evidence is weak

If a trajectory feels vague or incomplete, improve the logging before relying on it for a release decision. Common gaps include:

- Missing stable task, session, or run identifiers for multi-step work.
- Generic trace or span names.
- Missing outcome/status.
- Missing agent identity or workflow metadata.
- Missing spans for tool calls, commands, retrieval, edits, or verification.
- Too little traffic for Adaline to compare examples.
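Most of the gaps listed above can be caught before spans are sent. The sketch below assumes each span is a plain dictionary; the field names (`session_id`, `run_id`, `name`, `kind`, `status`, `agent`), the generic-name list, and the `find_logging_gaps` helper are hypothetical and do not reflect Adaline's schema or SDK.

```python
# Hypothetical list of names that say nothing about what the span did.
GENERIC_NAMES = {"", "span", "step", "run", "call", "trace"}


def find_logging_gaps(spans, expected_kinds=frozenset()):
    """Return a sorted list of gap descriptions for one trajectory's spans.

    `expected_kinds` names the span kinds this workflow should emit,
    for example {"tool_call", "edit", "verification"}.
    """
    gaps = set()
    for span in spans:
        if not span.get("session_id") or not span.get("run_id"):
            gaps.add("missing stable session or run identifier")
        if (span.get("name") or "").strip().lower() in GENERIC_NAMES:
            gaps.add("generic span name")
        if not span.get("status"):
            gaps.add("missing outcome/status")
        if not span.get("agent"):
            gaps.add("missing agent identity or workflow metadata")

    # Report every expected kind that never appears in the trajectory.
    seen_kinds = {span.get("kind") for span in spans}
    for kind in set(expected_kinds) - seen_kinds:
        gaps.add(f"no span of kind '{kind}'")
    return sorted(gaps)


example_spans = [
    {"session_id": "s-1", "run_id": "r-1", "name": "run_unit_tests",
     "kind": "verification", "status": "ok", "agent": "coding-agent"},
    {"session_id": "s-1", "run_id": None, "name": "step",
     "kind": "tool_call", "status": "", "agent": "coding-agent"},
]
for gap in find_logging_gaps(
    example_spans, expected_kinds={"tool_call", "verification", "edit"}
):
    print(gap)
```

A check like this cannot detect the last gap in the list, too little traffic, because that depends on how many comparable runs have been logged, not on any single span.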

Choose the next action

| What the trajectory shows | Good next action |
|---|---|
| The issue is prompt-addressable. | Start an Improve cycle if the prompt is stored in Adaline. |
| The agent skipped verification. | Add verification instructions, evaluator coverage, or command/tool policy. |
| The problem is tool, retrieval, backend, or environment-related. | Fix that layer before changing prompts. |
| The trajectory is a useful healthy run. | Preserve it with dataset examples or evaluator coverage. |
| The evidence is unclear. | Improve logging and wait for more examples before making a release decision. |
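When one trajectory shows more than one of these findings, the table can be read as an ordered lookup. The sketch below is one possible encoding: the `Finding` enum and `plan_next_actions` helper are hypothetical names, and only the rule that tool, retrieval, backend, or environment fixes come before prompt changes is taken from the table. Putting unclear evidence first is an assumption of this example.

```python
from enum import Enum


class Finding(Enum):
    # Declaration order doubles as priority order (an assumption of this sketch).
    UNCLEAR_EVIDENCE = "unclear_evidence"
    INFRASTRUCTURE = "infrastructure"
    SKIPPED_VERIFICATION = "skipped_verification"
    PROMPT_ADDRESSABLE = "prompt_addressable"
    HEALTHY_RUN = "healthy_run"


NEXT_ACTION = {
    Finding.UNCLEAR_EVIDENCE:
        "Improve logging and wait for more examples before making a release decision.",
    Finding.INFRASTRUCTURE:
        "Fix that layer before changing prompts.",
    Finding.SKIPPED_VERIFICATION:
        "Add verification instructions, evaluator coverage, or command/tool policy.",
    Finding.PROMPT_ADDRESSABLE:
        "Start an Improve cycle if the prompt is stored in Adaline.",
    Finding.HEALTHY_RUN:
        "Preserve it with dataset examples or evaluator coverage.",
}


def plan_next_actions(findings):
    """Return the actions for a set of findings, highest priority first."""
    # Iterating over an Enum class yields members in declaration order.
    return [NEXT_ACTION[f] for f in Finding if f in findings]


for action in plan_next_actions(
    {Finding.PROMPT_ADDRESSABLE, Finding.INFRASTRUCTURE}
):
    print(action)
```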

- Coding-agent Behaviors: Understand coding-agent task patterns and setup requirements.
- Understanding Behaviors: Return to the catalog and detail review workflow.
- Logs to Behaviors: Send the traces, spans, and metadata that make trajectories useful.
- Improve: Turn prompt-addressable patterns into reviewed prompt candidates.