
Read the review page top to bottom

Start with the evidence path. A good review package shows how the cycle moved from production evidence to candidate prompt changes, and whether each stage produced enough signal to support a release decision.

| Stage | What to check |
|---|---|
| Behaviors | The run targeted a specific repeated pattern or issue. |
| Evals | Authored and auto-generated evaluators cover the target behavior and important healthy paths. |
| Datasets | Production, curated, and synthetic cases are representative enough to compare baseline and candidate. |
| Prompts | Multiple candidates were explored, and unsafe or regressing options were filtered out. |
| Review | The selected candidate has a diff, traffic examples, score movement, and runtime tradeoffs. |

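Then work through each section of the review page in order. Stop the review if the condition in the last column applies.

| Section | Question | Stop if |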
|---|---|---|
| Diagnosis | What problem did Adaline try to fix? | It does not match the customer or product issue. |
| Candidate | Which candidate is selected? | It is not the candidate you intend to apply. |
| Prompt diff | What exactly changes? | The diff changes policy, format, tool behavior, or tone in a risky way. |
| Traffic comparison | Would real users get a better answer? | Scores improved but the user experience got worse. |
| Regression report | Which checks improved or regressed? | A protected evaluator drops without explicit signoff. |
| Cost, tokens, latency | What runtime tradeoff comes with the candidate? | The movement breaks the release budget. |
| Deployment target | Which environment changes if approved? | The target environment is wrong or unclear. |
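
Two of the stop conditions above, the regression check and the runtime budget, are mechanical enough to sketch in code. The following Python is only an illustration of that reasoning, not part of Adaline: `EvaluatorResult`, `RuntimeDelta`, `stop_reasons`, the `protected` and `signed_off` flags, and the single percentage budget are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EvaluatorResult:
    """One evaluator's score on baseline and candidate (hypothetical shape)."""
    name: str
    baseline: float
    candidate: float
    protected: bool = False   # assumed flag: evaluator must not regress
    signed_off: bool = False  # assumed flag: a reviewer accepted the drop


@dataclass
class RuntimeDelta:
    """Percent change from baseline to candidate (hypothetical shape)."""
    cost_pct: float
    tokens_pct: float
    latency_pct: float


def stop_reasons(results, runtime, budget_pct):
    """Return the reasons to stop the review; an empty list means none found."""
    reasons = []
    # Regression report: a protected evaluator drops without explicit signoff.
    for r in results:
        if r.protected and r.candidate < r.baseline and not r.signed_off:
            reasons.append(f"protected evaluator '{r.name}' dropped without signoff")
    # Cost, tokens, latency: the movement breaks the release budget.
    deltas = (
        ("cost", runtime.cost_pct),
        ("tokens", runtime.tokens_pct),
        ("latency", runtime.latency_pct),
    )
    for metric, value in deltas:
        if value > budget_pct:
            reasons.append(f"{metric} up {value:.1f}%, over the {budget_pct:.1f}% budget")
    return reasons


# Example values are invented for illustration only.
results = [
    EvaluatorResult("refund_policy", baseline=0.92, candidate=0.88, protected=True),
    EvaluatorResult("tone", baseline=0.70, candidate=0.81),
]
runtime = RuntimeDelta(cost_pct=4.0, tokens_pct=12.5, latency_pct=1.0)
reasons = stop_reasons(results, runtime, budget_pct=10.0)
for reason in reasons:
    print(reason)
```

The other rows (diagnosis, prompt diff, traffic comparison, deployment target) depend on reviewer judgment, so this sketch deliberately leaves them out.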

After the decision

| Decision | What to do next |
|---|---|
| Approve | Apply the selected candidate and deploy it when a primary environment is configured. Watch Monitor, Logs, and Behaviors during the release window. |
| Edit & approve | Apply the candidate, inspect or adjust the prompt, run evaluations, then release through Deploy your prompt or your external deployment path. |
| Reject | Leave the prompt unchanged and record the reason: wrong diagnosis, weak evidence, regression, runtime cost, or wrong fix layer. |
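
If you keep your own log of review outcomes, the three decisions and the reject reasons listed above can be modeled as a small record type. This is a minimal sketch under that assumption; `Decision`, `RejectReason`, and `DecisionRecord` are hypothetical names for illustration and do not describe Adaline's data model or its audit packet format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    EDIT_AND_APPROVE = "edit & approve"
    REJECT = "reject"


class RejectReason(Enum):
    # The five reasons named in the Reject row of the table above.
    WRONG_DIAGNOSIS = "wrong diagnosis"
    WEAK_EVIDENCE = "weak evidence"
    REGRESSION = "regression"
    RUNTIME_COST = "runtime cost"
    WRONG_FIX_LAYER = "wrong fix layer"


@dataclass(frozen=True)
class DecisionRecord:
    """A reviewer's decision; a rejection must carry a reason."""
    decision: Decision
    reject_reason: Optional[RejectReason] = None
    note: str = ""

    def __post_init__(self):
        is_reject = self.decision is Decision.REJECT
        if is_reject and self.reject_reason is None:
            raise ValueError("a rejection must record a reason")
        if not is_reject and self.reject_reason is not None:
            raise ValueError("only rejections carry a reject reason")


# Example: reject because a check regressed (the note text is invented).
record = DecisionRecord(
    Decision.REJECT,
    RejectReason.REGRESSION,
    note="Protected evaluator dropped on the candidate.",
)
print(record.decision.value, "-", record.reject_reason.value)
```

Validating in `__post_init__` keeps a rejection without a reason from being recorded at all, which mirrors the instruction to record the reason whenever the prompt is left unchanged.
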

| Related topic | What it covers |
|---|---|
| Export Audit Packet | Download the review evidence as JSON for records or external systems. |
| Auto Prompt Optimization | Understand candidate exploration, safety gates, and scoring evidence. |
| Auto Generated Evaluators | See how generated evaluators help check the candidate before release. |
| Deploy your prompt | Continue from reviewed prompt version to deployment. |