Every completed Improve cycle pauses for human review. Use the review page to understand what Adaline found, which candidate you are looking at, how the candidate changes the prompt, and whether approving it will deploy to an environment.
Review the diagnosis first
The diagnosis explains the behavior in focus, why Adaline selected it, and how it showed up in production traces or evaluator failures. Read this section before inspecting the diff. If the diagnosis is wrong, a good-looking candidate can still be the wrong fix.
Use the linked evidence to inspect:
- The Behavior or focus that drove the cycle.
- Representative failing traces, when trace IDs are available.
- Baseline evaluator results and failing cases.
- Whether the issue is high-volume, high-risk, or recent.
- The likely impact on quality, safety, cost, latency, and tokens.
Compare candidates
The review page can include multiple candidates. The selected candidate controls the diff, regression table, real-traffic examples, and approval confirmation.
For each candidate, check:
| Signal | What to verify |
|---|---|
| Score and pass rate | The candidate improved the target criteria enough to justify the change. |
| Regressions | Previously passing evaluators and dataset rows did not fail in new ways. |
| Recovered failures | The candidate actually fixes the examples or evaluator failures that motivated the run. |
| Cost, tokens, and latency | Runtime changes are acceptable for the deployment environment. |
| Real traffic examples | Before and after examples match the product behavior you want. |
| Candidate identity | The candidate shown in the switcher is the candidate you intend to approve. |
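The checks in the table above can also be written down as a simple pre-approval gate. The following is a minimal sketch in Python, assuming you copy the numbers from the review page by hand. `CandidateSummary`, its fields, and the default thresholds are hypothetical illustrations, not part of any Adaline API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSummary:
    """Hypothetical record of numbers read off the review page."""
    name: str
    baseline_pass_rate: float   # fraction of eval rows passing before the change
    candidate_pass_rate: float  # fraction of eval rows passing with the candidate
    new_regressions: int        # rows or evaluators that passed before and fail now
    recovered_failures: int     # motivating failures that the candidate fixes
    cost_change_pct: float      # relative cost change, e.g. 5.0 means +5%
    latency_change_pct: float   # relative latency change, same convention


def review_blockers(
    c: CandidateSummary,
    min_gain: float = 0.02,         # assumed threshold, tune per project
    max_cost_pct: float = 10.0,     # assumed budget
    max_latency_pct: float = 10.0,  # assumed budget
) -> List[str]:
    """Return reasons not to approve; an empty list means no blocker was found."""
    blockers = []
    if c.candidate_pass_rate - c.baseline_pass_rate < min_gain:
        blockers.append("pass rate gain too small")
    if c.new_regressions > 0:
        blockers.append("new regressions")
    if c.recovered_failures == 0:
        blockers.append("no motivating failures recovered")
    if c.cost_change_pct > max_cost_pct:
        blockers.append("cost over budget")
    if c.latency_change_pct > max_latency_pct:
        blockers.append("latency over budget")
    return blockers


candidate = CandidateSummary("candidate-a", 0.80, 0.90, 0, 12, 3.0, -1.0)
print(review_blockers(candidate))  # prints []
```

An empty result only means that none of the numeric checks tripped. It does not replace reading the real traffic examples or confirming which candidate is selected in the switcher.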
Inspect the prompt diff
The diff shows exactly what the candidate changes. Depending on the prompt, it can include:
- Message text changes.
- Model or generation setting changes.
- Response schema changes.
- Tooling-related changes.
- Explanations for why each change was made.
Look for instructions that are too broad, too narrow, or likely to conflict with existing requirements. Pay special attention to safety boundaries, output format, tool-use instructions, and examples that encode product policy.
If the candidate is close but needs a small edit, use Edit & approve instead of approving the candidate as-is.
Understand the actions
| Action | Result |
|---|---|
| Approve | Applies the selected candidate to the prompt and deploys it to the project’s primary deployment environment when one exists. The confirmation dialog names the candidate and target environment. |
| Edit & approve | Applies the selected candidate and opens the prompt editor so you can adjust it. It does not deploy automatically. |
| Reject | Leaves the prompt unchanged and records the cycle in history. |
Approve affects production when the project has a primary deployment environment, because the candidate is deployed there as soon as you confirm. Use Edit & approve when you want to inspect or tweak the prompt in the editor before deployment.
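To make the difference between the three actions concrete, here is a small Python model of the outcomes described in the table. It is only an illustration of the documented behavior: `Action`, `describe_outcome`, and the dictionary keys are hypothetical names, and the environment name is a made-up example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    APPROVE = "approve"
    EDIT_AND_APPROVE = "edit_and_approve"
    REJECT = "reject"


def describe_outcome(action: Action, primary_environment: Optional[str]) -> dict:
    """Model the result of a review action.

    primary_environment is the project's primary deployment environment,
    or None when the project has none.
    """
    if action is Action.APPROVE:
        # Applies the candidate and deploys only when a primary environment exists.
        return {
            "prompt_changed": True,
            "opens_editor": False,
            "deployed_to": primary_environment,
        }
    if action is Action.EDIT_AND_APPROVE:
        # Applies the candidate and opens the editor; never deploys automatically.
        return {"prompt_changed": True, "opens_editor": True, "deployed_to": None}
    # Reject leaves the prompt unchanged.
    return {"prompt_changed": False, "opens_editor": False, "deployed_to": None}


print(describe_outcome(Action.APPROVE, "production")["deployed_to"])  # prints production
```

The only path in this model that yields a non-empty `deployed_to` is Approve with a primary environment, which is why the confirmation dialog is worth reading before you confirm.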
After approval
After approving, monitor the deployed prompt in Monitor, Traces, and Behaviors. Watch for:
- The original Behavior’s issue rate decreasing.
- Eval score staying stable or improving.
- Cost, token, and latency changes staying within budget.
- New traces that show the prompt still handles normal cases.
- New Behaviors or regressions introduced by the change.
If production behavior regresses, use Deploy to compare deployments and roll back the affected environment.
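The watch list above can be turned into a rough before-and-after comparison. The sketch below assumes you have exported two metric snapshots yourself, one from before the deployment and one from after. The dictionary keys, the `rollback_reasons` helper, and the 10% budget are hypothetical and do not correspond to a documented Adaline export format.

```python
from typing import Dict, List


def rollback_reasons(
    before: Dict[str, float],
    after: Dict[str, float],
    budget_pct: float = 10.0,  # assumed allowed increase for cost, tokens, latency
) -> List[str]:
    """Compare metric snapshots and list signals that suggest a regression.

    Both snapshots need the keys issue_rate, eval_score, cost, tokens, latency.
    The "before" values for cost, tokens, and latency must be non-zero.
    """
    reasons = []
    if after["issue_rate"] >= before["issue_rate"]:
        reasons.append("issue rate did not decrease")
    if after["eval_score"] < before["eval_score"]:
        reasons.append("eval score dropped")
    for key in ("cost", "tokens", "latency"):
        change_pct = (after[key] - before[key]) / before[key] * 100
        if change_pct > budget_pct:
            reasons.append(f"{key} rose more than {budget_pct:g}%")
    return reasons


before = {"issue_rate": 0.12, "eval_score": 0.85, "cost": 1.00, "tokens": 800, "latency": 1.2}
after = {"issue_rate": 0.05, "eval_score": 0.88, "cost": 1.05, "tokens": 820, "latency": 1.25}
print(rollback_reasons(before, after))  # prints []
```

A non-empty list is a prompt to look at traces and Behaviors more closely, not an automatic rollback decision. New Behaviors introduced by the change will not show up in aggregate numbers like these.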
Promote useful evidence from the cycle into regression coverage so the same issue is tested before the next release.