Review and apply candidates

Every completed Improve cycle pauses for human review. Use the review page to understand what Adaline found, which candidate you are looking at, how the candidate changes the prompt, and whether approving it will deploy to an environment.

Improve review page showing diagnosis, candidate diff, and approve or reject actions

Review the diagnosis first

The diagnosis explains the behavior in focus, why Adaline selected it, and how it showed up in production traces or evaluator failures. Read this section before inspecting the diff. If the diagnosis is wrong, a good-looking candidate can still be the wrong fix. Use the linked evidence to inspect:

The Behavior or focus that drove the cycle.
Representative failing traces, when trace IDs are available.
Baseline evaluator results and failing cases.
Whether the issue is high-volume, high-risk, or recent.
The likely impact on quality, safety, cost, latency, and tokens.

Compare candidates

The review page can include multiple candidates. The selected candidate determines which diff, regression table, real-traffic examples, and approval confirmation the page shows. For each candidate, check:

Signal	What to verify
Score and pass rate	The candidate improved the target criteria enough to justify the change.
Regressions	Previously passing evaluators and dataset rows did not fail in new ways.
Recovered failures	The candidate actually fixes the examples or evaluator failures that motivated the run.
Cost, tokens, and latency	Runtime changes are acceptable for the deployment environment.
Real-traffic examples	Before-and-after examples match the product behavior you want.
Candidate identity	The candidate shown in the switcher is the candidate you intend to approve.
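The regression and recovered-failure checks in the table above come down to comparing per-case results between the baseline and a candidate. The following is a minimal sketch, assuming you have baseline and candidate pass/fail results keyed by case ID. The `compare_results` and `within_budget` functions, the data shape, and the 10% budget are hypothetical illustrations, not part of Adaline's API.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    regressions: list[str]  # passed in baseline, fails with the candidate
    recovered: list[str]    # failed in baseline, passes with the candidate
    baseline_pass_rate: float
    candidate_pass_rate: float


def compare_results(baseline: dict[str, bool], candidate: dict[str, bool]) -> Comparison:
    """Compare per-case results (hypothetical shape: case_id -> passed)."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("baseline and candidate share no cases")
    regressions = [c for c in shared if baseline[c] and not candidate[c]]
    recovered = [c for c in shared if not baseline[c] and candidate[c]]
    base_rate = sum(baseline[c] for c in shared) / len(shared)
    cand_rate = sum(candidate[c] for c in shared) / len(shared)
    return Comparison(regressions, recovered, base_rate, cand_rate)


def within_budget(baseline_value: float, candidate_value: float, max_increase: float) -> bool:
    """True if a cost, token, or latency figure grows by at most max_increase (0.10 = 10%)."""
    if baseline_value <= 0:
        return candidate_value <= 0
    return (candidate_value - baseline_value) / baseline_value <= max_increase


# Made-up results for four cases.
baseline = {"case-1": True, "case-2": False, "case-3": False, "case-4": True}
candidate = {"case-1": True, "case-2": True, "case-3": True, "case-4": False}
result = compare_results(baseline, candidate)
print(result.regressions)  # ['case-4']
print(result.recovered)    # ['case-2', 'case-3']
print(within_budget(1.0, 1.08, 0.10))  # True
```

A higher pass rate alone can hide a regression, which is why the sketch reports regressed and recovered cases separately rather than only the aggregate rate.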

Inspect the prompt diff

The diff shows exactly what the candidate changes. Depending on the prompt, it can include:

Message text changes.
Model or generation setting changes.
Response schema changes.
Tooling-related changes.
Explanations for why each change was made.

Look for instructions that are too broad, too narrow, or likely to conflict with existing requirements. Pay special attention to safety boundaries, output format, tool-use instructions, and examples that encode product policy. If the candidate is close but needs a small edit, use Edit & approve instead of approving the candidate as-is.

Understand the actions

Action	Result
Approve	Applies the selected candidate to the prompt and deploys it to the project’s primary deployment environment when one exists. The confirmation dialog names the candidate and target environment.
Edit & approve	Applies the selected candidate and opens the prompt editor so you can adjust it. It does not deploy automatically.
Reject	Leaves the prompt unchanged and records the cycle in history.

Approve can change production behavior immediately when the project has a primary deployment environment. Use Edit & approve when you want to inspect or adjust the prompt in the editor before deploying it.

After approval

After approving, monitor the deployed prompt in Monitor, Traces, and Behaviors. Watch for:

The original Behavior’s issue rate decreasing.
Eval score staying stable or improving.
Cost, token, and latency changes staying within budget.
New traces that show the prompt still handles normal cases.
New Behaviors or regressions introduced by the change.
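The numeric items in the list above can be checked mechanically once you have before-and-after figures. The following is a minimal sketch, assuming you copy those figures from monitoring yourself. The `Snapshot` fields, the `post_deploy_warnings` function, and the 10% budget are hypothetical, and reviewing new traces and new Behaviors still has to be done by hand.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics captured before and after approval."""
    issue_rate: float  # share of traces showing the original Behavior's issue
    eval_score: float
    cost_usd: float
    tokens: int
    latency_ms: float


def post_deploy_warnings(before: Snapshot, after: Snapshot, max_increase: float = 0.10) -> list[str]:
    """Return warnings for failed checks; an empty list means every check passed."""
    warnings = []
    if after.issue_rate >= before.issue_rate:
        warnings.append("issue rate did not decrease")
    if after.eval_score < before.eval_score:
        warnings.append("eval score dropped")
    # Flag any runtime metric that grew by more than the allowed fraction.
    for name in ("cost_usd", "tokens", "latency_ms"):
        old, new = getattr(before, name), getattr(after, name)
        if old > 0 and (new - old) / old > max_increase:
            warnings.append(f"{name} grew more than {max_increase:.0%}")
    return warnings


# Made-up figures for illustration only.
before = Snapshot(issue_rate=0.12, eval_score=0.81, cost_usd=40.0, tokens=1_000_000, latency_ms=900.0)
after = Snapshot(issue_rate=0.04, eval_score=0.83, cost_usd=41.0, tokens=1_200_000, latency_ms=910.0)
print(post_deploy_warnings(before, after))  # ['tokens grew more than 10%']
```

Treating an unchanged issue rate as a warning reflects the goal of the cycle: the change was approved to reduce that Behavior, so no movement is worth investigating.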

If production behavior regresses, use Deploy to compare deployments and roll back the affected environment. Promote useful evidence from the cycle, such as representative failing traces, into regression coverage so the same issue is tested before the next release.
