Every completed Improve cycle pauses for human or external AI review. Use the review page to decide whether Adaline’s candidate should be approved, edited, or rejected. The reviewer, whether a person or an external AI agent, owns the production decision: the diagnosis must match the real issue, the diff must be understandable, the impact must be acceptable, and the deployment target must be correct.

[Image: Improve review page showing diagnosis, representative failing traces, prompt diff, and traffic comparison]

Read the review page top to bottom

Start with the evidence path. A good review package shows how the cycle moved from production evidence to candidate prompt changes, and whether each stage produced enough signal to support a release decision.

[Image: Improve cycle stage provenance showing Behaviors, Evals, Datasets, Prompts, and Review evidence]

| Stage | What to check |
| --- | --- |
| Behaviors | The run targeted a specific repeated pattern or issue. |
| Evals | Authored and auto generated evaluators cover the target behavior and important healthy paths. |
| Datasets | Production, curated, and synthetic cases are representative enough to compare baseline and candidate. |
| Prompts | Multiple candidates were explored, and unsafe or regressing options were filtered out. |
| Review | The selected candidate has a diff, traffic examples, score movement, and runtime tradeoffs. |

Then move through the candidate review itself. Read the diagnosis first, confirm the selected candidate, inspect the diff, compare example outputs, check regressions, and only then look at deployment impact.

[Image: Candidate review traffic comparison showing current and improved outputs for tested conversations]

| Section | Question | Stop if |
| --- | --- | --- |
| Diagnosis | What problem did Adaline try to fix? | It does not match the customer or product issue. |
| Candidate | Which candidate is selected? | It is not the candidate you intend to apply. |
| Prompt diff | What exactly changes? | The diff changes policy, format, tool behavior, or tone in a risky way. |
| Traffic comparison | Would real users get a better answer? | Scores improved but the user experience got worse. |
| Regression report | Which checks improved or regressed? | A protected evaluator drops without explicit signoff. |
| Cost, tokens, latency | What runtime tradeoff comes with the candidate? | The movement breaks the release budget. |
| Deployment target | Which environment changes if approved? | The target environment is wrong or unclear. |

[Image: Improve regression report and runtime tradeoff section showing evaluator scores, cost, latency, and token movement]

Before approving, make sure the diagnosis matches the customer or product problem, the supporting Behaviors and logs are relevant, the prompt diff is understandable, evaluator or dataset regressions are acceptable, runtime impact is acceptable, and the deployment target is the one you intend to change.
Approve can affect production when the project has a deployment environment. Use Edit & approve when you want to inspect or adjust the prompt before deployment.
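
The stop conditions in the review table can be captured as a simple pre-approval checklist. The sketch below is illustrative only: `ReviewChecklist`, its field names, and the helper functions are hypothetical and are not part of Adaline's product or API. Each field records the reviewer's own judgment for one section of the review page.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewChecklist:
    """Reviewer's answers, one per review-page section (hypothetical names)."""

    diagnosis_matches_issue: bool      # Diagnosis
    candidate_is_intended: bool        # Candidate
    diff_is_low_risk: bool             # Prompt diff
    traffic_examples_better: bool      # Traffic comparison
    protected_evaluators_hold: bool    # Regression report (or drop has explicit signoff)
    runtime_within_budget: bool        # Cost, tokens, latency
    deployment_target_correct: bool    # Deployment target


def blocking_checks(checklist: ReviewChecklist) -> list[str]:
    """Return the names of failed checks, in review-page order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def can_approve(checklist: ReviewChecklist) -> bool:
    """Approve only when no stop condition was hit."""
    return not blocking_checks(checklist)


review = ReviewChecklist(
    diagnosis_matches_issue=True,
    candidate_is_intended=True,
    diff_is_low_risk=True,
    traffic_examples_better=True,
    protected_evaluators_hold=False,
    runtime_within_budget=True,
    deployment_target_correct=True,
)
print(blocking_checks(review))  # ['protected_evaluators_hold']
print(can_approve(review))      # False
```

Keeping the checks in review-page order means the first blocking item is also the first place a reviewer should stop reading.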

After the decision

[Image: Improve review action bar showing no regressions, target prompt, cycle number, version creation, reject, and edit and approve actions]

| Decision | What to do next |
| --- | --- |
| Approve | Apply the selected candidate and deploy it when a primary environment is configured. Watch Monitor, Logs, and Behaviors during the release window. |
| Edit & approve | Apply the candidate, inspect or adjust the prompt, run evaluations, then release through Deploy your prompt or your external deployment path. |
| Reject | Leave the prompt unchanged and record the reason: wrong diagnosis, weak evidence, regression, runtime cost, or wrong fix layer. |

You can also export the audit packet as JSON for records, external AI review, or a no-human-in-the-loop handoff before deciding what happens next.
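
An exported packet can be checked by an external script before a decision is made. The sketch below assumes a packet shape that is entirely hypothetical: the keys `cycle`, `evaluators`, `name`, `baseline`, `candidate`, and `protected` are invented for illustration, as are the sample scores, and the real export may be structured differently. Inspect an actual exported file and adjust the keys before relying on anything like this.

```python
import json

# Hypothetical packet shape with made-up values; the real export may differ.
SAMPLE_PACKET = """
{
  "cycle": 12,
  "evaluators": [
    {"name": "refund_policy", "baseline": 0.82, "candidate": 0.91, "protected": true},
    {"name": "tone", "baseline": 0.95, "candidate": 0.90, "protected": true},
    {"name": "brevity", "baseline": 0.70, "candidate": 0.66, "protected": false}
  ]
}
"""


def protected_regressions(packet: dict) -> list[str]:
    """Names of protected evaluators whose candidate score is below baseline."""
    return [
        entry["name"]
        for entry in packet.get("evaluators", [])
        if entry.get("protected") and entry["candidate"] < entry["baseline"]
    ]


packet = json.loads(SAMPLE_PACKET)
print(protected_regressions(packet))  # ['tone']
```

A non-empty result corresponds to the "protected evaluator drops without explicit signoff" stop condition, so an automated handoff could refuse to proceed until someone signs off.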

Export Audit Packet

Download the review evidence as JSON for records or external systems.

Auto Prompt Optimization

Understand candidate exploration, safety gates, and scoring evidence.

Auto Generated Evaluators

See how generated evaluators help check the candidate before release.

Deploy your prompt

Continue from reviewed prompt version to deployment.