Evaluator types
Adaline supports several evaluator patterns:

| Evaluator | Use it for |
|---|---|
| LLM-as-a-Judge | Rubric-based quality, safety, policy, tone, and reasoning checks. |
| Custom Prompt | LLM-based evaluation with custom model configuration and prompt logic. |
| JavaScript | Deterministic checks, schema validation, custom scoring, and business rules. |
| JSON | Structured JSON checks and schema-like assertions. |
| API Call | External service checks that need your own evaluator endpoint. |
| Text Matcher | Required or forbidden strings, regexes, and formatting markers. |
| Cost | Budget thresholds based on provider cost. |
| Latency | SLA thresholds based on runtime. |
| Response Length | Word, token, character, or brevity requirements. |
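
To make the deterministic rows in the table concrete, here is a minimal sketch of what a Text Matcher check and a Response Length check compute. It is a language-neutral illustration written in Python, not Adaline's evaluator API: `EvalResult`, `text_matcher`, and `response_length` are hypothetical names, and counting words by whitespace is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical result shape: a pass/fail flag plus a short reason."""
    passed: bool
    reason: str


def text_matcher(output, required=(), forbidden_patterns=()):
    """Pass only if every required string appears and no forbidden regex matches."""
    for needle in required:
        if needle not in output:
            return EvalResult(False, f"missing required string: {needle!r}")
    for pattern in forbidden_patterns:
        if re.search(pattern, output):
            return EvalResult(False, f"matched forbidden pattern: {pattern!r}")
    return EvalResult(True, "ok")


def response_length(output, min_words=0, max_words=None):
    """Pass only if the whitespace-separated word count is within bounds."""
    words = len(output.split())
    if words < min_words:
        return EvalResult(False, f"too short: {words} words (min {min_words})")
    if max_words is not None and words > max_words:
        return EvalResult(False, f"too long: {words} words (max {max_words})")
    return EvalResult(True, "ok")
```

Both checks return the same result shape, so a caller can treat them interchangeably. Qualitative criteria such as tone do not reduce to rules like these, which is where LLM-as-a-Judge or Custom Prompt evaluators fit.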
Where evaluators run
Evaluators can be used in three workflows:

- Prompt evaluations - run against datasets before deployment.
- Monitor and Traces - score sampled production traffic when continuous evaluation is configured.
- Improve - reject candidates that improve one metric but regress another.
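
The rejection rule in the Improve workflow can be sketched as a comparison of per-evaluator scores between a baseline and a candidate. This is an illustration under assumptions, not Adaline's implementation: the function name `gate_candidate`, the score dictionaries, the "higher is better" convention, and the `tolerance` parameter are all hypothetical.

```python
def gate_candidate(baseline, candidate, target, tolerance=0.0):
    """Accept a candidate only if it improves the target evaluator's score
    and no other evaluator drops by more than `tolerance`.

    `baseline` and `candidate` map evaluator names to scores where higher
    is better (an assumption). Returns (accepted, regressed_evaluator_names).
    """
    regressions = sorted(
        name
        for name, base_score in baseline.items()
        # An evaluator missing from the candidate counts as a score of 0.0.
        if name != target and candidate.get(name, 0.0) < base_score - tolerance
    )
    improved = candidate.get(target, 0.0) > baseline.get(target, 0.0)
    return improved and not regressions, regressions


baseline = {"tone": 0.80, "schema": 1.00, "latency": 0.95}
candidate = {"tone": 0.90, "schema": 0.92, "latency": 0.95}

# The candidate improves "tone" but regresses "schema", so it is rejected.
accepted, regressions = gate_candidate(baseline, candidate, target="tone")
```

Returning the list of regressed evaluators alongside the decision lets a reviewer see why a candidate was blocked, not just that it was.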
Create useful evaluators
Good evaluators are specific, measurable, and tied to product expectations.
Start from a failure mode
Use a Behavior, failing trace, or product requirement to define what should pass or fail.
Choose the evaluator type
Use deterministic evaluators for exact rules and LLM-as-a-Judge for qualitative criteria.
Attach the evaluator to the prompt
Link the evaluator where it should run so evaluations and Improve cycles can use it.
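
The three steps above can be sketched end to end. Suppose, as a made-up failure mode, that a refund assistant sometimes returns JSON without a `refund_id` field. That is an exact rule, so a deterministic check fits. Everything named here (`has_required_fields`, `EVALUATORS_BY_PROMPT`, the `refund-assistant` prompt, the field names) is hypothetical, and the registry dictionary only stands in for however evaluators are linked to prompts in practice.

```python
import json


# Steps 1 and 2: the failure mode is a missing field, an exact rule,
# so the check is deterministic rather than LLM-based.
def has_required_fields(output, required=("refund_id", "status")):
    """Return (passed, reason) for a JSON object that must contain `required` keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = [key for key in required if key not in data]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"


# Step 3: a hypothetical registry standing in for "attach the evaluator to the prompt".
EVALUATORS_BY_PROMPT = {"refund-assistant": [has_required_fields]}


def evaluate(prompt_name, output):
    """Run every evaluator attached to the prompt; return (name, passed, reason) rows."""
    return [
        (check.__name__, *check(output))
        for check in EVALUATORS_BY_PROMPT.get(prompt_name, [])
    ]
```

Replaying the original failing output through `evaluate` is a quick way to confirm that the new check fails on the case that motivated it.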
Evaluator design patterns
| Goal | Recommended evaluator |
|---|---|
| Enforce exact output shape | JavaScript, JSON, or Text Matcher |
| Catch subjective quality issues | LLM-as-a-Judge or Custom Prompt |
| Control spend | Cost evaluator |
| Control runtime | Latency evaluator |
| Prevent rambling or terse answers | Response Length evaluator |
| Use internal business logic | JavaScript or API Call evaluator |
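
For the spend and runtime rows, a threshold check over run metadata might look like the sketch below. The `RunMetrics` fields, the units (US dollars and milliseconds), and the function name `within_budget` are assumptions chosen for illustration. The source does not specify how Cost and Latency evaluators are configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical per-run metadata; field names and units are assumptions."""
    cost_usd: float
    latency_ms: float


def within_budget(metrics, max_cost_usd, max_latency_ms):
    """Return a pass/fail flag per check so cost and latency are reported separately."""
    return {
        "cost": metrics.cost_usd <= max_cost_usd,
        "latency": metrics.latency_ms <= max_latency_ms,
    }


# A run that is cheap enough but slower than the latency threshold.
result = within_budget(RunMetrics(cost_usd=0.004, latency_ms=1800.0),
                       max_cost_usd=0.01, max_latency_ms=1500.0)
```

Keeping the two flags separate, rather than collapsing them into one boolean, shows which budget a run exceeded.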
Evaluators and Improve
Improve uses evaluators as safety rails. A candidate that improves the target Behavior can still be blocked if it regresses an existing evaluator. Strong evaluator coverage makes Improve reviews more trustworthy because the candidate is measured against the rules your team already accepts.
Evaluator types
Choose the right evaluator for quality, schema, cost, latency, and custom rules.
Create evaluators
Turn product requirements and production failures into repeatable checks.
Run evaluations
Review prompt evaluations, continuous scoring, and release gates.
Datasets
Store the cases evaluators should score.