Evaluators
Overview
Test and optimize your prompts with data-driven insights.
Evaluators benchmark prompt performance across multiple dimensions using your test cases. Run them against your datasets to find which prompt variation works best.
Features
Smart AI Assessment
Let an LLM evaluate responses for you.
- LLM-as-a-judge evaluates quality using your custom rubric.
- JavaScript Evaluator runs your custom code (see the sketch after this list).
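As a rough illustration, a custom JavaScript evaluator is typically a small function that receives the model response and returns a score with a reason. The function name, parameters, and return shape below are assumptions for illustration, not the platform's required signature.

```js
// Hypothetical custom JavaScript evaluator: checks that the response is
// valid JSON and contains a non-empty "summary" field.
// The (response) parameter and { score, reason } return shape are assumptions.
function evaluate(response) {
  try {
    const parsed = JSON.parse(response);
    const hasSummary = typeof parsed.summary === "string" && parsed.summary.length > 0;
    return {
      score: hasSummary ? 1 : 0,
      reason: hasSummary ? "Response contains a non-empty summary" : "Missing summary field",
    };
  } catch (err) {
    return { score: 0, reason: "Response is not valid JSON" };
  }
}
```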
Performance Monitoring
Track speed and efficiency in real time.
- Use metrics like Latency to measure response time in milliseconds (see the timing sketch after this list).
- Track cost per prompt and response.
- Monitor performance across different models.
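For context, latency here is simply the wall-clock duration of a model call in milliseconds. The sketch below shows one way to measure it around a caller-supplied model function; it is illustrative, not the platform's built-in metric implementation.

```js
// Hypothetical latency measurement around a model call.
// callModel is a caller-supplied function that returns the model output.
async function timedCall(callModel, prompt) {
  const start = Date.now();
  const response = await callModel(prompt);
  const latencyMs = Date.now() - start;
  return { response, latencyMs };
}
```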
Content Validation
Ensure responses meet your exact requirements.
- Use Response Length to control the output size in tokens, words, or characters.
- Find specific keywords and patterns using Text Matcher (a combined sketch follows this list).
- Validate format compliance automatically.
- Catch quality issues before deployment.
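To make these checks concrete, the sketch below shows the kind of validation Response Length and Text Matcher perform: a word-count bound and a regex keyword check. The thresholds, pattern, and helper name are illustrative assumptions, not platform settings.

```js
// Illustrative content checks: word-count bound plus keyword/pattern match.
// maxWords and requiredPattern are example parameters, not platform defaults.
function validateResponse(response, maxWords = 200, requiredPattern = /refund policy/i) {
  const wordCount = response.trim().split(/\s+/).length;
  return {
    withinLength: wordCount <= maxWords,
    containsKeyword: requiredPattern.test(response),
    wordCount,
  };
}
```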
Data-Driven Optimization
Turn evaluation results into better prompts.
- Compare performance across prompt variations (see the aggregation sketch below).
- Track improvements over time.
- Make evidence-based optimization decisions.
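As a simple picture of what a comparison boils down to, the sketch below averages evaluator scores per prompt variant and ranks the variants. The input data shape is an assumption for illustration.

```js
// Hypothetical comparison: average evaluator scores per prompt variant.
// results is assumed to look like [{ variant: "v1", score: 0.8 }, ...].
function rankVariants(results) {
  const totals = {};
  for (const { variant, score } of results) {
    if (!totals[variant]) totals[variant] = { sum: 0, count: 0 };
    totals[variant].sum += score;
    totals[variant].count += 1;
  }
  return Object.entries(totals)
    .map(([variant, { sum, count }]) => ({ variant, avgScore: sum / count }))
    .sort((a, b) => b.avgScore - a.avgScore);
}
```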