Evaluate
Overview
The Evaluate pillar is your testing powerhouse within Adaline, where you validate and optimize your prompts using real data.
What is Evaluate?
Evaluate lets you test your prompt across thousands of dataset rows. It’s your quality assurance center, where you test prompts against real-world scenarios, measure their effectiveness, and identify areas for improvement.
Here, you can run batch evaluations, compare different prompt versions, and ensure your AI solutions meet performance standards before going live.
Key Features
Datasets
Your evaluation foundation:
- Create and manage test datasets with multiple data types.
- Import existing data from CSV, JSON, or Excel files (see the example after this list).
- Perform row and column operations for data preparation.
- Search and filter your data, including rows with image data.
- Build comprehensive test suites for thorough validation.
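If you import from JSON, each row typically maps input variables to a reference output. Here is a minimal sketch, assuming a support-bot prompt with a `question` input and an `expected_answer` reference column; both names are illustrative, not a required Adaline schema:

```javascript
// Illustrative dataset rows for a support-bot prompt.
// The column names ("question", "expected_answer") are assumptions:
// use whatever columns your prompt's variables expect.
const rows = [
  { question: "How do I reset my password?", expected_answer: "Settings > Security > Reset password" },
  { question: "Do you offer refunds?", expected_answer: "Yes, within 30 days of purchase" },
];
```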
Evaluation Runs
Execute and analyze your tests:
- Run evaluations across entire datasets.
- View detailed results and performance metrics.
- Filter and search through evaluation history.
- Review past runs and roll back to previous versions.
- Open any evaluation directly in Playground for debugging.
Evaluators
Choose the right metrics for your use case:
- Use LLM-as-a-Judge to assess quality.
- Measure information retrieval accuracy.
- Use JavaScript, JSON, and text-matching validators for technical checks (see the sketch after this list).
- Use metrics like Completion Length and Latency to measure performance.
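As an illustration of a text-matching validator, here is a minimal sketch in JavaScript. The `evaluate(output, row)` signature and the `expected_answer` field are assumptions for this example, not Adaline's actual evaluator interface:

```javascript
// Minimal text-matching validator: passes if the model output
// contains the expected answer (case-insensitive).
// The (output, row) signature and "expected_answer" field are
// illustrative assumptions, not Adaline's evaluator API.
function evaluate(output, row) {
  const passed = output.toLowerCase().includes(row.expected_answer.toLowerCase());
  return {
    score: passed ? 1 : 0,
    reason: passed
      ? "Output contains the expected answer."
      : `Expected to find "${row.expected_answer}" in the output.`,
  };
}
```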