What is Evaluate?

Key Features
Datasets

- Create and manage test datasets with multiple data types.
- Import existing data from CSV, JSON, or Excel files.
- Perform row and column operations for data preparation.
- Search, filter, and work with image data.
- Build comprehensive test suites for thorough validation.
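To illustrate what importing and filtering a dataset involves, here is a minimal Python sketch. It is not Evaluate's import API. It assumes a hypothetical CSV layout with `input` and `expected` columns, and `DatasetRow`, `load_csv_dataset`, and `filter_rows` are illustrative names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DatasetRow:
    """One hypothetical dataset row: the model input and the expected output."""
    input: str
    expected: str


def load_csv_dataset(text: str) -> list[DatasetRow]:
    """Parse CSV text with assumed 'input' and 'expected' columns into rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        # Strip surrounding whitespace so stray spaces don't cause false mismatches later.
        rows.append(DatasetRow(input=record["input"].strip(), expected=record["expected"].strip()))
    return rows


def filter_rows(rows: list[DatasetRow], keyword: str) -> list[DatasetRow]:
    """Simple search: keep rows whose input contains the keyword (case-insensitive)."""
    return [r for r in rows if keyword.lower() in r.input.lower()]


SAMPLE = """input,expected
What is 2 + 2?,4
Capital of France?,Paris
What is 3 + 5?,8
"""

rows = load_csv_dataset(SAMPLE)
math_rows = filter_rows(rows, "what is")
print(len(rows), len(math_rows))  # 3 2
```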

Evaluation Runs

- Run evaluations across entire datasets.
- View detailed results and performance metrics.
- Filter and search through evaluation history.
- Review past runs and roll back to previous versions.
- Open any evaluation directly in Playground for debugging.
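Conceptually, an evaluation run applies a model and an evaluator to every dataset row and then aggregates the scores. The sketch below is an illustration, not Evaluate's implementation. It assumes a hypothetical `run_evaluation` function, a score scale from 0.0 to 1.0, and a 0.5 pass threshold. It does not model how runs are stored or versioned.

```python
from statistics import mean
from typing import Callable

# Hypothetical types: a model maps an input string to an output string,
# and an evaluator scores (output, expected) on a 0.0-1.0 scale.
Model = Callable[[str], str]
Evaluator = Callable[[str, str], float]


def run_evaluation(rows: list[tuple[str, str]], model: Model, evaluator: Evaluator,
                   pass_threshold: float = 0.5) -> dict:
    """Score every (input, expected) row, keep per-row details, and aggregate."""
    per_row = []
    for inp, expected in rows:
        output = model(inp)
        per_row.append({"input": inp, "output": output, "score": evaluator(output, expected)})
    scores = [r["score"] for r in per_row]
    return {
        "rows": per_row,  # detailed results, one entry per dataset row
        "mean_score": mean(scores) if scores else 0.0,
        "pass_rate": sum(s >= pass_threshold for s in scores) / len(scores) if scores else 0.0,
    }


answers = {"2 + 2": "4", "3 + 5": "8", "1 + 1": "2"}
result = run_evaluation(
    rows=[("2 + 2", "4"), ("3 + 5", "8"), ("1 + 1", "3")],  # last row's expected value is deliberately wrong
    model=lambda q: answers[q],  # stand-in for a real model call
    evaluator=lambda out, exp: 1.0 if out.strip() == exp.strip() else 0.0,
)
print(round(result["mean_score"], 3), round(result["pass_rate"], 3))  # 0.667 0.667
```

Keeping the per-row entries alongside the aggregates is what makes it possible to drill into individual failures rather than only seeing a summary number.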

Evaluators

- Use LLM-as-a-Judge evaluators to assess output quality.
- Measure information retrieval accuracy.
- Use JavaScript, JSON, and Text Matching evaluators for technical validation.
- Use metrics like Completion Length and Latency to measure performance.
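The code-based evaluators listed above reduce to small, deterministic checks. This Python sketch shows plausible versions of text matching, JSON validation, completion length, and latency. The function names are hypothetical. Counting characters for completion length is an assumption, since a product may count tokens instead. LLM-as-a-Judge and retrieval metrics are not covered here.

```python
import json
import time
from typing import Callable


def exact_match(output: str, expected: str) -> bool:
    """Text-matching style check: case- and whitespace-insensitive equality."""
    return output.strip().lower() == expected.strip().lower()


def is_valid_json(output: str) -> bool:
    """JSON style check: does the output parse as JSON at all?"""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def completion_length(output: str) -> int:
    """Completion-length proxy: character count (assumption; tokens are also common)."""
    return len(output)


def timed_call(model: Callable[[str], str], prompt: str) -> tuple[str, float]:
    """Latency: wall-clock seconds for one model call, using a monotonic clock."""
    start = time.perf_counter()
    output = model(prompt)
    return output, time.perf_counter() - start


# Stand-in model that always returns a fixed JSON string.
out, latency = timed_call(lambda p: '{"answer": 4}', "What is 2 + 2?")
print(is_valid_json(out), completion_length(out), latency >= 0)  # True 13 True
```

Keeping each check a pure function of the output makes results reproducible across runs. Latency is the exception, because it depends on the call being timed.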