Take your configured evaluators from setup to insights. Run tests, view detailed results, and iterate based on real performance data.

Features

Running evaluation tests
Measure and compare prompt performance across runs, including:
  • Running evaluations on prompts with multi-modal data (text, images, PDFs) and extending LLM capabilities with tool calls.
  • Running multiple evaluations in the cloud, with caching and token rate limiting handled automatically.
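To make the caching and rate-limiting behavior above concrete, here is a minimal, hypothetical sketch of an evaluation runner. It is not this product's API: `TokenRateLimiter`, `run_evaluations`, and the word-count token estimate are all illustrative assumptions. Results are cached by prompt hash so repeated prompts never re-invoke the evaluator, and a simple token bucket throttles throughput.

```python
import hashlib
import time


class TokenRateLimiter:
    """Hypothetical token-bucket limiter: allows up to tokens_per_minute tokens."""

    def __init__(self, tokens_per_minute: int):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Replenish tokens in proportion to elapsed time.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.available = min(self.capacity,
                             self.available + elapsed / 60.0 * self.capacity)
        self.last_refill = now

    def acquire(self, tokens: int) -> None:
        # Block until enough tokens are available, then spend them.
        self._refill()
        while self.available < tokens:
            time.sleep(0.01)
            self._refill()
        self.available -= tokens


def run_evaluations(prompts, evaluate, limiter, cache=None):
    """Run `evaluate` over prompts, caching results keyed by prompt hash."""
    cache = {} if cache is None else cache
    results = []
    for prompt in prompts:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            # Rough token estimate (word count) stands in for a real tokenizer.
            limiter.acquire(len(prompt.split()))
            cache[key] = evaluate(prompt)
        results.append(cache[key])
    return results


calls = []

def fake_eval(prompt: str) -> int:
    """Stand-in evaluator that just scores prompts by length."""
    calls.append(prompt)
    return len(prompt)

limiter = TokenRateLimiter(tokens_per_minute=10_000)
out = run_evaluations(["hello world", "hello world", "bye"], fake_eval, limiter)
# The duplicate prompt hits the cache, so fake_eval runs only twice.
```

In a real service the cache would persist across runs and the limiter would count provider-reported tokens, but the control flow is the same: check cache, acquire budget, evaluate, store.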