Runs
Execute your evaluator tests and get instant insights on prompt performance.
The Run feature lets you test your prompts against real data using your configured evaluators, and returns detailed results that help you optimize your AI responses.
Prerequisites for Running
Before you can execute evaluations, ensure you have:
- At least one evaluator: Add any evaluator (LLM-as-a-Judge, Cost, Latency, etc.) from the available options
- Connected dataset: Link a dataset that contains your test cases and variables
- Matching columns: Dataset columns must match your prompt variable names exactly (see the sketch after this list)
- Minimum 1 row: The dataset needs at least one row of data to evaluate against
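The column-matching requirement can be pictured with a short sketch. This is purely illustrative and not the product's API; the `{{variable}}` placeholder syntax, the prompt text, and the column names are assumptions made for the example.

```python
import re

# Hypothetical prompt template and dataset columns (illustrative only).
prompt_template = "Summarize this support ticket for {{customer_name}}: {{ticket_text}}"
dataset_columns = {"customer_name", "ticket_text"}

# Extract the {{variable}} placeholders used in the prompt.
variables = set(re.findall(r"\{\{(\w+)\}\}", prompt_template))

# Every prompt variable must have a dataset column with the exact same name.
missing = variables - dataset_columns
if missing:
    print(f"Missing dataset columns: {sorted(missing)}")
else:
    print("All prompt variables have matching dataset columns.")
```

If a variable has no matching column (for example, a `{{customer_name}}` placeholder but no `customer_name` column), the run cannot substitute real data into the prompt, so double-check spelling and casing before evaluating.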
Running Your Evaluation
Once your evaluator is ready, click the green Evaluate button to start your evaluation run.
The system processes your prompts against the dataset using your selected evaluators. When the evaluation completes, the results are displayed.
Background Processing
Evaluations run in the background automatically, so for larger datasets you can safely leave the page and return later to check the results.
Click See results to view completed evaluations or monitor the progress of ongoing runs.
Because evaluations run in the background, you can continue working on other prompts while they complete, making large-scale testing an efficient use of your time.