The Run feature lets you test your prompts against real data using your configured evaluators and get detailed results to help you optimize your AI responses.

Prerequisites for Running

Before you can execute evaluations, ensure you have:

  • At least one evaluator: Add any evaluator (LLM-as-a-Judge, Cost, Latency, etc.) from the available options
  • Connected dataset: Link a dataset that contains your test cases and variables
  • Matching columns: Dataset columns must match your prompt variable names exactly (see the example after this list)
  • Minimum 1 row: The dataset needs at least one row of data to evaluate against
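
For example, if your prompt references two variables, each dataset row should supply a value for both under columns with exactly the same names. The template syntax and column names below are illustrative only, not taken from your project:

```python
# Illustrative sketch: a hypothetical prompt template and dataset rows
# showing how column names must match prompt variable names exactly.

prompt_template = (
    "Answer the customer's question: {{question}}\n"
    "Context: {{context}}"
)

# Each row supplies a value for every variable in the template.
# The column names ("question", "context") match the variable names exactly.
dataset_rows = [
    {"question": "How do I reset my password?", "context": "Account settings FAQ"},
    {"question": "What is your refund policy?", "context": "Billing policy doc"},
]
```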

Running Your Evaluation

Once your evaluators and dataset are ready, click the green Evaluate button to start your evaluation run.

The system runs your prompts against each row of the dataset and scores the responses with your selected evaluators. Once the evaluation completes, the results are displayed.
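
Conceptually, a run renders the prompt for each dataset row, collects a model response, and scores it with every configured evaluator. The sketch below is an illustrative outline only; the function and parameter names (run_evaluation, generate, evaluators) are hypothetical and not part of the product:

```python
def run_evaluation(prompt_template, dataset_rows, evaluators, generate):
    """Conceptual outline of an evaluation run.

    generate(prompt) -> model response (a callable you supply)
    evaluators -> dict mapping an evaluator name to a scoring function
    """
    results = []
    for row in dataset_rows:
        # Substitute each dataset column into its matching {{variable}} placeholder.
        prompt = prompt_template
        for name, value in row.items():
            prompt = prompt.replace("{{" + name + "}}", str(value))

        response = generate(prompt)

        # Score the response with every configured evaluator.
        scores = {name: score(prompt, response) for name, score in evaluators.items()}
        results.append({"row": row, "response": response, "scores": scores})
    return results
```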

Background Processing

Evaluations run in the background automatically, so for larger datasets you can safely leave the page and return later to check the results.

Click See results to view completed evaluations or monitor the progress of ongoing runs.

Because evaluations run in the background, you can continue working on other prompts while they complete, making large-scale testing a more efficient use of your time.