The Run feature lets you test your prompts against production test cases stored in Datasets, using your configured evaluators. Get detailed results to optimize your AI responses.

Prerequisites

Before you can run evaluations, ensure you have:

- a prompt you want to test
- a Dataset containing your test cases
- at least one configured evaluator

Running Evaluation

Once your evaluators are ready, click the Evaluate button to start the evaluation run. The system processes your prompts against the dataset using the evaluators you set up, and displays the results once the evaluation is complete.

Background Processing and Concurrent Executions

Evaluations run automatically in the background (in the cloud), so you can safely leave the page and return later to check results, or keep working on other prompts while your evaluations complete. You can also run up to 5 concurrent evaluations: each time you click Evaluate, the platform launches a new run in parallel with any that are already in progress.
TIP: Background processing and concurrent executions make large-scale testing more efficient. Use them when testing large models, models with rate limits, or large datasets: instead of waiting several minutes for one evaluation to finish before starting another, launch the next one in parallel.
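To make the concurrency model concrete, here is a minimal sketch that simulates it with Python's asyncio. Everything in it is illustrative: `run_evaluation`, the prompt and dataset IDs, and the simulated delay are placeholders (the real work happens in the cloud after you click Evaluate, not in your code). The semaphore simply mirrors the platform's limit of 5 concurrent runs.

```python
import asyncio
import random

# The platform caps concurrent evaluation runs at 5; the semaphore below
# mirrors that limit so no more than 5 simulated runs are in flight at once.
MAX_CONCURRENT_RUNS = 5

async def run_evaluation(prompt_id: str, dataset_id: str) -> str:
    """Stand-in for a single evaluation run (hypothetical, for illustration).

    In the real product this processing happens in the cloud; here we just
    sleep to simulate a long-running background job.
    """
    await asyncio.sleep(random.uniform(1, 3))  # simulated processing time
    return f"results for prompt={prompt_id}, dataset={dataset_id}"

async def main() -> None:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_RUNS)

    async def bounded_run(prompt_id: str, dataset_id: str) -> str:
        async with semaphore:  # never exceed the 5-run limit
            return await run_evaluation(prompt_id, dataset_id)

    # Launch several evaluations "in parallel", as clicking Evaluate
    # repeatedly would do; extras wait until a slot frees up.
    jobs = [bounded_run(f"prompt-{i}", "dataset-A") for i in range(8)]
    for result in await asyncio.gather(*jobs):
        print(result)

asyncio.run(main())
```

In the sketch, the first five runs start immediately and the remaining three wait for a free slot, which is the same behavior you see in the UI when you click Evaluate more than five times in quick succession.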