Build and test prompts

Use the prompt editor when you are designing the behavior your application will run. A prompt is more than message text: it includes the model provider, generation settings, messages, variables, tools, optional MCP server configuration, evaluation links, playground history, and deployable snapshots. The best prompt work is evidence-led. Build the draft in the editor, test it in the playground, run it against datasets, then inspect traces after deployment.

Start from the prompt library

Open Prompts in a project to see every prompt in that project. The list shows:

The prompt name.
Which deployment environments currently have a deployment snapshot.
The most recent deployment.
Active evaluators.
Linked datasets.
Last updated time.

Select a row to preview the prompt, or open the prompt to edit and test it.

Configure the model

Choose a model from the workspace providers your admins have configured. Model configuration is stored with the prompt editor state and captured in deployment snapshots. When you change providers, review provider-specific settings carefully. Some settings can be preserved across providers, but others depend on the selected provider and model.

Before serious testing, confirm:

Setting	Why it matters
Provider and model	Controls quality, latency, cost, context length, tool support, and response format support.
Temperature and sampling	Controls consistency and exploration. Lower values are usually better for deterministic workflows.
Token limits	Caps response length. A limit that is too low truncates responses; a limit that is too high allows overly long ones.
Response schema	Enforces structured output when your application expects JSON or a strict shape.
Tool support	Determines whether the model can call attached tools.

If playground runs fail with a provider-key error, ask a workspace admin to configure the provider in Workspace settings.

Write messages and variables

Use prompt messages to separate system policy, developer instructions, examples, user input, assistant examples, and tool context. Keep stable instructions close to the top of the prompt and variable input close to where it is used. Variables make prompts reusable. Use variable names that match the data your runtime and datasets can provide.

Good variable practice:

Use clear names such as customer_question, retrieved_policy, conversation_summary, or account_tier.
Avoid renaming variables casually. Datasets and playground cases may depend on the old name.
Keep optional variables explicit in the prompt so missing data produces a graceful response.
Prefer one concept per variable instead of packing many unrelated fields into one blob.
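
The variable practices above can be checked mechanically before a prompt reaches the playground. The sketch below is illustrative only: it assumes a `{{name}}` placeholder syntax, which may not match the editor's actual template syntax, and `find_variables` and `render_prompt` are hypothetical helpers, not part of the product.

```python
from __future__ import annotations

import re

# Assumption: placeholders look like {{variable_name}}. The real editor syntax may differ.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_variables(template: str) -> set[str]:
    """Return the variable names referenced in a template."""
    return set(PLACEHOLDER.findall(template))


def render_prompt(
    template: str,
    values: dict[str, str],
    optional: dict[str, str] | None = None,
) -> str:
    """Fill placeholders. Optional variables fall back to an explicit default."""
    optional = optional or {}
    missing = find_variables(template) - set(values) - set(optional)
    if missing:
        # Fail loudly instead of sending a prompt with unfilled placeholders.
        raise KeyError(f"missing required variables: {sorted(missing)}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, optional.get(name, ""))

    return PLACEHOLDER.sub(substitute, template)


template = "Account tier: {{account_tier}}\nQuestion: {{customer_question}}"
prompt = render_prompt(
    template,
    {"customer_question": "Where is my order?"},
    optional={"account_tier": "unknown"},  # explicit default for missing data
)
```

Declaring the default for `account_tier` next to the template keeps the optional variable explicit, so a dataset row without that column still produces a well-formed prompt.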

Add multimodal content

Prompts can include text and supported media content such as images or PDFs when the selected model supports them. Use multimodal inputs when the model must inspect the actual content, not when a short extracted text summary would be enough. For repeatable evaluation, keep multimodal dataset rows small and representative. Large files can make test runs slower and harder to debug.

Attach tools

Attach project tools when the model needs to take an action or fetch external data. Project tools are reusable across prompts in the same project and are referenced from the prompt editor.

Use tools for:

Retrieval or search.
Account, order, ticket, billing, or product lookup.
Calculations or policy checks.
Workflow actions that your application controls.

Keep tool instructions specific. The prompt should explain when to call the tool, what to do with failures, and how to answer when a tool returns incomplete data. See Create and link tools for the tool workflow.
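
As a rough illustration of the guidance above, the sketch below models a lookup tool whose description states when to call it and whose handler reports failures as data the model can react to. `ToolSpec`, `lookup_order`, and `call_tool` are hypothetical names for this example, and the JSON Schema style `parameters` field is an assumption, not the product's tool format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    description: str  # states when to call the tool, not only what it does
    parameters: dict[str, Any]  # assumed JSON Schema style argument description
    handler: Callable[[dict[str, Any]], dict[str, Any]]


def lookup_order(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical handler backed by an in-memory table."""
    orders = {"A100": {"status": "shipped"}}
    order = orders.get(args["order_id"])
    if order is None:
        return {"ok": False, "error": "order_not_found"}
    return {"ok": True, "data": order}


ORDER_LOOKUP = ToolSpec(
    name="lookup_order",
    description=(
        "Call only when the user asks about a specific order and provides an "
        "order ID. Do not call for general shipping-policy questions."
    ),
    parameters={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    handler=lookup_order,
)


def call_tool(tool: ToolSpec, args: dict[str, Any]) -> dict[str, Any]:
    """Run a tool and always return a structured result, even on failure."""
    missing = [key for key in tool.parameters.get("required", []) if key not in args]
    if missing:
        return {"ok": False, "error": f"missing_arguments: {missing}"}
    try:
        return tool.handler(args)
    except Exception as exc:  # surface failures to the model instead of crashing
        return {"ok": False, "error": str(exc)}
```

Returning `{"ok": False, ...}` rather than raising gives the prompt something concrete to follow when it tells the model how to answer after a failed or incomplete tool call.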

Configure MCP when enabled

Some prompts expose MCP server configuration in the editor when MCP is enabled in the model settings. MCP server configuration is stored with the editor state, playground snapshots, and deployment snapshots. Use MCP only when the runtime model and provider path support it. Keep server configuration out of screenshots and docs if it contains URLs, tokens, headers, or internal service names.
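
If a server configuration does need to be shared, for example in a bug report, one option is to redact it first. The sketch below assumes the configuration can be represented as nested dictionaries and lists; the key names in `SENSITIVE_KEYS` and the sample shape are illustrative guesses, not the product's actual MCP schema.

```python
from __future__ import annotations

from typing import Any

# Assumption: these key names are examples. Extend the set to match your
# configuration, including "name" if service names are internal.
SENSITIVE_KEYS = {"url", "token", "headers", "authorization", "api_key"}


def redact(config: Any) -> Any:
    """Return a copy of a nested config with sensitive values replaced."""
    if isinstance(config, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item) for item in config]
    return config


sample = {
    "servers": [
        {
            "name": "search",
            "url": "https://mcp.internal.example",
            "headers": {"Authorization": "Bearer abc123"},
        }
    ]
}
safe_to_share = redact(sample)
```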

Run in the playground

Use the playground to test the draft prompt before running wider evaluations.

1. Run one representative case. Start with a realistic input from the product, not a toy example.

2. Inspect the output. Check correctness, tone, formatting, refusal behavior, tool calls, latency, cost, and tokens.

3. Change one thing. Edit a message, variable, tool instruction, or model setting. Avoid changing several knobs at once unless the issue is obvious.

4. Run again. Compare the new output with the previous run and keep whichever version the evidence favors.

5. Save important cases. Turn strong examples and failures into dataset rows so future versions can be tested repeatedly.
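
Saving a playground case as a dataset row can be as simple as recording the variable values alongside the expected behavior. The sketch below writes rows as CSV using only the standard library; the `expected_output` and `note` columns and both helper functions are assumptions for illustration, not an import format the product requires.

```python
import csv
import io


def case_to_row(variables: dict, expected_output: str, note: str) -> dict:
    """Combine the variable values of a playground case with what it protects."""
    return {**variables, "expected_output": expected_output, "note": note}


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text with a stable, sorted column order."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are written as empty cells
    return buffer.getvalue()


rows = [
    case_to_row(
        {"customer_question": "Can I get a refund?"},
        expected_output="Explains the refund policy",
        note="regression",
    )
]
csv_text = rows_to_csv(rows)
```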

Test against datasets

A playground run tells you whether one case works. A dataset evaluation tells you whether the prompt holds up across many cases.

Before running an evaluation:

Make sure dataset columns map to prompt variables.
Attach the evaluators that define success.
Include normal, edge, failure, and regression examples.
Keep the dataset small enough that reviewers can understand failures.
Name the dataset after the behavior it protects, not the date it was created.
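
Two items on this checklist, column-to-variable mapping and coverage of case types, lend themselves to a quick script. The sketch below assumes dataset rows are available as dictionaries and that each row carries a `category` column; that column name, the category labels, and `check_dataset` itself are hypothetical conventions for this example.

```python
from __future__ import annotations

# Assumption: rows are tagged with one of these labels in a "category" column.
REQUIRED_CATEGORIES = {"normal", "edge", "failure", "regression"}


def check_dataset(
    prompt_variables: set[str],
    rows: list[dict[str, str]],
    category_column: str = "category",
) -> dict[str, list[str]]:
    """Report prompt variables without a column and case types without a row."""
    columns: set[str] = set()
    for row in rows:
        columns.update(row)
    seen_categories = {row.get(category_column) for row in rows}
    return {
        "missing_columns": sorted(prompt_variables - columns),
        "missing_categories": sorted(REQUIRED_CATEGORIES - seen_categories),
    }


rows = [
    {"customer_question": "Where is my order?", "category": "normal"},
    {"customer_question": "", "category": "edge"},
]
report = check_dataset({"customer_question", "account_tier"}, rows)
```

Here `report` flags `account_tier` as a variable with no matching column and shows that the dataset still lacks failure and regression examples.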

See Create and import datasets and Create evaluators.

Know when the prompt is ready

A prompt is ready for deployment when:

The editor draft uses the intended provider, model, settings, messages, variables, tools, and schema.
Representative playground cases pass.
Dataset evaluations cover the main product risks.
Cost, latency, and token usage are acceptable.
Reviewers understand the change from the previous deployed version.
Deployment environments exist for the release path.
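
Teams that want this checklist to be repeatable sometimes encode it as a gate. The sketch below is one possible shape, not a product feature: `ReadinessReport`, `deployment_blockers`, and every threshold are assumptions that you would replace with your own measurements and limits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReadinessReport:
    playground_cases_pass: bool
    dataset_pass_rate: float  # fraction of dataset rows passing evaluators, 0.0 to 1.0
    p95_latency_ms: float
    cost_per_run_usd: float
    change_reviewed: bool
    environments_exist: bool


def deployment_blockers(
    report: ReadinessReport,
    min_pass_rate: float = 0.95,  # illustrative threshold
    max_latency_ms: float = 3000.0,  # illustrative threshold
    max_cost_usd: float = 0.05,  # illustrative threshold
) -> list[str]:
    """Return the reasons a prompt is not ready; an empty list means ready."""
    blockers = []
    if not report.playground_cases_pass:
        blockers.append("representative playground cases are failing")
    if report.dataset_pass_rate < min_pass_rate:
        blockers.append("dataset pass rate is below the threshold")
    if report.p95_latency_ms > max_latency_ms:
        blockers.append("latency is above the limit")
    if report.cost_per_run_usd > max_cost_usd:
        blockers.append("cost per run is above the limit")
    if not report.change_reviewed:
        blockers.append("reviewers have not signed off on the change")
    if not report.environments_exist:
        blockers.append("deployment environments are missing")
    return blockers
```

Returning the full list of blockers, rather than stopping at the first failure, lets a reviewer see everything that still needs attention in one pass.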

Common prompt design problems

Problem	What to check
Output format keeps breaking	Add or tighten a response schema, JSON evaluator, JavaScript evaluator, or text matcher.
Tool is called too often	Narrow the tool description and prompt instructions around when it should be used.
Tool is not called	Make the trigger condition explicit and verify the selected model supports tool calling.
Answers are too long	Add response-length evaluator coverage and reduce the output-token budget if needed.
Quality changes run to run	Lower randomness settings and add examples for ambiguous cases.
Cost is high	Inspect input token growth, retrieved context, tool chains, and model choice.
Latency is high	Inspect provider latency, tool latency, dynamic dataset columns, and orchestration spans.
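
For the first row of the table, the kind of check a JSON evaluator performs can be sketched in a few lines. This is a standalone illustration using only the standard library; `evaluate_json_output` and its `required` argument are hypothetical and do not represent the product's evaluator interface.

```python
from __future__ import annotations

import json


def evaluate_json_output(output: str, required: dict[str, type]) -> tuple[bool, str]:
    """Check that a model output is a JSON object with the expected keys and types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    for key, expected_type in required.items():
        if key not in data:
            return False, f"missing key: {key}"
        if not isinstance(data[key], expected_type):
            return False, f"wrong type for key: {key}"
    return True, "ok"


shape = {"answer": str, "confidence": float}
passed, reason = evaluate_json_output('{"answer": "Yes", "confidence": 0.9}', shape)
```

Running a check like this across a dataset shows how often the format breaks, which helps decide whether a stricter response schema is worth adding.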

Next steps

Version and deploy

Understand drafts, snapshots, deployments, and Improve approval behavior.

Evaluator types

Choose the right evaluator for each success criterion.

Datasets

Build repeatable test coverage for prompts.

Traces

Inspect production evidence after deployment.
