Start from the prompt library
Open Prompts in a project to see every prompt in that project. The list shows:
- The prompt name.
- Which deployment environments currently have a deployment snapshot.
- The most recent deployment.
- Active evaluators.
- Linked datasets.
- Last updated time.
Configure the model
Choose a model from the workspace providers your admins have configured. Model configuration is stored with the prompt editor state and captured in deployment snapshots. When you change providers, review provider-specific settings carefully. Some settings can be preserved across providers, but others depend on the selected provider and model. Before serious testing, confirm:
| Setting | Why it matters |
|---|---|
| Provider and model | Controls quality, latency, cost, context length, tool support, and response format support. |
| Temperature and sampling | Controls consistency and exploration. Lower values are usually better for deterministic workflows. |
| Token limits | Prevents overly long or truncated responses. |
| Response schema | Enforces structured output when your application expects JSON or a strict shape. |
| Tool support | Determines whether the model can call attached tools. |
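
The checks in the table can be approximated with a small pre-flight script. The sketch below is illustrative only: `ModelSettings`, its field names, and the 0.3 temperature threshold are assumptions made for this example and are not part of the product's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelSettings:
    """Hypothetical container for the settings listed in the table."""
    provider: str
    model: str
    temperature: float
    max_output_tokens: Optional[int]
    response_schema: Optional[dict]
    supports_tools: bool
    attached_tools: List[str]


def preflight_warnings(settings: ModelSettings, expects_json: bool) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    # Assumed threshold: treat anything above 0.3 as exploratory sampling.
    if settings.temperature > 0.3:
        warnings.append("temperature is high for a deterministic workflow")
    if settings.max_output_tokens is None:
        warnings.append("no output token limit is set")
    if expects_json and settings.response_schema is None:
        warnings.append("application expects JSON but no response schema is set")
    if settings.attached_tools and not settings.supports_tools:
        warnings.append("tools are attached but the model does not support tool calls")
    return warnings


# Example draft with placeholder provider and model names.
draft = ModelSettings(
    provider="example-provider",
    model="example-model",
    temperature=0.9,
    max_output_tokens=None,
    response_schema=None,
    supports_tools=False,
    attached_tools=["order_lookup"],
)
for warning in preflight_warnings(draft, expects_json=True):
    print(warning)
```

A check like this does not replace reviewing the editor, but it makes the "confirm before serious testing" step repeatable.
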
Write messages and variables
Use prompt messages to separate system policy, developer instructions, examples, user input, assistant examples, and tool context. Keep stable instructions close to the top of the prompt and variable input close to where it is used. Variables make prompts reusable. Use variable names that match the data your runtime and datasets can provide. Good variable practice:
- Use clear names such as `customer_question`, `retrieved_policy`, `conversation_summary`, or `account_tier`.
- Avoid renaming variables casually. Datasets and playground cases may depend on the old name.
- Keep optional variables explicit in the prompt so missing data produces a graceful response.
- Prefer one concept per variable instead of packing many unrelated fields into one blob.
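
To illustrate the practices above, here is a minimal sketch of variable substitution that keeps optional variables explicit. It assumes a `{{name}}` placeholder syntax, which this page does not specify, and the `render` helper is hypothetical rather than a product function.

```python
import re
from typing import Dict, Optional

# Assumed placeholder syntax: {{variable_name}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str,
           values: Dict[str, str],
           optional_defaults: Optional[Dict[str, str]] = None) -> str:
    """Fill placeholders; use explicit defaults for optional variables
    and fail loudly when a required variable is missing."""
    defaults = optional_defaults or {}
    missing = []

    def substitute(match):
        name = match.group(1)
        if values.get(name) is not None:
            return str(values[name])
        if name in defaults:
            return defaults[name]
        missing.append(name)
        return match.group(0)

    rendered = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError("missing required variables: " + ", ".join(sorted(set(missing))))
    return rendered


template = "Account tier: {{account_tier}}\nQuestion: {{customer_question}}"
print(render(
    template,
    {"customer_question": "How do I reset my password?"},
    optional_defaults={"account_tier": "unknown"},
))
```

Giving each optional variable a visible default such as `unknown` lets the prompt instruct the model how to respond when that data is absent, instead of silently sending an empty string.
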
Add multimodal content
Prompts can include text and supported media content such as images or PDFs when the selected model supports them. Use multimodal inputs when the model must inspect the actual content, not when a short extracted text summary would be enough. For repeatable evaluation, keep multimodal dataset rows small and representative. Large files can make test runs slower and harder to debug.
Attach tools
Attach project tools when the model needs to take an action or fetch external data. Project tools are reusable across prompts in the same project and are referenced from the prompt editor. Use tools for:
- Retrieval or search.
- Account, order, ticket, billing, or product lookup.
- Calculations or policy checks.
- Workflow actions that your application controls.
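
As a rough illustration of a lookup tool, the sketch below uses the common "name, description, JSON Schema parameters" layout. The exact format that project tools use is not described on this page, so treat the structure, the `order_lookup` name, and the `missing_arguments` helper as assumptions.

```python
from typing import Any, Dict, List

# Hypothetical tool definition. The description states when the tool
# should be used, which helps the model avoid calling it too often.
ORDER_LOOKUP_TOOL: Dict[str, Any] = {
    "name": "order_lookup",
    "description": (
        "Look up one order by its ID. Use only when the user asks about "
        "the status, contents, or delivery of a specific order."
    ),
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def missing_arguments(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """List required parameters that a model-produced tool call left out."""
    required = tool["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_arguments(ORDER_LOOKUP_TOOL, {}))
print(missing_arguments(ORDER_LOOKUP_TOOL, {"order_id": "A-1001"}))
```

Your application stays in control of what the tool actually does; the definition only tells the model when and how to request it.
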
Configure MCP when enabled
Some prompts expose MCP server configuration in the editor when MCP is enabled for the model settings. MCP server configuration is stored with the editor, playground snapshots, and deployment snapshots. Use MCP only when the runtime model and provider path support it. Keep server configuration out of screenshots and docs if it contains URLs, tokens, headers, or internal service names.
Run in the playground
Use the playground to test the draft prompt before running wider evaluations.
Inspect the output
Check correctness, tone, formatting, refusal behavior, tool calls, latency, cost, and tokens.
Change one thing
Edit a message, variable, tool instruction, or model setting. Avoid changing several knobs at once unless the issue is obvious.
Test against datasets
A playground run tells you whether one case works. A dataset evaluation tells you whether the prompt holds up across many cases. Before running an evaluation:
- Make sure dataset columns map to prompt variables.
- Attach the evaluators that define success.
- Include normal, edge, failure, and regression examples.
- Keep the dataset small enough that reviewers can understand failures.
- Name the dataset after the behavior it protects, not the date it was created.
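
The first item in the list, mapping dataset columns to prompt variables, can be checked mechanically before a run. The following is a minimal sketch; `check_mapping` and the example column names are hypothetical and not part of the product.

```python
from typing import Dict, Iterable, List


def check_mapping(prompt_variables: Iterable[str],
                  dataset_columns: Iterable[str],
                  optional_variables: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Compare prompt variables with dataset columns.

    missing_columns: required variables that no column provides.
    unused_columns:  columns that no prompt variable reads.
    """
    variables = set(prompt_variables)
    columns = set(dataset_columns)
    optional = set(optional_variables)
    return {
        "missing_columns": sorted(variables - columns - optional),
        "unused_columns": sorted(columns - variables),
    }


report = check_mapping(
    prompt_variables=["customer_question", "retrieved_policy", "account_tier"],
    dataset_columns=["customer_question", "retrieved_policy", "expected_answer"],
    optional_variables=["account_tier"],
)
print(report)
```

An unused column is not necessarily a problem, since evaluators often read columns such as an expected answer, but a missing column usually means the variable was renamed on one side only.
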
Know when the prompt is ready
A prompt is ready for deployment when:
- The editor draft uses the intended provider, model, settings, messages, variables, tools, and schema.
- Representative playground cases pass.
- Dataset evaluations cover the main product risks.
- Cost, latency, and token usage are acceptable.
- Reviewers understand the change from the previous deployed version.
- Deployment environments exist for the release path.
Common prompt design problems
| Problem | What to check |
|---|---|
| Output format keeps breaking | Add or tighten a response schema, JSON evaluator, JavaScript evaluator, or text matcher. |
| Tool is called too often | Narrow the tool description and prompt instructions around when it should be used. |
| Tool is not called | Make the trigger condition explicit and verify the selected model supports tool calling. |
| Answers are too long | Add response-length evaluator coverage and reduce output-token budget if needed. |
| Quality changes run to run | Lower randomness settings and add examples for ambiguous cases. |
| Cost is high | Inspect input token growth, retrieved context, tool chains, and model choice. |
| Latency is high | Inspect provider latency, tool latency, dynamic dataset columns, and orchestration spans. |
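
For the first row of the table, a local check can help reproduce format breakage while you iterate. The sketch below is a stand-in written for this example, not the product's JSON evaluator; the `check_json_output` helper and the field names are assumptions.

```python
import json
from typing import Dict, List


def check_json_output(raw: str, required_fields: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the output has the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + exc.msg]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected_type in required_fields.items():
        if name not in data:
            problems.append("missing field: " + name)
        elif not isinstance(data[name], expected_type):
            problems.append("wrong type for field: " + name)
    return problems


expected = {"answer": str, "sources": list}
print(check_json_output('{"answer": "Reset it in Settings.", "sources": []}', expected))
print(check_json_output('{"answer": 3}', expected))
```

Running failing outputs through a check like this shows whether the model is producing invalid JSON or valid JSON in the wrong shape, which are usually fixed in different ways.
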
Next steps
Version and deploy
Understand drafts, snapshots, deployments, and Improve approval behavior.
Evaluator types
Choose the right evaluator for each success criterion.
Datasets
Build repeatable test coverage for prompts.
Traces
Inspect production evidence after deployment.