Monitor is the first place to look when you want to understand whether a project is healthy. It summarizes production traffic and quality signals before you open individual traces or behavior clusters.

Choose the time range

The dashboard uses the project time range to query analytics. Short ranges use smaller buckets; long ranges use larger buckets. The same time-range choice helps keep Monitor, Traces, and Behaviors aligned during investigation. Use time ranges intentionally:
| Time range goal | Use it for |
| --- | --- |
| Last few hours | Active incidents, load tests, fresh deployments, or sudden spikes. |
| Last 24 hours | Daily production health and most release reviews. |
| Last week | Recurring issues, weekly traffic patterns, and quality drift. |
| Last month or longer | Product-level trends, cost review, and capacity planning. |
If the project has never received traces, Monitor shows an empty state. If the project has traces but the selected range has no data, the chart surface can still render with empty values.
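
The bucket behavior described in this section can be pictured with a minimal sketch. The thresholds, bucket widths, and function names below are hypothetical and chosen only for illustration; the product's actual bucket sizes are not documented here. The sketch also shows how a range with no data still yields a full series of zero-valued buckets.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical rules: (longest range covered, bucket width to use).
# These values are assumptions, not the dashboard's real settings.
BUCKET_RULES = [
    (timedelta(hours=6), timedelta(minutes=5)),
    (timedelta(days=2), timedelta(hours=1)),
    (timedelta(days=14), timedelta(hours=6)),
]
DEFAULT_BUCKET = timedelta(days=1)


def pick_bucket(range_length: timedelta) -> timedelta:
    """Return a bucket width that grows with the selected range."""
    for max_length, bucket in BUCKET_RULES:
        if range_length <= max_length:
            return bucket
    return DEFAULT_BUCKET


def bucket_counts(timestamps, start: datetime, end: datetime):
    """Count events per bucket, keeping empty buckets as zeros."""
    bucket = pick_bucket(end - start)
    n_buckets = math.ceil((end - start) / bucket)
    counts = [0] * n_buckets
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // bucket] += 1
    return bucket, counts


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=3)
events = [start + timedelta(minutes=m) for m in (2, 7, 8)]
bucket, counts = bucket_counts(events, start, end)
_, empty_counts = bucket_counts([], start, end)
```

With no events in the range, `empty_counts` is a list of zeros rather than an empty list, which mirrors a chart that renders without data.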

Read the summary cards

Monitor compares the selected period with the previous equivalent period. The direction of a change is not always good or bad by itself, so read it in context.
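
As a rough illustration of the comparison, the sketch below assumes that "previous equivalent period" means the window of the same length that ends where the selected window begins. That interpretation and the helper names are assumptions, not a description of Monitor's internals.

```python
from datetime import datetime


def previous_period(start: datetime, end: datetime):
    """Return the same-length window immediately before [start, end)."""
    length = end - start
    return start - length, start


def percent_change(current: float, previous: float):
    """Percent change from previous to current, or None with no baseline."""
    if previous == 0:
        return None  # no meaningful percentage when the baseline is zero
    return (current - previous) * 100.0 / previous


prev_start, prev_end = previous_period(datetime(2024, 1, 2), datetime(2024, 1, 3))
logs_change = percent_change(current=120, previous=100)
```

The sign of `logs_change` only says which way the metric moved. Whether that movement is good or bad depends on the metric and on what else changed in the same period.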
| Card | What it measures | How to interpret it |
| --- | --- | --- |
| Logs | Trace volume. | A drop can mean lower traffic or broken instrumentation. A spike can mean growth, retries, load tests, or runaway automation. |
| Avg latency | Weighted average request latency. | Higher latency is usually worse; investigate provider, tool, retrieval, or orchestration spans. |
| Avg cost | Average cost per span when cost is available. | Higher cost can come from longer prompts, larger outputs, model changes, tool chains, or retries. |
| Avg input tokens | Average prompt/input token volume. | Growth can indicate longer system prompts, retrieved context, conversation history, or payload expansion. |
| Avg output tokens | Average completion/output token volume. | Growth can indicate rambling answers, changed instructions, or model behavior shifts. |
| Avg eval score | Aggregated continuous-evaluation score. | Drops should lead directly into Traces, evaluator results, and Behaviors. |
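
To see why a weighted average matters for the latency card, consider a minimal sketch that assumes the weights are per-bucket request counts. The source does not say what the weighting is, so treat that choice and the function name as assumptions.

```python
def weighted_average(bucket_means, bucket_counts):
    """Average of per-bucket means, weighted by requests in each bucket."""
    total = sum(bucket_counts)
    if total == 0:
        return None  # no requests in the range, so no average to report
    weighted_sum = sum(m * c for m, c in zip(bucket_means, bucket_counts))
    return weighted_sum / total


# Two buckets: a busy one at 100 ms and a quiet one at 500 ms.
means_ms = [100.0, 500.0]
counts = [90, 10]
weighted = weighted_average(means_ms, counts)
unweighted = sum(means_ms) / len(means_ms)
```

Here the unweighted mean of the two buckets is 300 ms, while the weighted value is 140 ms because most requests fell in the faster bucket. A quiet bucket with a few slow requests therefore moves the card less than it moves the chart.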

Read charts, not only cards

The summary cards tell you that a metric changed. The charts show when and how. Look for:
  • A single spike versus a sustained change.
  • A change that starts immediately after deployment.
  • Metric movement concentrated in one bucket.
  • Latency and cost moving together.
  • Input tokens increasing before cost increases.
  • Eval score dropping while traffic is stable.
When charts show a suspicious period, open Traces and filter by the same time window.
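
The difference between a single spike and a sustained change can be made concrete with a small heuristic. This is an illustrative sketch only: the baseline length, the 1.5x factor, and the run length of three buckets are hypothetical values, and Monitor is not known to classify charts this way.

```python
from statistics import median


def classify_change(values, baseline_len=6, factor=1.5, sustained_run=3):
    """Label a bucketed series as 'stable', 'spike', or 'sustained'.

    The first baseline_len buckets define normal; later buckets count as
    elevated when they exceed factor times the baseline median.
    """
    if len(values) <= baseline_len:
        raise ValueError("need more buckets than the baseline length")
    baseline = median(values[:baseline_len])
    longest = run = 0
    for value in values[baseline_len:]:
        run = run + 1 if value > baseline * factor else 0
        longest = max(longest, run)
    if longest == 0:
        return "stable"
    if longest >= sustained_run:
        return "sustained"
    return "spike"


spike = classify_change([10] * 6 + [10, 40, 10, 10])
sustained = classify_change([10] * 6 + [30, 30, 30, 30])
stable = classify_change([10] * 10)
```

A label from a heuristic like this is a prompt to look closer, not a conclusion. The traces in the same time window show what actually happened.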

Use the recents rail

The right rail surfaces recent prompts and datasets. Use it to move from project-level health to the objects most likely to explain a change.
  • Open recent prompts when a metric shift might be tied to prompt editing or deployment.
  • Open recent datasets when new regression coverage or evaluator runs may explain score changes.
  • Use recent object activity as a hint, not proof. Confirm with traces and deployment history.

Read Monitor during release review

Before and after a deployment, compare:
  • Traffic volume.
  • Latency.
  • Cost.
  • Input and output tokens.
  • Eval score.
  • Recent traces for normal requests.
  • Behaviors that represent known issues.
If the release changed model, provider, response schema, tool use, or retrieval context, expect Monitor to move. The question is whether the movement is acceptable and understood.
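
One way to make "acceptable and understood" explicit is to agree on tolerances before the release and flag only the metrics that move beyond them. The metric names and tolerance values in this sketch are hypothetical; choose your own based on what the release was expected to change.

```python
# Hypothetical tolerances, as maximum absolute percent change per metric.
TOLERANCE_PCT = {
    "latency_ms": 10.0,
    "cost_usd": 10.0,
    "input_tokens": 15.0,
    "output_tokens": 15.0,
    "eval_score": 5.0,
}


def review_release(before, after, tolerance_pct=TOLERANCE_PCT):
    """Return metrics whose percent change exceeds the agreed tolerance."""
    findings = {}
    for name, limit in tolerance_pct.items():
        prev, curr = before[name], after[name]
        if prev == 0:
            continue  # no baseline to compare against
        change = (curr - prev) * 100.0 / prev
        if abs(change) > limit:
            findings[name] = round(change, 1)
    return findings


before = {"latency_ms": 800, "cost_usd": 0.02, "input_tokens": 1200,
          "output_tokens": 300, "eval_score": 0.90}
after = {"latency_ms": 1000, "cost_usd": 0.021, "input_tokens": 1200,
         "output_tokens": 310, "eval_score": 0.80}
findings = review_release(before, after)
```

In this made-up example only latency and eval score exceed their tolerances, so those are the two movements the release review would need to explain.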

When Monitor is not enough

When you see a real change, move to the page that answers your next question:
  • Monitor answers “what changed?”
  • Traces answer “what happened?”
  • Behaviors answer “is this repeated?”
  • Improve answers “can we safely change the prompt?”