Monitor is the first place to look when you want to understand whether a project is healthy. It summarizes production traffic and quality signals before you open individual traces or behavior clusters.
Choose the time range
The dashboard uses the project time range to query analytics. Short ranges use smaller buckets; long ranges use larger buckets. The same time-range choice helps keep Monitor, Traces, and Behaviors aligned during investigation.
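As a rough illustration of how bucket size can scale with the selected range, the sketch below picks the smallest bucket width that keeps a chart at or under a target number of points. The function name `pick_bucket`, the candidate widths, and the 60-point target are assumptions made for this example; they are not Monitor's actual values.

```python
from datetime import timedelta

# Hypothetical candidate bucket widths, smallest first.
CANDIDATE_BUCKETS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(hours=1),
    timedelta(hours=6),
    timedelta(days=1),
]


def pick_bucket(time_range: timedelta, max_points: int = 60) -> timedelta:
    """Return the smallest candidate width that yields at most max_points buckets."""
    for width in CANDIDATE_BUCKETS:
        # Dividing one timedelta by another gives the number of buckets as a float.
        if time_range / width <= max_points:
            return width
    # Very long ranges fall back to the largest candidate width.
    return CANDIDATE_BUCKETS[-1]
```

Under these assumed values, a three-hour range resolves to 5-minute buckets and a one-week range to 6-hour buckets, which matches the general pattern of short ranges using smaller buckets.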
Use time ranges intentionally:
| Time range | Use it for |
|---|---|
| Last few hours | Active incidents, load tests, fresh deployments, or sudden spikes. |
| Last 24 hours | Daily production health and most release reviews. |
| Last week | Recurring issues, weekly traffic patterns, and quality drift. |
| Last month or longer | Product-level trends, cost review, and capacity planning. |
If the project has never received traces, Monitor shows an empty state. If the project has traces but the selected range contains no data, the charts can still render with empty values.
Read the summary cards
Monitor compares the selected period with the previous equivalent period. The direction of a change is not always good or bad by itself, so read it in context.
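The comparison can be sketched as follows. This assumes that "previous equivalent period" means the window of the same length that ends where the selected range begins; the function names are hypothetical and do not reflect Monitor's internals.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def previous_period(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Return the window of equal length that ends where the selected one starts."""
    length = end - start
    return start - length, start


def relative_change(current: float, previous: float) -> Optional[float]:
    """Fractional change versus the previous period, or None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous
```

Returning `None` for a zero baseline avoids reporting a misleading percentage when the previous period had no traffic at all.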
| Card | What it measures | How to interpret it |
|---|---|---|
| Logs | Trace volume. | A drop can mean lower traffic or broken instrumentation. A spike can mean growth, retries, load tests, or runaway automation. |
| Avg latency | Weighted average request latency. | Higher latency is usually worse; investigate provider, tool, retrieval, or orchestration spans. |
| Avg cost | Average cost per span when cost is available. | Higher cost can come from longer prompts, larger outputs, model changes, tool chains, or retries. |
| Avg input tokens | Average prompt/input token volume. | Growth can indicate longer system prompts, retrieved context, conversation history, or payload expansion. |
| Avg output tokens | Average completion/output token volume. | Growth can indicate rambling answers, changed instructions, or model behavior shifts. |
| Avg eval score | Aggregated continuous-evaluation score. | Drops should lead directly into Traces, evaluator results, and Behaviors. |
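The "weighted" in Avg latency matters when buckets carry different amounts of traffic. The sketch below assumes the weight is the request count per bucket, which the card description does not state explicitly; the function name and input shape are illustrative.

```python
from typing import Iterable, Optional, Tuple


def weighted_avg_latency(buckets: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Combine (avg_latency_ms, request_count) pairs into one traffic-weighted average.

    Returns None when the buckets contain no requests.
    """
    total_requests = 0
    total_latency = 0.0
    for avg_latency_ms, request_count in buckets:
        total_latency += avg_latency_ms * request_count
        total_requests += request_count
    if total_requests == 0:
        return None
    return total_latency / total_requests


# A busy fast bucket and a quiet slow bucket.
example = [(100.0, 900), (1000.0, 100)]
print(weighted_avg_latency(example))  # 190.0
```

A plain mean of the two bucket averages would give 550 ms, which overstates the latency most requests experienced. Weighting by request count keeps a quiet, slow bucket from dominating the card.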
Read charts, not only cards
The summary cards tell you that a metric changed. The charts show when and how.
Look for:
- A single spike versus a sustained change.
- A change that starts immediately after deployment.
- Metric movement concentrated in one bucket.
- Latency and cost moving together.
- Input tokens increasing before cost increases.
- Eval score dropping while traffic is stable.
When charts show a suspicious period, open Traces and filter by the same time window.
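The difference between a single spike and a sustained change can be expressed as a check on consecutive elevated buckets. This is a minimal sketch with assumed thresholds, not how Monitor classifies anything: it takes the median of the first few buckets as a baseline and counts the longest run of later buckets above `ratio` times that baseline. It only detects increases; for a metric where drops matter, such as eval score, the comparison would need to be inverted.

```python
from statistics import median
from typing import Sequence


def classify_shift(
    values: Sequence[float],
    baseline_buckets: int = 6,
    ratio: float = 1.5,
    sustained_run: int = 3,
) -> str:
    """Label the buckets after the baseline window as 'stable', 'spike', or 'sustained'."""
    if len(values) <= baseline_buckets:
        raise ValueError("need more buckets than the baseline window")
    baseline = median(values[:baseline_buckets])
    longest = current = 0
    for value in values[baseline_buckets:]:
        # Extend the run while buckets stay elevated; reset on a normal bucket.
        current = current + 1 if value > baseline * ratio else 0
        longest = max(longest, current)
    if longest == 0:
        return "stable"
    return "sustained" if longest >= sustained_run else "spike"
```

For example, `classify_shift([100] * 6 + [100, 400, 100, 100])` returns `"spike"`, while a series that stays elevated for three or more buckets after the baseline returns `"sustained"`.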
Use the recents rail
The right rail surfaces recent prompts and datasets. Use it to move from project-level health to the objects most likely to explain a change.
- Open recent prompts when a metric shift might be tied to prompt editing or deployment.
- Open recent datasets when new regression coverage or evaluator runs may explain score changes.
- Use recent object activity as a hint, not proof. Confirm with traces and deployment history.
Read Monitor during release review
Before and after a deployment, compare:
- Traffic volume.
- Latency.
- Cost.
- Input and output tokens.
- Eval score.
- Recent traces for normal requests.
- Behaviors that represent known issues.
If the release changed model, provider, response schema, tool use, or retrieval context, expect Monitor to move. The question is whether the movement is acceptable and understood.
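One way to make "acceptable" concrete is to agree on tolerances before the release and check the before and after values against them. The sketch below is illustrative only: the metric keys and tolerance values are hypothetical, and the inputs are plain dictionaries you would fill in by hand from the summary cards, not the output of any Monitor API.

```python
from typing import Dict, List

# Hypothetical tolerances as fractional change.
# A positive limit flags rises above it; a negative limit flags drops below it.
TOLERANCES: Dict[str, float] = {
    "avg_latency_ms": 0.10,
    "avg_cost": 0.10,
    "avg_input_tokens": 0.20,
    "avg_output_tokens": 0.20,
    "avg_eval_score": -0.05,
}


def review_release(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerances: Dict[str, float] = TOLERANCES,
) -> List[str]:
    """Return the metrics whose change between snapshots exceeds its tolerance."""
    flagged = []
    for metric, limit in tolerances.items():
        if metric not in before or metric not in after or before[metric] == 0:
            continue  # no baseline to compare against
        change = (after[metric] - before[metric]) / before[metric]
        worse = change > limit if limit >= 0 else change < limit
        if worse:
            flagged.append(f"{metric}: {change:+.1%}")
    return flagged
```

Traffic volume is left out of the tolerances on purpose, because a change in volume is not good or bad by itself. A flagged metric is a prompt to investigate, not a verdict on the release.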
When Monitor is not enough
When you see a real change, move to the page that answers your next question:
Monitor answers “what changed?” Traces answer “what happened?” Behaviors answer “is this repeated?” Improve answers “can we safely change the prompt?”