Charts turn production logs into a reviewable operating picture. Use them to find what changed, then open the underlying traces before deciding whether the fix belongs in a prompt, evaluator, dataset, tool, model, deployment, or backend system.
[Figure: Monitor charts showing latency and evaluation score trends]

Chart groups

Monitor organizes charts by the question they answer:
Group | Questions it helps answer
Traffic & volume | Are logs arriving? Did volume spike? Did traces become more complex?
Performance & latency | Which requests are slow? Is tail latency moving?
Quality | Are evaluator scores or pass rates changing on live traffic?
Cost & spend | Are token usage, total cost, or per-request cost increasing?
Model breakdown | Which model is driving cost, latency, token usage, or efficiency?
Environment breakdown | Is the issue isolated to production, staging, or another environment?
Performance breakdown | Which prompts or workflows are slowest?
Tool usage | Which tools or functions are being called, and how often?
The chart title tells you the metric. The subtitle, legend, tooltip, and link target tell you how to investigate it.
[Figure: Monitor charts showing eval score, cost, tokens, eval pass rate by evaluator, and errors by type]

Read percentiles and averages

For latency, cost, tokens, spans per trace, and similar metrics, Monitor can show average and percentile values.
Value | Use it when
Avg | You want the overall direction of the system.
P50 | You want the typical request.
P95 | You care about the slower or more expensive edge that a meaningful minority of users sees.
P99 | You are investigating rare but high-impact outliers.
If P95 moves but average stays flat, inspect outlier traces. If both move together, the whole workflow likely changed.
[Figure: Monitor chart tooltip showing evaluator pass-rate values for a selected date]
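The difference between the average and the percentiles can be sketched with a small calculation. This is an illustration only: the latency samples are made up, and the nearest-rank percentile used here is an assumption, since the page does not say how Monitor computes its percentile values.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Return the four values discussed above for one time bucket."""
    return {
        "avg": mean(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical latency samples in milliseconds for two time buckets.
baseline = [200] * 90 + [400] * 10
# Most requests got slightly faster, but the slowest 10% got much slower.
shifted = [180] * 90 + [580] * 10

before = summarize(baseline)
after = summarize(shifted)

# The average stays at 220 ms while P95 moves from 400 ms to 580 ms.
print(before)
print(after)
```

In this made-up case the average hides the regression entirely, which is the situation where opening the outlier traces is more informative than reading the trend line.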

Compare models, environments, and tools

[Figure: Monitor model breakdown charts showing cost, token usage, latency, token efficiency, and cost per token by model]
Model and environment breakdowns help separate prompt quality from runtime configuration. For example, a cost increase might come from a new model, longer inputs, more tool calls, or traffic moving into a different deployment environment.
[Figure: Monitor model breakdown chart with tooltip showing latency by model]
Tool usage charts are useful when the agent looks wrong but the root cause is upstream: the wrong tool was called, a tool returned stale data, or tool latency dominated the trace.
[Figure: Monitor performance and tool usage charts showing slowest prompts and function call counts]
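A breakdown of this kind amounts to grouping log records by one dimension and comparing the aggregates. The sketch below shows the idea on a handful of in-memory records. The field names (`model`, `environment`, `cost_usd`, `tokens`, `latency_ms`) and the `breakdown` helper are hypothetical and are not taken from Monitor's schema or API.

```python
from collections import defaultdict

# Hypothetical log records; the field names are assumptions for illustration.
logs = [
    {"model": "model-a", "environment": "production", "cost_usd": 0.01, "tokens": 1000, "latency_ms": 800},
    {"model": "model-a", "environment": "staging", "cost_usd": 0.03, "tokens": 3000, "latency_ms": 1200},
    {"model": "model-b", "environment": "production", "cost_usd": 0.02, "tokens": 500, "latency_ms": 400},
]


def breakdown(records, key):
    """Aggregate cost, tokens, and latency per distinct value of `key`."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "tokens": 0, "latency_ms": 0})
    for record in records:
        group = totals[record[key]]
        group["requests"] += 1
        group["cost_usd"] += record["cost_usd"]
        group["tokens"] += record["tokens"]
        group["latency_ms"] += record["latency_ms"]
    return {
        name: {
            "requests": g["requests"],
            "total_cost_usd": g["cost_usd"],
            "avg_latency_ms": g["latency_ms"] / g["requests"],
            # Guard against division by zero for groups with no tokens.
            "cost_per_1k_tokens": 1000 * g["cost_usd"] / g["tokens"] if g["tokens"] else 0.0,
        }
        for name, g in totals.items()
    }


by_model = breakdown(logs, "model")
by_environment = breakdown(logs, "environment")

for name, stats in by_model.items():
    print(name, stats)
```

Running the same records through both groupings shows whether a cost or latency change follows the model or the environment, which is the separation the breakdown charts are meant to support.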

Drill into evidence

Use chart actions to move from trend to trace:
  1. Open the chart for the metric that moved.
  2. Hover data points to confirm the exact bucket and value.
  3. Use View traces when available, or open Traces with the same time range.
  4. Filter by the relevant prompt, model, environment, tool, evaluator, status, cost, latency, or token signal.
  5. Inspect representative traces and spans before changing the system.
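
The steps above can be sketched as a filter over trace records: fix the time bucket from the chart, narrow by the signal that moved, then rank what remains so the most informative traces come first. The trace fields and the `select_traces` helper below are hypothetical. In the product this is done through the chart actions and the Traces filters, not through this code.

```python
from datetime import datetime, timezone


def utc(hour, minute):
    """Build a timestamp on an arbitrary example day."""
    return datetime(2024, 5, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical trace records; the field names are assumptions for illustration.
traces = [
    {"id": "t1", "started_at": utc(10, 5), "model": "model-a", "environment": "production", "latency_ms": 2400},
    {"id": "t2", "started_at": utc(10, 20), "model": "model-a", "environment": "production", "latency_ms": 900},
    {"id": "t3", "started_at": utc(10, 40), "model": "model-b", "environment": "production", "latency_ms": 3100},
    {"id": "t4", "started_at": utc(11, 10), "model": "model-a", "environment": "production", "latency_ms": 5000},
    {"id": "t5", "started_at": utc(10, 30), "model": "model-a", "environment": "staging", "latency_ms": 4000},
]


def select_traces(records, start, end, **filters):
    """Keep traces inside [start, end) that match every filter, slowest first."""
    matched = [
        t for t in records
        if start <= t["started_at"] < end
        and all(t.get(field) == value for field, value in filters.items())
    ]
    return sorted(matched, key=lambda t: t["latency_ms"], reverse=True)


# Same bucket as the chart data point, narrowed to the model and environment that moved.
selected = select_traces(
    traces,
    start=utc(10, 0),
    end=utc(11, 0),
    model="model-a",
    environment="production",
)
selected_ids = [t["id"] for t in selected]
print(selected_ids)
```

Keeping the time range identical to the chart bucket matters: `t4` is the slowest trace in the sample data, but it falls outside the bucket and would mislead the investigation if it were included.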

Common reads

Pattern | Likely next step
Eval score drops while latency and cost are stable | Inspect evaluated spans, then update evaluators, datasets, or Improve context.
Cost rises with input tokens | Review prompt length, retrieval payloads, conversation context, or tool output size.
P95 latency rises for one model | Compare model performance and check provider/runtime behavior before editing prompts.
Tool call counts shift | Inspect tool spans and Behavior patterns before assuming the final prompt is wrong.
A chart points to a single prompt | Open the trace evidence and decide whether to build coverage or run Improve.

Analyze log traces

Open the traces behind a chart movement.

Inspect a trace

Read tree view, waterfall view, and span details.

Set up continuous evaluations

Add quality scores to production traffic.

Use logs to improve prompts

Turn a chart signal into a reviewed improvement loop.