Analyze log charts

Charts turn production logs into a reviewable operating picture. Use them to find what changed, then open the underlying traces before deciding whether the fix belongs in a prompt, evaluator, dataset, tool, model, deployment, or backend system.

Monitor charts showing latency and evaluation score trends

Chart groups

Monitor organizes charts by the question they answer:

Group	Questions it helps answer
Traffic & volume	Are logs arriving? Did volume spike? Did traces become more complex?
Performance & latency	Which requests are slow? Is tail latency moving?
Quality	Are evaluator scores or pass rates changing on live traffic?
Cost & spend	Are token usage, total cost, or per-request cost increasing?
Model breakdown	Which model is driving cost, latency, token usage, or efficiency?
Environment breakdown	Is the issue isolated to production, staging, or another environment?
Performance breakdown	Which prompts or workflows are slowest?
Tool usage	Which tools or functions are being called, and how often?

The chart title tells you the metric. The subtitle, legend, tooltip, and link target tell you how to investigate it.

Monitor charts showing eval score, cost, tokens, eval pass rate by evaluator, and errors by type

Read percentiles and averages

For latency, cost, tokens, spans per trace, and similar metrics, Monitor can show average and percentile values.

Value	Use it when
Avg	You want the overall direction of the system.
P50	You want the typical request.
P95	You care about the slow or expensive tail: roughly 1 in 20 requests is at or above this value.
P99	You are investigating rare but high-impact outliers: roughly 1 in 100 requests is at or above this value.

If P95 moves but the average stays flat, the change is concentrated in the tail, so inspect outlier traces. If both move together, the whole workflow likely changed.
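To make the difference between an average and a percentile concrete, here is a minimal sketch in Python. It assumes the nearest-rank percentile definition, which may differ from how Monitor computes its values, and the latency samples are invented for illustration.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(values):
    """Return the four values discussed above for one metric."""
    return {
        "avg": mean(values),
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }


# Hypothetical latencies in milliseconds, 100 requests per period.
baseline = [200] * 90 + [400] * 10
# Typical requests got slightly faster, but the slowest 10% got slower.
shifted = [190] * 90 + [490] * 10

print(summarize(baseline))
print(summarize(shifted))
```

In this invented data the average stays at 220 ms in both periods while P95 moves from 400 ms to 490 ms, which is the "P95 moves but the average stays flat" pattern.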

Monitor chart tooltip showing evaluator pass-rate values for a selected date

Compare models, environments, and tools

Monitor model breakdown charts showing cost, token usage, latency, token efficiency, and cost per token by model

Model and environment breakdowns help separate prompt quality from runtime configuration. For example, a cost increase might come from a new model, longer inputs, more tool calls, or traffic moving into a different deployment environment.
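If you export log records, the same split can be approximated offline. The sketch below is illustrative only: the record fields (`model`, `environment`, `cost_usd`, `input_tokens`) and the model names are assumptions, not Monitor's actual schema.

```python
from collections import defaultdict


def breakdown(logs, key):
    """Group log records by `key` and report requests, total cost, tokens, and cost per request."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "input_tokens": 0})
    for log in logs:
        group = totals[log[key]]
        group["requests"] += 1
        group["cost_usd"] += log["cost_usd"]
        group["input_tokens"] += log["input_tokens"]
    return {
        name: {**group, "cost_per_request": group["cost_usd"] / group["requests"]}
        for name, group in totals.items()
    }


# Hypothetical exported records; field names are assumptions.
logs = [
    {"model": "model-a", "environment": "production", "cost_usd": 0.25, "input_tokens": 1000},
    {"model": "model-a", "environment": "staging", "cost_usd": 0.25, "input_tokens": 1000},
    {"model": "model-b", "environment": "production", "cost_usd": 1.0, "input_tokens": 4000},
    {"model": "model-b", "environment": "production", "cost_usd": 0.5, "input_tokens": 2000},
]

by_model = breakdown(logs, "model")
by_environment = breakdown(logs, "environment")
print(by_model)
print(by_environment)
```

Grouping the same records by different keys shows whether a cost increase follows a model, an environment, or neither, in which case input size or tool calls are the next thing to check.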

Monitor model breakdown chart with tooltip showing latency by model

Tool usage charts are useful when the agent's output looks wrong but the root cause is upstream: the wrong tool was called, a tool returned stale data, or tool latency dominated the trace.
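As a rough illustration of "tool latency dominated the trace", the sketch below counts tool calls in one trace and computes the share of span time spent in tools. The span fields (`type`, `name`, `duration_ms`) are hypothetical, and the calculation assumes spans run one after another without overlapping.

```python
from collections import Counter


def tool_summary(spans):
    """Return (calls per tool, fraction of total span time spent in tool spans)."""
    tool_spans = [span for span in spans if span["type"] == "tool"]
    calls = Counter(span["name"] for span in tool_spans)
    total_ms = sum(span["duration_ms"] for span in spans)
    tool_ms = sum(span["duration_ms"] for span in tool_spans)
    share = tool_ms / total_ms if total_ms else 0.0
    return calls, share


# Hypothetical spans from a single trace; field names are assumptions.
spans = [
    {"type": "llm", "name": "plan", "duration_ms": 300},
    {"type": "tool", "name": "search", "duration_ms": 900},
    {"type": "tool", "name": "search", "duration_ms": 600},
    {"type": "llm", "name": "answer", "duration_ms": 200},
]

calls, share = tool_summary(spans)
print(calls, share)
```

Here the `search` tool is called twice and accounts for three quarters of the span time, so the tool is a better place to start than the final prompt.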

Monitor performance and tool usage charts showing slowest prompts and function call counts

Drill into evidence

Use chart actions to move from trend to trace:

1. Open the chart for the metric that moved.
2. Hover data points to confirm the exact bucket and value.
3. Use View traces when available, or open Traces with the same time range.
4. Filter by the relevant prompt, model, environment, tool, evaluator, status, cost, latency, or token signal.
5. Inspect representative traces and spans before changing the system.
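The same narrowing logic can be sketched against exported trace records. This is not a Monitor API: the function name, the record fields, and the choice of "slowest first" as the way to pick representative traces are all assumptions made for illustration.

```python
from datetime import datetime


def select_traces(traces, start, end, limit=5, **filters):
    """Return the slowest traces in [start, end) whose fields equal every given filter value."""
    matches = [
        trace
        for trace in traces
        if start <= trace["timestamp"] < end
        and all(trace.get(field) == value for field, value in filters.items())
    ]
    return sorted(matches, key=lambda trace: trace["latency_ms"], reverse=True)[:limit]


# Hypothetical trace records; field names are assumptions.
traces = [
    {"id": "t1", "timestamp": datetime(2024, 1, 1, 10, 5), "model": "model-a",
     "environment": "production", "latency_ms": 900},
    {"id": "t2", "timestamp": datetime(2024, 1, 1, 10, 20), "model": "model-b",
     "environment": "production", "latency_ms": 4200},
    {"id": "t3", "timestamp": datetime(2024, 1, 1, 10, 40), "model": "model-b",
     "environment": "production", "latency_ms": 5100},
    {"id": "t4", "timestamp": datetime(2024, 1, 1, 11, 10), "model": "model-b",
     "environment": "production", "latency_ms": 8000},
    {"id": "t5", "timestamp": datetime(2024, 1, 1, 10, 30), "model": "model-b",
     "environment": "staging", "latency_ms": 7000},
]

# Same time bucket as the chart point, narrowed to one model and environment.
selected = select_traces(
    traces,
    start=datetime(2024, 1, 1, 10),
    end=datetime(2024, 1, 1, 11),
    model="model-b",
    environment="production",
)
print([trace["id"] for trace in selected])
```

Keeping the time range identical to the chart bucket matters: a trace just outside the bucket (such as `t4` here) can look like the cause while contributing nothing to the value you hovered.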

Common reads

Pattern	Likely next step
Eval score drops while latency and cost are stable	Inspect evaluated spans, then update evaluators, datasets, or Improve context.
Cost rises with input tokens	Review prompt length, retrieval payloads, conversation context, or tool output size.
P95 latency rises for one model	Compare model performance and check provider/runtime behavior before editing prompts.
Tool call counts shift	Inspect tool spans and Behavior patterns before assuming the final prompt is wrong.
A chart points to a single prompt	Open the trace evidence and decide whether to build coverage or run Improve.

Analyze log traces

Open the traces behind a chart movement.

Inspect a trace

Read tree view, waterfall view, and span details.

Set up continuous evaluations

Add quality scores to production traffic.

Use logs to improve prompts

Turn a chart signal into a reviewed improvement loop.
