The loss layer for AI agent traces.

Measure your AI agent's real-world error rate and dollar loss — the loss layer on top of your existing traces and systems of record.

pip install agentloss

Instrument the consequential action

Not every LLM call — just the tool call that moves money or commits the business (payments, approvals, ticket resolutions, writes to systems of record).

Join to ground truth

Outcomes arrive late and from outside the agent — corrections, disputes, audits, human review. agentloss captures them and reconciles by business key.
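To make the join concrete, here is a minimal sketch of reconciling decisions with late-arriving outcomes on a business key. It uses plain Python and does not show agentloss internals: `DecisionRecord`, `OutcomeRecord`, and `reconcile` are hypothetical names invented for illustration, and the "last outcome per key wins" rule is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DecisionRecord:            # hypothetical shape, not the agentloss Decision type
    business_key: str
    action: str
    value_at_risk_usd: float


@dataclass
class OutcomeRecord:             # hypothetical shape for a resolved outcome
    business_key: str
    ground_truth: str
    realized_loss_usd: float


def reconcile(
    decisions: List[DecisionRecord],
    outcomes: List[OutcomeRecord],
) -> List[Tuple[DecisionRecord, Optional[OutcomeRecord]]]:
    """Left-join decisions to outcomes on business_key.

    Decisions with no outcome yet are kept and paired with None,
    so unresolved exposure stays visible.
    """
    by_key = {o.business_key: o for o in outcomes}  # assumption: last outcome per key wins
    return [(d, by_key.get(d.business_key)) for d in decisions]


decisions = [
    DecisionRecord("INV-88231", "approve", 14200.0),
    DecisionRecord("INV-88232", "approve", 310.0),
]
outcomes = [OutcomeRecord("INV-88231", "duplicate-should-block", 14200.0)]

joined = reconcile(decisions, outcomes)
unresolved = [d.business_key for d, o in joined if o is None]
```

Keeping unresolved decisions in the result, rather than dropping them with an inner join, is a deliberate choice in this sketch: outcomes arrive late, so a missing outcome means "not yet known", not "correct".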

Get the numbers that matter

False-approve (error) rate by segment with confidence intervals, realized and expected dollar loss, and incremental risk vs. a baseline.
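As an illustration of what a per-segment error rate with a confidence interval looks like, the sketch below computes a 95% Wilson score interval and sums realized loss per segment. The Wilson interval is one common choice for binomial rates; whether agentloss uses it is not stated here, so treat it as an assumption. The segment names and the `summarize` helper are hypothetical.

```python
import math
from collections import defaultdict


def wilson_interval(errors: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def summarize(rows):
    """rows: iterable of (segment, is_false_approve, realized_loss_usd)."""
    stats = defaultdict(lambda: {"n": 0, "errors": 0, "realized_loss_usd": 0.0})
    for segment, is_error, loss in rows:
        s = stats[segment]
        s["n"] += 1
        s["errors"] += int(is_error)
        s["realized_loss_usd"] += loss
    return {
        seg: {**s, "rate": s["errors"] / s["n"],
              "ci95": wilson_interval(s["errors"], s["n"])}
        for seg, s in stats.items()
    }


# Hypothetical resolved decisions: one false approve among new vendors.
rows = (
    [("vendor_new", True, 14200.0)]
    + [("vendor_new", False, 0.0)] * 3
    + [("vendor_known", False, 0.0)] * 6
)
report = summarize(rows)
```

With only four decisions in `vendor_new`, the interval around the 25% point estimate is very wide, which is the reason to report the interval alongside the rate rather than the rate alone.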

Quickstart

Instrument the consequential decision, then report outcomes as they resolve.

from agentloss import decision, report_outcome, Decision

@decision                                    # wraps a consequential action; records it
def approve_payment(invoice):
    action = run_matching(invoice)           # your logic -> "approve" | "hold" | "reject"
    return Decision(
        action=action,
        value_at_risk_usd=invoice.total,     # per-decision exposure
        business_key=invoice.number,         # natural join key for delayed outcomes
        use_case="ap_3way_match",
    )

# later, when the outcome resolves (a correction, dispute, audit, or human review):
report_outcome(
    business_key="INV-88231",
    ground_truth="duplicate-should-block",
    source="recovery_audit",                 # human_queue | verification_agent | recovery_audit
    realized_loss_usd=14200,
)

Already have traces? Read them in.

agentloss is OpenTelemetry / OpenInference-aligned, so it drops in as a thin layer on top of your existing traces. If you already collect spans in Arize Phoenix, point the connector at them instead of hand-instrumenting:

from agentloss.connectors import phoenix

# pull consequential-action spans from a Phoenix project into agentloss Decisions
for d in phoenix.read_decisions(project="ap-agent", endpoint="http://localhost:6006"):
    ...   # each `d` is a Decision the loss layer scores against ground truth

Raw prompts and records stay inside your boundary; only derived metrics leave.

For AI coding agents

In 2026, SDK integrations are written by coding agents. agentloss is built to be found and wired in by coding agents, with no human in the integration loop.

FAQ

How do I measure what my AI agent's mistakes actually cost?

Instrument the consequential action with @agentloss.decision and record the per-decision exposure (value_at_risk_usd). When outcomes resolve — a correction, dispute, chargeback, or audit — call report_outcome (or join a whole table with record_outcomes). agentloss reports realized and expected dollar loss, not a quality proxy score.

How do I measure my AI agent's real-world error rate in production?

agentloss computes the false-approve (error) rate by segment with confidence intervals by joining each production decision to its real resolved outcome, rather than to an offline labeled eval set. If you don't have ground truth yet, sample_and_verify() uses a verification agent to produce an estimate, so you're never stuck at zero.

How is agentloss different from Braintrust, Langfuse, Arize, or eval tools?

Those score quality proxies — LLM-judge, hallucination rate, task completion. agentloss answers the question they don't: what do the mistakes cost, and is the agent safe enough to trust with more autonomy? It's OpenTelemetry / OpenInference-aligned, so it reads your existing traces and adds the loss layer on top — it doesn't replace your observability stack.

How do I make my AI agent insurable, or prove it's reliable enough for more autonomy?

Underwriting and autonomy decisions need the record that agentloss produces: error rate with confidence intervals, realized dollar loss against real outcomes, and incremental risk vs. a baseline — the exact evidence an insurer's questionnaire (e.g. Munich Re aiSure) asks for and that eval scores can't supply.
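One way to read "incremental risk vs. a baseline" is as the difference between the agent's false-approve rate and the rate of a baseline process (for example, the prior human workflow). That reading is an assumption, as are the `rate_difference` helper and the counts below. The sketch uses a simple normal-approximation (Wald) interval for the difference of two proportions, which is unreliable when error counts are small.

```python
import math


def rate_difference(agent_errors, agent_n, base_errors, base_n, z=1.96):
    """Agent error rate minus baseline error rate, with a Wald interval.

    A positive difference means the agent errs more often than the baseline.
    """
    p_a = agent_errors / agent_n
    p_b = base_errors / base_n
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / agent_n + p_b * (1 - p_b) / base_n)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 12 false approves in 1,000 agent decisions
# versus 20 in 1,000 baseline decisions.
diff, (lo, hi) = rate_difference(12, 1000, 20, 1000)
conclusive = not (lo < 0 < hi)   # False here: the interval still spans zero
```

In this example the agent's point estimate is lower than the baseline's, but the interval includes zero, so the data do not yet show a difference in either direction.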

I already have traces (Phoenix / OpenTelemetry). Do I re-instrument?

No. Add a few agentloss.* attributes to the consequential span and point the connector at your spans (from agentloss.connectors import phoenix). agentloss adds the loss/outcome layer on top of what your tracer already emits.

Do I instrument every LLM call?

No — instrument only the consequential action: the handful of tool calls that move money or commit the business. That's where the cost lives, and it keeps the integration a thin layer.