Agent Reliability

Instrument your agent so the cause can be found

Push traffic to the OTLP receiver in gen_ai OpenTelemetry conventions, add the TypeScript or Python SDK, or connect Langfuse or Braintrust and start from the traces you already have. Every node you capture is a node attribution can walk back through, so when your agent gets worse in production, Tessary can name the change that caused it.


Ingest: OTLP receiver (gRPC 4317 / HTTP 4318) and TypeScript + Python SDKs take production traffic in gen_ai OTel conventions.
Connect: Already on Langfuse or Braintrust? Connect either as an upstream source. No re-instrumenting to start.
Payoff: When a grader trend drops, Tessary walks the captured nodes back to what caused it.

The Problem

Errors are the loud failures. Traces have to hold the quiet ones.

An agent is a chain of model calls across some framework, talking to some provider. Before a cause can be found, the traffic has to land in a model that keeps the structure: sessions, turns, tool calls, and observations. Raw spans you read by hand are not that.

of all LLM call spans reported an error in February 2026. The drops that cost you customers mostly return a clean response and never show up in an error rate.

Datadog, State of AI Engineering 2026

60% of those errors were rate-limit errors. Infrastructure telemetry catches infrastructure problems; a wrong answer delivered politely needs graders over structured traces.

Datadog, State of AI Engineering 2026

Agent framework adoption nearly doubled in a year, from over 9% of orgs in early 2025 to almost 18% by early 2026. Ingest that assumes a single stack covers less and less of what teams ship.

Datadog, State of AI Engineering 2026

Wire It In

A traced call site in five lines

With a coding agent doing the wiring, instrumenting your call sites takes hours, not a sprint. You do not need full coverage on day one; the nodes you capture first are where cause hunts begin.

import { init, traced } from "@tessary/sdk";

init({ apiKey: process.env.TESSARY_API_KEY }); // OTLP under the hood: gRPC 4317 / HTTP 4318, gen_ai conventions

export const handleTicket = traced("support-agent", async (ticket) =>
  agent.run(ticket) // `agent` is your existing agent; each run lands as a session: turns, tool calls, observations
);
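The snippet above does not show how the SDK works internally, but the pattern a wrapper like `traced` follows can be sketched: a higher-order function that times each call and records whether it succeeded. `tracedSketch`, `SpanRecord`, and the in-memory `recorded` array are hypothetical stand-ins. This is not the `@tessary/sdk` implementation, and a real SDK would export spans over OTLP rather than push them into an array.

```typescript
// Hypothetical record shape; a real SDK would emit OTLP spans instead.
interface SpanRecord {
  name: string;
  startMs: number;
  endMs: number;
  status: "ok" | "error";
  error?: string;
}

// In-memory sink standing in for an exporter (illustration only).
const recorded: SpanRecord[] = [];

// Wrap an async function so every call is timed and its outcome recorded.
// The wrapper is transparent: it returns the same result or rethrows the same error.
function tracedSketch<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startMs = Date.now();
    try {
      const result = await fn(...args);
      recorded.push({ name, startMs, endMs: Date.now(), status: "ok" });
      return result;
    } catch (err) {
      recorded.push({ name, startMs, endMs: Date.now(), status: "error", error: String(err) });
      throw err;
    }
  };
}
```

Rethrowing keeps the wrapper invisible to callers, so adding tracing to a call site does not change its error handling.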

How It Works

Ingest, connect, detect, and trace the cause

Push traffic over OTLP or start from the trace stack you already run, then let graders watch the trend and let attribution name the change when the trend breaks.

Push traffic over OTLP
Send production traffic to the OTLP receiver (gRPC 4317 / HTTP 4318) in gen_ai OpenTelemetry conventions, or add the TypeScript or Python SDK. It lands as an agent-native model: Session, Turn, Trace, typed Observation, first-class ToolCall, and role-aware multimodal Message. What you capture here sets how far back a trace can be walked when a trend breaks.
Or connect what you already have
Already instrumented on Langfuse or Braintrust? Connect either as an upstream source. They hold your traces and track your scores; Tessary reads that same data to answer the question they leave open: which change moved the score.
Detect as a trend over graders
Graders run over the traffic: natural-language checks compiled to classifiers or regex, tool-error extraction, and entity tracking that remembers what the agent has seen across turns. Detection reads the trend across verdicts, not any single score. An imperfect grader misfires at a stable rate, so a failure rate that jumps from 3% to 11% signals a real change.
Trace a drop to the change behind it
When a grader trend breaks, Tessary traces the failing quality dimension back through the agent's chain to the change that caused it: a prompt edit, a tool update, a dependency bump, or a provider model change that never touched your repo. Commit-SHA lineage covers the changes that shipped through a PR; the chain walk covers the ones that did not.
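The detection step above rests on one argument: an imperfect grader misfires at a stable rate, so a jump from 3% to 11% is a real change. One standard way to make that call is a one-sided two-proportion z-test between a baseline window of verdicts and the current window. This is a textbook technique swapped in for illustration, not necessarily what Tessary runs; the window sizes (400 verdicts each) and the 2.33 threshold (roughly a 1% one-sided false-alarm rate under the normal approximation) are assumptions.

```typescript
// Counts of failing verdicts from one grader over a window of traffic.
interface VerdictWindow {
  failures: number;
  total: number;
}

// Pooled two-proportion z-test: z = (p2 - p1) / sqrt(p(1 - p)(1/n1 + 1/n2)).
// The normal approximation needs a reasonable number of failures in each window.
function failureRateShift(baseline: VerdictWindow, current: VerdictWindow): { p1: number; p2: number; z: number } {
  const p1 = baseline.failures / baseline.total;
  const p2 = current.failures / current.total;
  const pooled = (baseline.failures + current.failures) / (baseline.total + current.total);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / baseline.total + 1 / current.total));
  const z = se === 0 ? 0 : (p2 - p1) / se;
  return { p1, p2, z };
}

// One-sided: only an increase in the failure rate counts as a break in the trend.
function isRealShift(baseline: VerdictWindow, current: VerdictWindow, zThreshold = 2.33): boolean {
  return failureRateShift(baseline, current).z > zThreshold;
}
```

With 12 of 400 failing in the baseline (3%) and 44 of 400 now (11%), z comes out near 4.4 and the shift is flagged; 3% against 3.5% scores about 0.4 and stays below the threshold, which is the grader's stable misfire rate showing through.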

What You Get

Ingest that fits an agent in production

Two ways to land traffic

An OTLP receiver (gRPC / HTTP) takes gen_ai OpenTelemetry spans, and production TypeScript and Python SDKs plus batch importers cover agents that do not emit them yet.
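For an agent that does not use an SDK, a span can go to the receiver directly. This sketch builds one chat-call span in OTLP/JSON with attributes from the OpenTelemetry gen_ai semantic conventions and POSTs it to the standard OTLP/HTTP traces path (`/v1/traces` on port 4318). The host, service name, and scope name are placeholders, authentication is omitted (add whatever header your receiver requires), and the gen_ai conventions are still evolving, so check attribute names against the version your receiver expects. In practice an OpenTelemetry OTLP exporter does this encoding for you.

```typescript
// Sketch: one gen_ai "chat" span encoded as OTLP/JSON and POSTed over HTTP.
// Hypothetical: the receiver host below; no auth header is sent.
const OTLP_TRACES_URL = "http://localhost:4318/v1/traces";

type AnyValue = { stringValue: string } | { intValue: string };
interface KeyValue {
  key: string;
  value: AnyValue;
}

interface ChatCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
  startMs: number; // epoch milliseconds
  endMs: number;
}

// OTLP/JSON carries trace and span ids as hex strings, not base64.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

const strAttr = (key: string, v: string): KeyValue => ({ key, value: { stringValue: v } });
// 64-bit integers are encoded as decimal strings in OTLP/JSON.
const intAttr = (key: string, v: number): KeyValue => ({ key, value: { intValue: String(v) } });
// Whole milliseconds to nanoseconds as a decimal string (avoids float precision loss).
const toNanos = (ms: number): string => `${Math.trunc(ms)}000000`;

function buildChatSpanPayload(serviceName: string, call: ChatCall) {
  const span = {
    traceId: randomHex(16), // 32 hex chars
    spanId: randomHex(8), // 16 hex chars
    name: `chat ${call.model}`, // gen_ai span name: "{operation} {model}"
    kind: 3, // SPAN_KIND_CLIENT
    startTimeUnixNano: toNanos(call.startMs),
    endTimeUnixNano: toNanos(call.endMs),
    attributes: [
      strAttr("gen_ai.operation.name", "chat"),
      strAttr("gen_ai.request.model", call.model),
      intAttr("gen_ai.usage.input_tokens", call.inputTokens),
      intAttr("gen_ai.usage.output_tokens", call.outputTokens),
    ],
  };
  return {
    resourceSpans: [
      {
        resource: { attributes: [strAttr("service.name", serviceName)] },
        scopeSpans: [{ scope: { name: "gen-ai-sketch" }, spans: [span] }],
      },
    ],
  };
}

async function sendSpans(payload: unknown, url: string = OTLP_TRACES_URL): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`OTLP export failed: HTTP ${res.status}`);
}
```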

Nodes that pay for themselves

Instrumentation exists for attribution: the more of the agent's chain the traces capture, the further back a cause hunt can walk. Start thin. When a hunt hits a gap in the traces, that gap is the next node to instrument, and the next degradation is easier to trace.
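Finding "a gap in the traces" can be made mechanical, assuming captured spans carry OpenTelemetry-style span and parent ids: any parent id that children point at but that was never captured marks a step the traces cannot see. `CapturedSpan` and `findTraceGaps` are hypothetical names for this sketch.

```typescript
// Hypothetical captured-span shape, using OpenTelemetry-style parent links.
interface CapturedSpan {
  spanId: string;
  parentSpanId?: string; // undefined for a trace's root span
  name: string;
}

// Maps each missing parent id to the names of the captured spans that point at it.
// A parent that children reference but that never arrived is an uninstrumented step.
function findTraceGaps(spans: CapturedSpan[]): Map<string, string[]> {
  const captured = new Set(spans.map((s) => s.spanId));
  const gaps = new Map<string, string[]>();
  for (const s of spans) {
    if (s.parentSpanId !== undefined && !captured.has(s.parentSpanId)) {
      const children = gaps.get(s.parentSpanId) ?? [];
      children.push(s.name);
      gaps.set(s.parentSpanId, children);
    }
  }
  return gaps;
}
```

If a search-tool span and a summarize span both point at a parent that never arrived, the call site that produced that parent is the next one to instrument.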

Start from your existing stack

On Langfuse or Braintrust already? Connect it as an upstream source and begin with the traces you have. Instrument deeper only where a trace goes dark.

Signals, not raw spans

Graders over the graph flag the failure mode that is firing on real traffic, instead of a trace UI you read by hand.
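Two of the grader kinds named under detection can be sketched directly: a regex check and tool-error extraction, each turning a trace into a verdict, plus the failure rate a trend is built from. The `Observation` shape, grader names, and refusal pattern are hypothetical. The page says natural-language checks are compiled to classifiers or regex; this sketch only shows a hand-written regex standing in for that compiled output.

```typescript
// Illustrative observation shape; not Tessary's schema.
type Observation =
  | { kind: "message"; role: "user" | "assistant"; text: string }
  | { kind: "tool_call"; tool: string; status: "ok" | "error"; error?: string };

interface Verdict {
  grader: string;
  failed: boolean;
  detail?: string;
}

type Grader = (trace: Observation[]) => Verdict;

// Regex grader: fail the trace if any assistant message matches.
// Use a pattern without the g flag; .test() on a global regex keeps state between calls.
function regexGrader(name: string, pattern: RegExp): Grader {
  return (trace) => {
    for (const o of trace) {
      if (o.kind === "message" && o.role === "assistant" && pattern.test(o.text)) {
        return { grader: name, failed: true, detail: o.text };
      }
    }
    return { grader: name, failed: false };
  };
}

// Tool-error extraction: fail the trace if any tool call reported an error.
const toolErrorGrader: Grader = (trace) => {
  const errors: string[] = [];
  for (const o of trace) {
    if (o.kind === "tool_call" && o.status === "error") {
      errors.push(`${o.tool}: ${o.error ?? "unknown error"}`);
    }
  }
  return errors.length > 0
    ? { grader: "tool-error", failed: true, detail: errors.join("; ") }
    : { grader: "tool-error", failed: false };
};

// Fraction of traces a grader fails: the per-window number a trend is built from.
function failureRate(traces: Observation[][], grader: Grader): number {
  if (traces.length === 0) return 0;
  return traces.filter((t) => grader(t).failed).length / traces.length;
}
```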

Pre-deploy CI gate

A GitHub Action reads the diff before a deploy, posts a pre-deploy trouble report on the PR, and can block the merge, so you see the trouble a change is likely to cause before it ships.
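As a minimal illustration, and not a description of how the GitHub Action works, a gate could sort a diff's changed paths into the change kinds this page names (prompt edits, tool updates, dependency bumps) and block when a risky kind appears. The path patterns and the blocking policy are hypothetical. A provider model change never shows up in a diff, which is why the production trend still has to cover it.

```typescript
type ChangeKind = "prompt" | "tool" | "dependency" | "other";

// Hypothetical path conventions; adjust to your repo layout. First match wins.
const RULES: Array<[RegExp, ChangeKind]> = [
  [/(^|\/)prompts?\//, "prompt"],
  [/\.prompt\.(md|txt)$/, "prompt"],
  [/(^|\/)tools?\//, "tool"],
  [/(^|\/)(package\.json|package-lock\.json|pnpm-lock\.yaml|requirements\.txt|pyproject\.toml|poetry\.lock)$/, "dependency"],
];

function classifyPath(path: string): ChangeKind {
  for (const [re, kind] of RULES) {
    if (re.test(path)) return kind;
  }
  return "other";
}

// Group changed paths by kind, render a short report, and decide whether to block.
function triageDiff(
  changedPaths: string[],
  blockOn: ChangeKind[] = ["prompt", "tool"], // hypothetical policy
): { block: boolean; report: string } {
  const byKind = new Map<ChangeKind, string[]>();
  for (const p of changedPaths) {
    const kind = classifyPath(p);
    byKind.set(kind, (byKind.get(kind) ?? []).concat(p));
  }
  const report = Array.from(byKind.entries())
    .map(([kind, paths]) => `${kind}: ${paths.join(", ")}`)
    .join("\n");
  return { block: blockOn.some((k) => byKind.has(k)), report };
}
```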

Provider and stack agnostic

Works across model providers and trace stacks, so it fits agents built on different tooling.

What the CI gate catches: trouble a change is likely to cause, read before deploy

FAQ

Questions about the SDK and ingest

The SDK instruments your agent to emit production traces to Tessary. Traffic lands over the OTLP receiver in gen_ai OpenTelemetry conventions, or through the TypeScript or Python SDK, as an agent-native model of sessions, turns, traces, observations, and tool calls. Attribution is only as good as what the traces capture; the nodes the SDK records are the path attribution walks when quality drops.
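To make the agent-native model concrete, here are illustrative TypeScript shapes for the entities this page names (session, turn, trace, typed observation, tool call, role-aware multimodal message), plus a helper that lists a session's tool calls in order. Field names and nesting are assumptions, not Tessary's schema.

```typescript
// Illustrative shapes only; field names and nesting are assumptions.
type Role = "system" | "user" | "assistant" | "tool";
type Part = { type: "text"; text: string } | { type: "image"; url: string };

interface Message {
  role: Role; // role-aware
  parts: Part[]; // multimodal content
}

interface ToolCall {
  id: string;
  name: string;
  args: unknown;
  result?: unknown;
  error?: string;
}

// Typed observations: each variant carries its own fields.
type Observation =
  | { type: "generation"; model: string; input: Message[]; output: Message }
  | { type: "tool"; call: ToolCall }
  | { type: "event"; name: string };

interface Trace {
  traceId: string;
  observations: Observation[];
}

interface Turn {
  index: number;
  userMessage: Message;
  traces: Trace[];
}

interface Session {
  sessionId: string;
  turns: Turn[];
}

// Walk a session in order and pull out every tool call, tagged with its turn.
function toolCallsInSession(session: Session): Array<{ turn: number; call: ToolCall }> {
  const out: Array<{ turn: number; call: ToolCall }> = [];
  for (const turn of session.turns) {
    for (const trace of turn.traces) {
      for (const obs of trace.observations) {
        if (obs.type === "tool") out.push({ turn: turn.index, call: obs.call });
      }
    }
  }
  return out;
}
```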

You do not need to re-instrument to start. Connect Langfuse or Braintrust as an upstream source and pull a slice on demand. Langfuse and Braintrust tell you a score moved; Tessary reads that same data to tell you why, naming the change behind the move. Keep them running, and add nodes only where the traces come up short.

Multi-step agentic workflows are covered. They chain several model calls, and some failures only appear in the handoff between them. The agent-native model keeps first-class tool calls and typed observations across a trace, so a grader can flag a failure that spans the chain, and attribution can walk that same chain back to the step that fed it.
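The chain walk can be pictured as following parent links from the failing step toward the root and flagging each step whose recorded fingerprint (for example a prompt hash, a tool version, or the model id the provider reported) differs from a baseline. This is a simplified sketch under those assumptions: `ChainNode`, the fingerprint field, and the baseline map are hypothetical, and real attribution would weigh many traces rather than a single one.

```typescript
// Hypothetical node shape: each captured step records what it ran with.
interface ChainNode {
  id: string;
  parentId?: string; // the upstream step that fed this one
  name: string; // e.g. "retrieve", "plan", "answer"
  fingerprint: string; // e.g. prompt hash, tool version, or provider model id
}

interface Suspect {
  name: string;
  before: string;
  after: string;
  distance: number; // steps upstream of the failing node (0 = the node itself)
}

// Walk from the failing step toward the root, flagging every step whose fingerprint
// differs from the baseline for that step name. The walk stops at a gap in the
// captured chain, since nothing beyond a missing node can be inspected.
function walkBackForChanges(nodes: ChainNode[], failingId: string, baseline: Record<string, string>): Suspect[] {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));
  const suspects: Suspect[] = [];
  const seen = new Set<string>(); // guards against malformed cyclic links
  let current = byId.get(failingId);
  let distance = 0;
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    const before = baseline[current.name];
    if (before !== undefined && before !== current.fingerprint) {
      suspects.push({ name: current.name, before, after: current.fingerprint, distance });
    }
    current = current.parentId ? byId.get(current.parentId) : undefined;
    distance++;
  }
  return suspects;
}
```

If the reported model id is part of a step's fingerprint, a change that never touched the repo, such as a provider updating the model behind an alias, surfaces in this walk just like a prompt edit would.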

The OTLP front door and SDKs work across model providers and trace stacks rather than assuming one. The shipped upstream pull connectors are Langfuse and Braintrust.

Changes can be checked before they deploy. A GitHub Action reads the diff before a deploy, posts a trouble report on the pull request, and can gate the merge. The gate only covers changes that ship through a PR. A provider model update, a shift in an upstream agent, or a tool returning different data can reintroduce a failure with no deploy at all; the production trend catches those.


Get Started

Get the traffic in. The cause-finding starts there.

Push over OTLP, add an SDK, or connect Langfuse or Braintrust; starting needs your repo and no credit card. The next time a trend drops, you get the cause, not a pile of spans to read by hand.

Bring your own provider keys. Connect the trace stack you already run.
