Instrument an agent

This is the full path from an uninstrumented agent to one complete, well-shaped trace in Lemma. Everything you need is on this page; the per-primitive pages (Traces, Generations, Tool calls, Spans, Threads & context) go deeper on each piece.

Lemma is opinionated, not a generic OTLP destination. It reads a specific trace shape. Follow this walkthrough and your agent produces that shape; forward arbitrary spans and your traces will render empty or broken.

What you’ll build

One agent execution becomes one trace. A realistic support agent retrieves context, calls a tool, and asks a model — all nested under a single root:

support-agent                  ← trace root (input, output, agent name, thread, user)
├─ retrieve-context            ← span
│  └─ search_docs              ← tool call (args, result)
├─ lookup_order                ← tool call (args, result)
└─ answer                      ← generation (model, tokens, prompt, completion)
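
Nesting is carried by parent links: every span records the ID of its parent, and only the root has none. The sketch below is not Lemma or Langfuse code. It uses a hypothetical `Span` record (the field names `span_id`, `parent_id`, and `kind` are assumptions made for illustration) to show how the five flat spans from the diagram fold back into the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical flat span record; field names are illustrative only."""
    span_id: str
    parent_id: Optional[str]  # None marks the trace root
    name: str
    kind: str  # "root", "span", "tool", or "generation"


def render_tree(spans: list[Span]) -> list[str]:
    """Return one indented line per span, children listed under their parent."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    roots = children.get(None, [])
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")

    lines: list[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span.name} [{span.kind}]")
        for child in children.get(span.span_id, []):
            walk(child, depth + 1)

    walk(roots[0], 0)
    return lines


# The spans from the diagram above, as flat records with parent links.
SUPPORT_AGENT_SPANS = [
    Span("a", None, "support-agent", "root"),
    Span("b", "a", "retrieve-context", "span"),
    Span("c", "b", "search_docs", "tool"),
    Span("d", "a", "lookup_order", "tool"),
    Span("e", "a", "answer", "generation"),
]
```

Calling `render_tree(SUPPORT_AGENT_SPANS)` reproduces the outline above. Two records with no parent make it raise, which mirrors the rule that one execution has exactly one root.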

1. Install

TypeScript

npm install @langfuse/tracing @langfuse/otel @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-proto

Python

pip install "langfuse>=3,<4" opentelemetry-sdk opentelemetry-exporter-otlp

2. Configure the exporter (once, at startup)

Point Langfuse at Lemma and register it before any agent or model client runs. Late initialization is the most common reason spans go missing.

TypeScript

// instrumentation.ts — imported first, before your app code
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";

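// Fail fast on missing configuration. Without this check an unset variable
// is sent as "Bearer undefined" and fails strict type checking on the headers.
const { LEMMA_BASE_URL, LEMMA_API_KEY, LEMMA_PROJECT_ID } = process.env;
if (!LEMMA_BASE_URL || !LEMMA_API_KEY || !LEMMA_PROJECT_ID) {
  throw new Error("LEMMA_BASE_URL, LEMMA_API_KEY, and LEMMA_PROJECT_ID must be set");
}

export const lemmaProcessor = new LangfuseSpanProcessor({
  exporter: new OTLPTraceExporter({
    url: LEMMA_BASE_URL,
    headers: {
      Authorization: `Bearer ${LEMMA_API_KEY}`,
      "X-Lemma-Project-ID": LEMMA_PROJECT_ID,
    },
  }),
});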

new NodeTracerProvider({ spanProcessors: [lemmaProcessor] }).register();

Python

# instrumentation.py: imported first, before your app code
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint=os.environ["LEMMA_BASE_URL"],
            headers={
                "Authorization": f"Bearer {os.environ['LEMMA_API_KEY']}",
                "X-Lemma-Project-ID": os.environ["LEMMA_PROJECT_ID"],
            },
        )
    )
)
trace.set_tracer_provider(provider)

Set the environment variables below; the values are in your Lemma project settings. Exporting only to Lemma needs no LANGFUSE_* credentials.

export LEMMA_BASE_URL="https://api.uselemma.ai/otel/v1/traces"
export LEMMA_API_KEY="lma_..."
export LEMMA_PROJECT_ID="proj_..."
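
A missing variable is easy to overlook, because the exporter is built at import time and a bad header only shows up later as absent traces. One way to surface it at startup is a small loader that checks all three variables and returns the `endpoint` and `headers` keyword arguments used in the Python setup above. The helper name `load_lemma_config` is an assumption for this sketch, not part of any SDK.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("LEMMA_BASE_URL", "LEMMA_API_KEY", "LEMMA_PROJECT_ID")


def load_lemma_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the three variables and build exporter kwargs, failing on gaps."""
    # Treat unset and empty values the same way: both are unusable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "endpoint": env["LEMMA_BASE_URL"],
        "headers": {
            "Authorization": f"Bearer {env['LEMMA_API_KEY']}",
            "X-Lemma-Project-ID": env["LEMMA_PROJECT_ID"],
        },
    }
```

In instrumentation.py this would be used as `OTLPSpanExporter(**load_lemma_config())`. Passing a plain dict instead of `os.environ` makes the loader easy to unit test.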

3. The complete instrumented agent

Here is the whole agent in one piece. Each part is explained below.

TypeScript

import {
  propagateAttributes,
  startActiveObservation,
} from "@langfuse/tracing";
import { lemmaProcessor } from "./instrumentation";

export async function handleSupportRequest(req: {
  message: string;
  conversationId: string;
  userId: string;
}): Promise<string> {
  // (1) One agent execution = one trace. Open the root span.
  return startActiveObservation("support-agent", async (root) => {
    root.update({ input: req.message });

    // (2) Trace-level context: agent name, thread, user.
    return propagateAttributes(
      {
        traceName: "support-agent",
        sessionId: req.conversationId,
        userId: req.userId,
        metadata: { "gen_ai.agent.name": "support-agent" },
      },
      async () => {
        try {
          // (3) A span groups a multi-step sub-task (retrieval).
          const docs = await startActiveObservation(
            "retrieve-context",
            async (span) => {
              span.update({ input: { query: req.message } });

              // (4) A tool call nested inside the span.
              const found = await startActiveObservation(
                "search_docs",
                async (tool) => {
                  const result = await searchDocs(req.message);
                  tool.update({ input: { query: req.message }, output: result });
                  return result;
                },
                { asType: "tool" },
              );

              span.update({ output: { count: found.length } });
              return found;
            },
          );

          // (5) Another tool call, directly under the root.
          const order = await startActiveObservation(
            "lookup_order",
            async (tool) => {
              const result = await lookupOrder(req.userId);
              tool.update({ input: { userId: req.userId }, output: result });
              return result;
            },
            { asType: "tool" },
          );

          // (6) A generation: the LLM call, with model + token usage.
          const answer = await startActiveObservation(
            "answer",
            async (gen) => {
              const messages = buildPrompt(req.message, docs, order);
              const r = await callModel(messages);
              gen.update({
                input: messages,
                output: r.text,
                model: "gpt-4o",
                usageDetails: { input: r.usage.inputTokens, output: r.usage.outputTokens },
              });
              return r.text;
            },
            { asType: "generation" },
          );

          // (7) Record the final output on the root.
          root.update({ output: answer });
          return answer;
        } catch (error) {
          // (8) Mark failures on the root so they surface in Lemma.
          root.update({
            level: "ERROR",
            statusMessage: error instanceof Error ? error.message : String(error),
          });
          throw error;
        }
      },
    );
  }).finally(async () => {
    // (9) Flush in short-lived / serverless runtimes. This runs after the
    // root span has ended, so the root itself is part of the export.
    await lemmaProcessor.forceFlush();
  });
}

Python

from langfuse import get_client

langfuse = get_client()

def handle_support_request(message: str, conversation_id: str, user_id: str) -> str:
    try:
        # (1) One agent execution = one trace. Open the root span.
        with langfuse.start_as_current_span(name="support-agent") as root:
            root.update(input=message)

            # (2) Trace-level context: agent name, thread, user.
            langfuse.update_current_trace(
                name="support-agent",
                session_id=conversation_id,
                user_id=user_id,
                metadata={"gen_ai.agent.name": "support-agent"},
            )

            try:
                # (3) A span groups a multi-step sub-task (retrieval).
                with langfuse.start_as_current_span(name="retrieve-context") as span:
                    span.update(input={"query": message})

                    # (4) A tool call nested inside the span.
                    with langfuse.start_as_current_observation(
                        name="search_docs", as_type="tool"
                    ) as tool:
                        docs = search_docs(message)
                        tool.update(input={"query": message}, output=docs)

                    span.update(output={"count": len(docs)})

                # (5) Another tool call, directly under the root.
                with langfuse.start_as_current_observation(
                    name="lookup_order", as_type="tool"
                ) as tool:
                    order = lookup_order(user_id)
                    tool.update(input={"user_id": user_id}, output=order)

                # (6) A generation: the LLM call, with model + token usage.
                with langfuse.start_as_current_generation(name="answer", model="gpt-4o") as gen:
                    messages = build_prompt(message, docs, order)
                    r = call_model(messages)
                    gen.update(
                        input=messages,
                        output=r.text,
                        usage_details={"input": r.usage.input_tokens, "output": r.usage.output_tokens},
                    )
                    answer = r.text

                # (7) Record the final output on the root.
                root.update(output=answer)
                return answer
            except Exception as error:
                # (8) Mark failures on the root so they surface in Lemma.
                root.update(level="ERROR", status_message=str(error))
                raise
    finally:
        # (9) Flush in short-lived / serverless runtimes. This runs after the
        # root span has ended, so the root itself is part of the export.
        langfuse.flush()

How each part maps to the contract

(1) Root span: one execution, one trace. Everything nests inside this callback (TypeScript) or with block (Python). → Traces
(2) Trace context: agent name, thread (sessionId in TypeScript, session_id in Python), and user, set once for the whole trace. → Threads & context
(3) Span: groups a sub-task so its work nests beneath it. → Spans
(4) Tool call (nested): a tool invoked as part of retrieval; input is the arguments, output is the result. → Tool calls
(5) Tool call (top-level): a tool invoked directly under the root.
(6) Generation: the LLM call, carrying the model name and token usage (usageDetails in TypeScript, usage_details in Python) so Lemma can compute cost and tokens. → Generations
(7) Output: the final answer, recorded on the root.
(8) Errors: level "ERROR" and a status message set on the root, so a failed run is visible on the trace.
(9) Flush: force a flush before a short-lived process exits. → Setup
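
The two tool calls in the agent repeat one pattern: open a tool observation, record the arguments, run the function, record the result. That pattern can be factored into a decorator. The sketch below assumes `start_observation` is a context-manager factory called as `start_observation(name=..., as_type="tool")` whose result has an `update(**fields)` method, which is how `langfuse.start_as_current_observation` is used on this page. `traced_tool`, `FakeObservation`, and `fake_start_observation` are hypothetical names; the fake exists only so the example runs without a tracing backend.

```python
import functools
from contextlib import contextmanager
from typing import Any, Callable


def traced_tool(start_observation: Callable[..., Any]) -> Callable:
    """Build a decorator that records a function call as a tool observation.

    Sketch only: the wrapped function must be called with keyword arguments,
    so the recorded input is a readable name-to-value mapping.
    """
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            with start_observation(name=fn.__name__, as_type="tool") as tool:
                tool.update(input=kwargs)   # arguments, recorded before the call
                result = fn(**kwargs)       # an exception here leaves output unset
                tool.update(output=result)  # result, recorded after the call
                return result
        return wrapper
    return decorator


# --- In-memory stand-in so the sketch runs without a tracing backend ---
RECORDED: list[dict] = []


class FakeObservation:
    def __init__(self, name: str, as_type: str) -> None:
        self.fields: dict = {"name": name, "as_type": as_type}

    def update(self, **fields: Any) -> None:
        self.fields.update(fields)


@contextmanager
def fake_start_observation(name: str, as_type: str):
    obs = FakeObservation(name, as_type)
    try:
        yield obs
    finally:
        RECORDED.append(obs.fields)  # stands in for ending the observation


@traced_tool(fake_start_observation)
def lookup_order(user_id: str) -> dict:
    return {"user_id": user_id, "status": "shipped"}
```

With the real client, `@traced_tool(langfuse.start_as_current_observation)` should slot in the same way, provided your SDK version accepts the same `name` and `as_type` keywords shown in the agent above.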

4. Instrumenting an agent loop

Most agents loop: the model calls tools, you feed results back, repeat. The rule is unchanged — the whole loop is one trace. Open the root once, then create a generation per model turn and a tool call per tool invocation, all inside the root callback.

TypeScript

await startActiveObservation("support-agent", async (root) => {
  root.update({ input: req.message });
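  // Typed explicitly so the tool messages pushed below (which add `name`) type-check.
  const messages: { role: string; content: string; name?: string }[] = [
    { role: "user", content: req.message },
  ];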

  while (true) {
    const turn = await startActiveObservation(
      "model-turn",
      async (gen) => {
        const r = await callModel(messages);
        gen.update({
          input: messages,
          output: r,
          model: "gpt-4o",
          usageDetails: { input: r.usage.inputTokens, output: r.usage.outputTokens },
        });
        return r;
      },
      { asType: "generation" },
    );

    if (!turn.toolCalls?.length) {
      root.update({ output: turn.text });
      return turn.text;
    }

    for (const call of turn.toolCalls) {
      const result = await startActiveObservation(
        call.name,
        async (tool) => {
          const out = await runTool(call.name, call.args);
          tool.update({ input: call.args, output: out });
          return out;
        },
        { asType: "tool" },
      );
      messages.push({ role: "tool", name: call.name, content: JSON.stringify(result) });
    }
  }
});

Python

with langfuse.start_as_current_span(name="support-agent") as root:
    root.update(input=message)
    messages = [{"role": "user", "content": message}]

    while True:
        with langfuse.start_as_current_generation(name="model-turn", model="gpt-4o") as gen:
            turn = call_model(messages)
            gen.update(
                input=messages,
                output=turn,
                usage_details={"input": turn.usage.input_tokens, "output": turn.usage.output_tokens},
            )

        if not turn.tool_calls:
            root.update(output=turn.text)
            break

        for call in turn.tool_calls:
            with langfuse.start_as_current_observation(name=call.name, as_type="tool") as tool:
                out = run_tool(call.name, call.args)
                tool.update(input=call.args, output=out)
            messages.append({"role": "tool", "name": call.name, "content": str(out)})
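
To see the shape the loop produces without a model or a backend, the same control flow can be run against stand-ins. In the sketch below, `Recorder` is a hypothetical in-memory tracer that notes each observation's kind, name, and parent, and `scripted_model` is a fake model that requests one tool and then answers. The `max_turns` cap is an addition of this sketch, not something Lemma requires: it keeps a model that never stops requesting tools from holding the root open indefinitely.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical in-memory tracer: records (kind, name, parent) on start."""
    events: list = field(default_factory=list)
    _stack: list = field(default_factory=list)

    @contextmanager
    def observe(self, kind: str, name: str):
        parent = self._stack[-1] if self._stack else None
        self.events.append((kind, name, parent))
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()


def run_agent(message, call_model, run_tool, tracer, max_turns=8):
    """One root for the whole loop; one generation per turn; one tool per call."""
    messages = [{"role": "user", "content": message}]
    with tracer.observe("root", "support-agent"):
        for _ in range(max_turns):  # bounded, unlike `while True`
            with tracer.observe("generation", "model-turn"):
                turn = call_model(messages)
            if not turn["tool_calls"]:
                return turn["text"]
            for call in turn["tool_calls"]:
                with tracer.observe("tool", call["name"]):
                    out = run_tool(call["name"], call["args"])
                messages.append(
                    {"role": "tool", "name": call["name"], "content": str(out)}
                )
        raise RuntimeError(f"no final answer after {max_turns} turns")


def scripted_model(messages):
    """Stand-in model: asks for one tool, then answers once a result exists."""
    if any(m["role"] == "tool" for m in messages):
        return {"text": "Your order has shipped.", "tool_calls": []}
    return {
        "text": "",
        "tool_calls": [{"name": "lookup_order", "args": {"user_id": "u1"}}],
    }
```

Running it with `scripted_model` records one root, two generations, and one tool call, and every child names `support-agent` as its parent. That is the single-trace shape the loop above is meant to produce.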

Already using a framework like the Vercel AI SDK, OpenAI Agents, or LangGraph? It can emit these spans for you — see Frameworks. You still wrap the run in one root span so every turn nests into a single trace.

Every span appears as a separate trace

If LLM or tool calls show up as their own separate traces, the work ran outside the root’s active context, usually because async context was lost across a queue, worker, stream, or setTimeout. The default fix is to keep all work inside the root callback so child spans nest automatically.

Keep the root open until children finish. The root must encompass all of its children. Do not let the root end, and do not flush, until every child span has completed, or those children can be orphaned into separate traces.

If context cannot propagate automatically (across a queue, worker, or separate service), carry the IDs manually. Capture getActiveTraceId() and getActiveSpanId() on the parent, and attach the child with parentSpanContext:

import { getActiveSpanId, getActiveTraceId, startObservation } from "@langfuse/tracing";

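// Parent side: capture the IDs while the root span is still active.
const traceId = getActiveTraceId();
const parentSpanId = getActiveSpanId();
if (!traceId || !parentSpanId) {
  throw new Error("No active span: capture the IDs inside the root callback");
}

// Child side: after the IDs have crossed the queue or worker boundary,
// attach the new observation to the captured parent.
const child = startObservation(
  "search_docs",
  { input: { query } },
  { asType: "tool", parentSpanContext: { traceId, spanId: parentSpanId, traceFlags: 1 } },
);
child.update({ output: await searchDocs(query) }).end();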

See Troubleshooting and Langfuse’s trace and observation IDs.
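
Across a queue, the two IDs have to travel inside the job payload. One common encoding is the W3C Trace Context `traceparent` string: a version, a 32-character hex trace ID, a 16-character hex span ID, and a flags byte. The sketch below is stdlib-only and is not a Langfuse or Lemma API; the helper names `to_traceparent` and `from_traceparent` are assumptions, and the decoded keys are chosen to match the `parentSpanContext` fields in the snippet above.

```python
import re

# W3C Trace Context: version "00", 32 hex trace ID, 16 hex span ID, 2 hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def from_traceparent(value: str) -> dict:
    """Decode on the worker side into the fields a parent span context needs."""
    match = _TRACEPARENT.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError("all-zero trace or span IDs are invalid")
    return {"traceId": trace_id, "spanId": span_id, "traceFlags": int(flags, 16)}


def to_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Encode the parent's IDs as one string that fits in a job payload."""
    value = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    from_traceparent(value)  # validate before the value leaves the process
    return value
```

The producer would store `to_traceparent(trace_id, span_id)` on the job, and the worker would pass the decoded fields as the parent context when it starts its child observation. A sampled flag of `01` corresponds to `traceFlags: 1`.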

5. Verify in Lemma

Open the Lemma dashboard → Traces. Confirm:

One trace per run — the whole execution is one trace, not separate traces per call.
Root has input and output — the user message and the final answer.
Generations are nested — each LLM call shows model and token usage.
Tools are nested — each tool call shows arguments and result.
Context is set — agent name, thread, and user appear on the trace.
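
The same checklist can be expressed as a function over one run's spans, which is handy in an integration test. The sketch below assumes each span is a plain dict with hypothetical keys (`trace_id`, `parent_id`, `type`, `name`, `input`, `output`, `model`, `usage`, `agent_name`, `session_id`, `user_id`). These keys are illustrative and are not Lemma's export format, so map them from whatever your test exporter captures.

```python
def check_trace(spans: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the checklist passes."""
    problems: list[str] = []

    # One trace per run.
    if len({s["trace_id"] for s in spans}) != 1:
        problems.append("spans are not all in a single trace")

    # Exactly one root, carrying input, output, and trace-level context.
    roots = [s for s in spans if s.get("parent_id") is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for root in roots:
        for key in ("input", "output", "agent_name", "session_id", "user_id"):
            if root.get(key) in (None, ""):
                problems.append(f"root is missing {key}")

    # Generations show model and usage; tools show arguments and result.
    for s in spans:
        if s.get("type") == "generation" and not (s.get("model") and s.get("usage")):
            problems.append(f"generation {s['name']} lacks model or usage")
        if s.get("type") == "tool" and ("input" not in s or "output" not in s):
            problems.append(f"tool {s['name']} lacks arguments or result")
    return problems
```

An empty result covers the five checks above; each returned string names the item that failed.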

Go deeper

Traces

The root span, errors, and context propagation.

Generations

LLM calls with model and token usage.

Tool calls

Tool arguments, results, and a reusable wrapper.

Trace contract

The exact shape Lemma reads.

