Vercel AI SDK

The Vercel AI SDK emits OpenTelemetry spans through experimental_telemetry. Those spans carry the attributes Lemma reads, so once you point the Langfuse span processor at Lemma and wrap each agent run in one root span, you get a complete nested trace: a root with input and output, a generation for every model call, and a tool span for every tool call.

One agent execution = one trace. Wrap a full multi-step run in a single root span so every model and tool call nests under it. See the trace contract.

Vercel AI SDK telemetry is a fully supported shape for Lemma’s automated issue detection (silent failures, bad tool calls, loops) today. It needs no extra work beyond enabling experimental_telemetry.

Recipe

Install

npm install ai @langfuse/tracing @langfuse/otel @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-proto

Create a Langfuse span processor whose exporter sends spans to Lemma, then register it with the tracer provider:

// instrumentation.ts — imported first, before your app code
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";

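// Export spans to Lemma's OTLP endpoint through the Langfuse span processor.
export const lemmaProcessor = new LangfuseSpanProcessor({
  exporter: new OTLPTraceExporter({
    url: process.env.LEMMA_BASE_URL,
    headers: {
      Authorization: `Bearer ${process.env.LEMMA_API_KEY}`,
      // Header values must be strings; the fallback keeps this valid under strict type checking.
      "X-Lemma-Project-ID": process.env.LEMMA_PROJECT_ID ?? "",
    },
  }),
});

// Register the provider globally so AI SDK spans flow through lemmaProcessor.
new NodeTracerProvider({ spanProcessors: [lemmaProcessor] }).register();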

Set the environment variables. Lemma-only export needs no LANGFUSE_* credentials.

export LEMMA_BASE_URL="https://api.uselemma.ai/otel/v1/traces"
export LEMMA_API_KEY="lma_..."
export LEMMA_PROJECT_ID="proj_..."
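
Because the exporter reads these three variables at startup, a missing one tends to surface only as traces that never arrive. The following is a minimal sketch of a startup check, assuming you want to fail fast. The helper names `missingLemmaEnv` and `assertLemmaEnv` are hypothetical and not part of Lemma, Langfuse, or OpenTelemetry.

```typescript
// Hypothetical helpers (not part of any SDK): fail fast when Lemma config is missing.
type EnvLike = Record<string, string | undefined>;

// The three variables this recipe sets.
const REQUIRED_LEMMA_VARS = ["LEMMA_BASE_URL", "LEMMA_API_KEY", "LEMMA_PROJECT_ID"] as const;

// Returns the names of required variables that are unset or blank.
function missingLemmaEnv(env: EnvLike): string[] {
  return REQUIRED_LEMMA_VARS.filter((key) => !env[key]?.trim());
}

// Throws one error listing every missing variable, so a bad deploy is obvious at boot.
function assertLemmaEnv(env: EnvLike): void {
  const missing = missingLemmaEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing Lemma environment variables: ${missing.join(", ")}`);
  }
}
```

One way to use it is to call assertLemmaEnv(process.env) at the top of instrumentation.ts, before the exporter is constructed.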

Enable AI SDK telemetry

Turn on experimental_telemetry for every generateText / streamText / generateObject call. The AI SDK natively emits the model, token-usage, and tool-call attributes Lemma reads. Set gen_ai.agent.name so traces group by workflow.

import { generateText } from "ai";

const result = await generateText({
  model: "openai/gpt-4o",
  messages,
  tools,
  experimental_telemetry: {
    isEnabled: true,
    functionId: "support-agent",
    metadata: {
      "gen_ai.agent.name": "support-agent",
      "lemma.thread_id": threadId,
    },
  },
});
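
Since telemetry has to be enabled on every call, it may help to build the settings object in one place. The sketch below assumes the same fields as the example above; `agentTelemetry` is a hypothetical helper name, not an AI SDK export.

```typescript
// Hypothetical helper: builds the experimental_telemetry value used in this recipe,
// so every generateText / streamText / generateObject call sends the same attributes.
interface AgentTelemetry {
  isEnabled: true;
  functionId: string;
  metadata: Record<string, string>;
}

function agentTelemetry(agentName: string, threadId: string): AgentTelemetry {
  return {
    isEnabled: true,
    functionId: agentName,
    metadata: {
      "gen_ai.agent.name": agentName, // groups traces by workflow
      "lemma.thread_id": threadId, // ties the call to its conversation thread
    },
  };
}
```

A call site would then pass experimental_telemetry: agentTelemetry("support-agent", threadId), which keeps the agent name consistent between functionId and gen_ai.agent.name.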

By semantic convention, use snake_case, CamelCase, or kebab-case for gen_ai.agent.name (for example support_agent, SupportAgent, or support-agent).
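
To catch a stray space or mixed separators before they fragment trace grouping, the naming convention can be checked in code. This is a sketch only: the regular expressions are one assumed reading of the three styles, and `agentNameStyle` is a hypothetical helper.

```typescript
// Assumed patterns for the three styles named above; adjust if your rules differ.
const AGENT_NAME_STYLES: ReadonlyArray<[string, RegExp]> = [
  ["snake_case", /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/],
  ["CamelCase", /^[A-Z][A-Za-z0-9]*$/],
  ["kebab-case", /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/],
];

// Returns the first matching style name, or null when the name fits none of them.
function agentNameStyle(agentName: string): string | null {
  for (const [style, pattern] of AGENT_NAME_STYLES) {
    if (pattern.test(agentName)) return style;
  }
  return null;
}
```

A name such as "support agent" or "Support_Agent" returns null under these patterns, which is a reasonable point to throw or log during development.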

Wrap the whole run in one root span

A multi-step agent makes several model and tool calls. Wrap the entire run in a single root span so all of them nest under one trace instead of arriving as separate traces.

import { propagateAttributes, startActiveObservation } from "@langfuse/tracing";
import { generateText, stepCountIs } from "ai";

export async function runSupportAgent(userMessage: string, threadId: string) {
  return await startActiveObservation("support-agent", async (root) => {
    root.update({ input: userMessage });

    return await propagateAttributes(
      {
        traceName: "support-agent",
        sessionId: threadId,
        metadata: { "gen_ai.agent.name": "support-agent" },
      },
      async () => {
        const result = await generateText({
          model: "openai/gpt-4o",
          messages: [{ role: "user", content: userMessage }],
          tools,
          stopWhen: stepCountIs(8),
          experimental_telemetry: {
            isEnabled: true,
            functionId: "support-agent",
            metadata: {
              "gen_ai.agent.name": "support-agent",
              "lemma.thread_id": threadId,
            },
          },
        });

        root.update({ output: result.text });
        return result.text;
      },
    );
  });
}

The AI SDK spans created inside the callback automatically become children of the root, producing one nested trace:

support-agent              ← trace root (input, output)
├─ generateText            ← generation (model, tokens)
├─ search_docs             ← tool call (args, result)
└─ generateText            ← generation (final answer)

Flush before the process exits

In serverless or other short-lived runtimes, flush before the handler returns so buffered spans are exported and the whole trace ships in one batch.

import { lemmaProcessor } from "./instrumentation";

// at the end of a request / serverless handler
await lemmaProcessor.forceFlush();
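
If a handler throws before reaching the flush, the spans for the failed run can be lost. The following sketch is one way to guard against that, assuming a wrapper around the handler. `withFlush` and `Flushable` are hypothetical names; the only method relied on is forceFlush(), which lemmaProcessor exposes above.

```typescript
// Structural type: anything with forceFlush(), such as lemmaProcessor.
interface Flushable {
  forceFlush(): Promise<void>;
}

// Hypothetical wrapper: runs the handler, then flushes whether it succeeded or threw.
function withFlush<Args extends unknown[], R>(
  processor: Flushable,
  handler: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    try {
      return await handler(...args);
    } finally {
      // Runs on success and on error, so failed runs still ship their spans.
      await processor.forceFlush();
    }
  };
}
```

For example, const handler = withFlush(lemmaProcessor, runSupportAgent) would flush after every run. The trade-off is added latency on each request, since the response waits for the export to finish.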

If model or tool calls show up as their own separate traces, the work ran outside the root’s active context — usually a lost async context across a queue, worker, or stream. Keep the generateText call inside the startActiveObservation callback. See Troubleshooting.

Verify in Lemma

Open the Lemma dashboard → Traces and confirm:

One trace per run — a single multi-step execution is one trace, not one per model call.
Root has input and output — the root span shows the user message and the final response.
Generations are nested — each model call appears as a child generation with model and token usage.
Tools are nested — each tool invocation appears as a child tool span with arguments and result.

Next steps

Trace contract

The exact shape Lemma reads.

Setup

Wire the Langfuse → Lemma exporter.

Threads and sessions

Group multi-turn conversations with a thread id.

Good vs bad traces

What issue detection looks for, per shape.
