Lemma is an opinionated sink, not a generic OpenTelemetry backend. It does not just store whatever spans you send — it reads a specific trace shape to power input/output display, model and token stats, tool visibility, threads, and automated issue detection. Spans that do not match this contract still arrive, but render as broken or empty and are skipped by issue detection. This page is the canonical contract; every other page builds on it. The rule that everything else follows:
One agent execution = one trace. The trace has a single root span. LLM calls, tool calls, retrieval, and app logic are child spans of that root, not separate traces.

The product contract

Think in four nouns. This is the vocabulary used across the docs and the Lemma dashboard.
| Concept | What it is | Lemma primitive |
| --- | --- | --- |
| Trace | One end-to-end agent execution, from user input to final response | Root span |
| Span | A unit of work inside the trace (retrieval, ranking, app logic) | Child span |
| Generation | A single LLM call (prompt, completion, model, tokens) | Child span, typed as a generation |
| Tool call | A single tool invocation (name, arguments, result) | Child span, typed as a tool |
A useful trace has:
  • A root span with the user input and the final output (or error).
  • A stable agent name so traces are groupable by workflow.
  • Generation spans carrying model and token usage.
  • Tool spans carrying arguments and results.
  • A thread id when the execution is part of a multi-turn conversation.

How you satisfy it with Langfuse

Lemma standardizes on Langfuse as the instrumentation library. You write normal Langfuse code — Lemma reads the result. You never touch raw OpenTelemetry attributes.
```typescript
import { propagateAttributes, startActiveObservation } from "@langfuse/tracing";

// userMessage, threadId, query, callModel, and searchDocs are assumed to
// come from your application; they are not part of Langfuse or Lemma.

// root span: one agent execution
await startActiveObservation("support-agent", async (root) => {
  root.update({ input: userMessage });

  await propagateAttributes(
    {
      traceName: "support-agent",
      sessionId: threadId,                       // groups multi-turn conversations
      metadata: { "gen_ai.agent.name": "support-agent" },
    },
    async () => {
      // generation: a single LLM call
      const reply = await startActiveObservation(
        "draft-reply",
        async (gen) => {
          const r = await callModel(userMessage);
          gen.update({
            input: r.messages,
            output: r.text,
            model: "gpt-4o",
            usageDetails: { input: r.usage.inputTokens, output: r.usage.outputTokens },
          });
          return r;
        },
        { asType: "generation" },
      );

      // tool: a single tool invocation
      const docs = await startActiveObservation(
        "search_docs",
        async (tool) => {
          const result = await searchDocs(query);
          tool.update({ input: { query }, output: result });
          return result;
        },
        { asType: "tool" },
      );

      // final output goes on the root span, not on a child
      root.update({ output: reply.text });
    },
  );
});
```
Each Langfuse field maps to a part of the contract:
| Contract field | Langfuse field | Set on |
| --- | --- | --- |
| Trace input | .update({ input }) | Root span |
| Trace output | .update({ output }) | Root span |
| Agent name | traceName + metadata["gen_ai.agent.name"] | propagateAttributes |
| Thread id | sessionId (and/or metadata["lemma.thread_id"]) | propagateAttributes |
| User id | userId | propagateAttributes |
| LLM model | .update({ model }) | Generation span |
| Token usage | .update({ usageDetails }) | Generation span |
| Prompt / completion | .update({ input, output }) | Generation span |
| Tool name | observation name | Tool span |
| Tool args / result | .update({ input, output }) | Tool span |
| Error | .update({ level: "ERROR", statusMessage }) | Any span |
You do not set OpenTelemetry attribute keys by hand. Use the Langfuse fields above; Lemma reads the exported observation. See Setup to wire the exporter, then Traces for the full pattern.
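The Error row above is the one field the main example does not exercise. The following is a minimal sketch of a wrapper that marks a span as errored when the wrapped work throws. `SpanLike` and `withErrorRecording` are hypothetical names introduced for illustration; `SpanLike` models only the `update` call shown in the table so the sketch runs without the SDK.

```typescript
// Minimal structural type for the part of an observation used here.
// Assumption: only `update` with the fields below is modeled.
interface SpanLike {
  update(attrs: { level?: "ERROR"; statusMessage?: string; output?: unknown }): void;
}

// Hypothetical helper: run `work`; if it throws, record the failure on the
// span and rethrow so the caller still sees the original error.
async function withErrorRecording<T>(
  span: SpanLike,
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    span.update({ level: "ERROR", statusMessage: message });
    throw err;
  }
}
```

Inside a `startActiveObservation` callback you would pass the observation you receive (for example `root` or `tool`) as `span`, assuming its `update` method accepts these fields as the table describes.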

Required vs optional

| Field | Required? | Without it |
| --- | --- | --- |
| Single root span per execution | Required | Each call becomes its own trace; no agent view |
| Root input | Required | Traces show timing only |
| Root output or error | Required | You cannot tell success from failure |
| Agent name | Recommended | Traces are hard to group and filter |
| Generation model + usage | Recommended | No cost, token, or model analysis |
| Tool name + args + result | Recommended | Tool calls are invisible or opaque |
| Thread id | Optional | Multi-turn conversations are not grouped |
| User / session / environment | Optional | No per-user or per-environment slicing |
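The table above can be turned into a local check that runs in tests before traces ever reach Lemma. This is a rough sketch, not Lemma's validation logic: `TraceSummary`, `Finding`, and `lintTrace` are hypothetical names, and the summary shape is an illustrative assumption rather than any wire format. The User / session / environment row is left out for brevity.

```typescript
// Hypothetical, simplified view of one trace (illustrative field names).
interface TraceSummary {
  rootSpanCount: number;
  rootInput?: unknown;
  rootOutput?: unknown;
  rootError?: string;
  agentName?: string;
  generations: Array<{ model?: string; inputTokens?: number; outputTokens?: number }>;
  tools: Array<{ name?: string; args?: unknown; result?: unknown }>;
  threadId?: string;
}

type Severity = "required" | "recommended" | "optional";
interface Finding { severity: Severity; field: string; consequence: string; }

// Report each missing field with the consequence listed in the table.
function lintTrace(t: TraceSummary): Finding[] {
  const findings: Finding[] = [];
  const add = (severity: Severity, field: string, consequence: string): void => {
    findings.push({ severity, field, consequence });
  };

  if (t.rootSpanCount !== 1) {
    add("required", "Single root span per execution", "Each call becomes its own trace; no agent view");
  }
  if (t.rootInput === undefined) {
    add("required", "Root input", "Traces show timing only");
  }
  if (t.rootOutput === undefined && t.rootError === undefined) {
    add("required", "Root output or error", "You cannot tell success from failure");
  }
  if (!t.agentName) {
    add("recommended", "Agent name", "Traces are hard to group and filter");
  }
  if (t.generations.some((g) => !g.model || g.inputTokens === undefined || g.outputTokens === undefined)) {
    add("recommended", "Generation model + usage", "No cost, token, or model analysis");
  }
  if (t.tools.some((c) => !c.name || c.args === undefined || c.result === undefined)) {
    add("recommended", "Tool name + args + result", "Tool calls are invisible or opaque");
  }
  if (!t.threadId) {
    add("optional", "Thread id", "Multi-turn conversations are not grouped");
  }
  return findings;
}
```

A reasonable policy is to fail a test run on any `required` finding and only log the `recommended` and `optional` ones.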

Issue detection eligibility

Beyond rendering a trace, Lemma runs automated issue detection (silent failures, bad tool calls, loops). Today this runs for traces that arrive in a recognized shape:
  • Vercel AI SDK traces (the AI SDK’s experimental_telemetry output).
  • OpenInference / LangGraph traces.
If you instrument with a supported framework, issue detection works automatically. Pure manual Langfuse traces render today and are being brought to full issue-detection parity. For the current status of each shape, see Good trace vs bad trace.

Appendix: underlying OpenTelemetry keys

You do not need this section for Langfuse instrumentation. It documents the literal attribute keys Lemma reads, for teams exporting from an existing OpenTelemetry / OpenInference / Vercel AI SDK pipeline.

Trace root: the earliest span in the trace with no parent. All other spans must be its descendants.
| Field | Attribute keys Lemma reads (priority order) |
| --- | --- |
| Agent name | gen_ai.agent.name, then ai.agent.name (on an agent-run root) |
| Thread id | lemma.thread_id |
| User id | user.id, then enduser.id |
| Input | ai.agent.input → ai.prompt → ai.prompt.messages → gen_ai.prompt → OpenInference llm.input_messages.* → root input.value |
| Output | ai.response.text → ai.response.object → gen_ai.completion → OpenInference llm.output_messages.* → root output.value |
| Model | ai.model.id, gen_ai.request.model, gen_ai.response.model, llm.model_name |
| Tokens (in/out) | ai.usage.inputTokens / gen_ai.usage.input_tokens / gen_ai.usage.prompt_tokens / llm.token_count.prompt (and output equivalents) |
| Generation span | openinference.span.kind="llm", a Vercel generation ai.operationId, or span name response |
| Tool span | ai.toolCall.* (Vercel), or openinference.span.kind="tool" + tool.name |
| Tool args / result | ai.toolCall.args/ai.toolCall.input and ai.toolCall.result/ai.toolCall.output, else input.value/output.value |
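"Priority order" in the table above means the first key that is present wins. A minimal sketch of that lookup, useful for predicting what Lemma will display for a given span, might look like the following. `firstString` and the constant names are hypothetical; the key lists are copied from the table, and treating the Model row as ordered left to right is an assumption. The "on an agent-run root" condition for `ai.agent.name` is not modeled.

```typescript
type Attrs = Record<string, unknown>;

// Return the first non-empty string value among `keys`, checked in order.
function firstString(attrs: Attrs, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = attrs[key];
    if (typeof value === "string" && value.length > 0) {
      return value;
    }
  }
  return undefined;
}

// Key lists taken from the table; earlier entries take precedence.
const AGENT_NAME_KEYS = ["gen_ai.agent.name", "ai.agent.name"] as const;
const USER_ID_KEYS = ["user.id", "enduser.id"] as const;
const MODEL_KEYS = [
  "ai.model.id",
  "gen_ai.request.model",
  "gen_ai.response.model",
  "llm.model_name",
] as const;
```

For example, a span carrying both `gen_ai.agent.name` and `ai.agent.name` resolves to the `gen_ai.agent.name` value under this sketch.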
If any child references a parent span that is not in the same export batch, the trace can be dropped. In short-lived or serverless runtimes, flush before the process exits so the whole trace ships together. See Setup.
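One way to catch this before export is to check that a batch contains exactly one root and no span whose parent is missing. The sketch below is a local sanity check under assumed names (`SpanRef`, `checkBatch`, `isCompleteTrace` are hypothetical), not a description of Lemma's ingestion logic. It does not detect parent cycles.

```typescript
// Hypothetical minimal span reference: only the ids needed for the check.
interface SpanRef {
  spanId: string;
  parentSpanId?: string;
}

interface BatchCheck {
  rootIds: string[];   // spans with no parent
  orphanIds: string[]; // spans whose parent is not in this batch
}

function checkBatch(spans: readonly SpanRef[]): BatchCheck {
  const ids = new Set(spans.map((s) => s.spanId));
  const rootIds: string[] = [];
  const orphanIds: string[] = [];
  for (const s of spans) {
    if (s.parentSpanId === undefined) {
      rootIds.push(s.spanId);
    } else if (!ids.has(s.parentSpanId)) {
      orphanIds.push(s.spanId);
    }
  }
  return { rootIds, orphanIds };
}

// A batch looks complete when it has exactly one root and no orphans.
function isCompleteTrace(spans: readonly SpanRef[]): boolean {
  const { rootIds, orphanIds } = checkBatch(spans);
  return rootIds.length === 1 && orphanIds.length === 0;
}
```

An orphan in the result usually means a parent span had not ended, or had not been flushed, when the batch was exported.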