Trace contract

Lemma is an opinionated sink, not a generic OpenTelemetry backend. It does not simply store whatever spans you send; it reads a specific trace shape to power input/output display, model and token stats, tool visibility, threads, and automated issue detection. Spans that do not match this contract still arrive, but they render as broken or empty and are skipped by issue detection. This page is the canonical contract, and every other page builds on it. The rule that everything else follows:

One agent execution = one trace. The trace has a single root span. LLM calls, tool calls, retrieval, and app logic are child spans of that root, not separate traces.
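The rule can be checked mechanically in tests. The sketch below is a hypothetical helper, not a Lemma API. It assumes you can collect the spans that one agent run emitted (for example from an in-memory exporter) as plain records with `traceId`, `spanId`, and an optional `parentSpanId`, and it reports when the run was split across several traces or has more than one root.

```typescript
// Hypothetical minimal span record; field names are illustrative, not Lemma's schema.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined only for a root span
  name: string;
}

// Checks "one execution = one trace" for the spans emitted by ONE agent run.
// Returns a list of human-readable problems; an empty list means the rule holds.
function checkSingleTrace(spans: SpanRecord[]): string[] {
  const problems: string[] = [];

  // Every LLM call, tool call, and app step must share the root's trace id.
  const traceIds = new Set(spans.map((s) => s.traceId));
  if (traceIds.size !== 1) {
    problems.push(`expected 1 trace id, found ${traceIds.size}`);
  }

  // Exactly one span may lack a parent: the root.
  const roots = spans.filter((s) => s.parentSpanId === undefined);
  if (roots.length !== 1) {
    problems.push(`expected 1 root span, found ${roots.length}`);
  }
  return problems;
}
```

A run where the LLM call was started outside the root span typically shows up as two trace ids and two roots, which is exactly the "separate traces" shape the rule forbids.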

The product contract

Think in four nouns. This is the vocabulary used across the docs and the Lemma dashboard.

Concept	What it is	Lemma primitive
Trace	One end-to-end agent execution, from user input to final response	Root span
Span	A unit of work inside the trace (retrieval, ranking, app logic)	Child span
Generation	A single LLM call (prompt, completion, model, tokens)	Child span, typed as a generation
Tool call	A single tool invocation (name, arguments, result)	Child span, typed as a tool
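One way to internalize the four nouns is as a discriminated union. The types below are a hypothetical in-memory model for illustration only; the field names are assumptions and are not Lemma's storage schema or a Langfuse type. The `summarize` helper shows why the typing matters: only generations contribute tokens, and only tool spans count as tool calls.

```typescript
// Hypothetical model of the four nouns; not a Lemma or Langfuse type.
interface BaseSpan {
  id: string;
  parentId: string | null; // null only for the trace root
  name: string;
}

// Span: a unit of work (retrieval, ranking, app logic).
interface GenericSpan extends BaseSpan {
  kind: "span";
}

// Generation: a single LLM call with model and token usage.
interface Generation extends BaseSpan {
  kind: "generation";
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Tool call: a single tool invocation with arguments and result.
interface ToolCall extends BaseSpan {
  kind: "tool";
  args: unknown;
  result: unknown;
}

type Observation = GenericSpan | Generation | ToolCall;

// Trace: one end-to-end execution, represented by its root span plus children.
interface Trace {
  root: GenericSpan & { input: unknown; output?: unknown; error?: string };
  children: Observation[];
}

// Counts generations and tool calls and totals token usage across the trace.
function summarize(trace: Trace): { generations: number; toolCalls: number; totalTokens: number } {
  let generations = 0;
  let toolCalls = 0;
  let totalTokens = 0;
  for (const obs of trace.children) {
    switch (obs.kind) {
      case "generation":
        generations++;
        totalTokens += obs.inputTokens + obs.outputTokens;
        break;
      case "tool":
        toolCalls++;
        break;
      case "span":
        break; // plain spans carry neither tokens nor tool data
    }
  }
  return { generations, toolCalls, totalTokens };
}
```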

A useful trace has:

A root span with the user input and the final output (or error).
A stable agent name so traces are groupable by workflow.
Generation spans carrying model and token usage.
Tool spans carrying arguments and results.
A thread id when the execution is part of a multi-turn conversation.

How you satisfy it with Langfuse

Lemma standardizes on Langfuse as the instrumentation library. You write normal Langfuse code, and Lemma reads the result. You never touch raw OpenTelemetry attributes.

```typescript
import { propagateAttributes, startActiveObservation } from "@langfuse/tracing";

// userMessage, threadId, searchDocs, and callModel stand in for your own app code.
await startActiveObservation("support-agent", async (root) => {
  root.update({ input: userMessage });          // trace input on the root span

  await propagateAttributes(
    {
      traceName: "support-agent",
      sessionId: threadId,                       // groups multi-turn conversations
      metadata: { "gen_ai.agent.name": "support-agent" },
    },
    async () => {
      // tool: a single tool invocation
      const query = userMessage;
      const docs = await startActiveObservation(
        "search_docs",
        async (tool) => {
          const result = await searchDocs(query);
          tool.update({ input: { query }, output: result });
          return result;
        },
        { asType: "tool" },
      );

      // generation: a single LLM call that uses the retrieved docs
      const reply = await startActiveObservation(
        "draft-reply",
        async (gen) => {
          const r = await callModel(userMessage, docs);
          gen.update({
            input: r.messages,
            output: r.text,
            model: "gpt-4o",
            usageDetails: { input: r.usage.inputTokens, output: r.usage.outputTokens },
          });
          return r;
        },
        { asType: "generation" },
      );

      root.update({ output: reply.text });      // trace output on the root span
    },
  );
});
```

Each Langfuse field maps to a part of the contract:

Contract field	Langfuse field	Set on
Trace input	`.update({ input })`	Root span
Trace output	`.update({ output })`	Root span
Agent name	`traceName` + `metadata["gen_ai.agent.name"]`	`propagateAttributes`
Thread id	`sessionId` (and/or `metadata["lemma.thread_id"]`)	`propagateAttributes`
User id	`userId`	`propagateAttributes`
LLM model	`.update({ model })`	Generation span
Token usage	`.update({ usageDetails })`	Generation span
Prompt / completion	`.update({ input, output })`	Generation span
Tool name	observation `name`	Tool span
Tool args / result	`.update({ input, output })`	Tool span
Error	`.update({ level: "ERROR", statusMessage })`	Any span
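The error row matters because a root with neither output nor error cannot be told apart from a success. A minimal sketch of one way to guarantee that, assuming the observation handles (`root`, `gen`, `tool` in the Langfuse example) accept `level` and `statusMessage` in `update` as listed above. `runRecordingErrors` and `UpdatableObservation` are hypothetical names, not Langfuse exports.

```typescript
// Structural type for any observation handle with an update() method.
// Assumption: update accepts `level` and `statusMessage`, per the mapping table.
interface UpdatableObservation {
  update(fields: { output?: unknown; level?: "ERROR"; statusMessage?: string }): unknown;
}

// Runs `work`; if it throws, marks the span as failed, then rethrows.
async function runRecordingErrors<T>(obs: UpdatableObservation, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } catch (err) {
    // Record the failure on the span so the trace shows an error, not a missing output.
    const message = err instanceof Error ? err.message : String(err);
    obs.update({ level: "ERROR", statusMessage: message });
    throw err; // rethrow so the caller's own error handling still runs
  }
}
```

Rethrowing keeps tracing a side effect: the agent's control flow is unchanged whether or not the span was marked.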

You do not set OpenTelemetry attribute keys by hand. Use the Langfuse fields above; Lemma reads the exported observation. See Setup to wire the exporter, then Traces for the full pattern.

Required vs optional

Field	Required?	Without it
Single root span per execution	Required	Each call becomes its own trace; no agent view
Root input	Required	Traces show timing only
Root output or error	Required	You cannot tell success from failure
Agent name	Recommended	Traces are hard to group and filter
Generation model + usage	Recommended	No cost, token, or model analysis
Tool name + args + result	Recommended	Tool calls are invisible or opaque
Thread id	Optional	Multi-turn conversations are not grouped
User / session / environment	Optional	No per-user or per-environment slicing
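The table can be turned into a pre-flight check for your own tests. This is a hypothetical sketch: `TraceSummary` is an assumed, simplified view of what your instrumentation produced (not something Lemma returns), and the messages reuse the field names from the table. Thread id and user/session/environment are optional, so they are deliberately not checked.

```typescript
// Hypothetical summary of an exported trace; not a Lemma API.
interface TraceSummary {
  rootCount: number;
  rootInput?: unknown;
  rootOutput?: unknown;
  rootError?: string;
  agentName?: string;
  generations: { model?: string; inputTokens?: number; outputTokens?: number }[];
  tools: { name?: string; args?: unknown; result?: unknown }[];
  threadId?: string; // optional: not audited
}

type Severity = "required" | "recommended";

interface Finding {
  severity: Severity;
  message: string;
}

// Returns one finding per required or recommended field that is missing.
function auditTrace(t: TraceSummary): Finding[] {
  const findings: Finding[] = [];
  const need = (ok: boolean, severity: Severity, message: string) => {
    if (!ok) findings.push({ severity, message });
  };

  need(t.rootCount === 1, "required", "single root span per execution");
  need(t.rootInput !== undefined, "required", "root input");
  need(t.rootOutput !== undefined || t.rootError !== undefined, "required", "root output or error");
  need(Boolean(t.agentName), "recommended", "agent name");
  need(
    t.generations.every((g) => g.model !== undefined && g.inputTokens !== undefined && g.outputTokens !== undefined),
    "recommended",
    "generation model + usage",
  );
  need(
    t.tools.every((x) => x.name !== undefined && x.args !== undefined && x.result !== undefined),
    "recommended",
    "tool name + args + result",
  );
  return findings;
}
```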

Issue detection eligibility

Beyond rendering a trace, Lemma runs automated issue detection (silent failures, bad tool calls, loops). Today this runs for traces that arrive in a recognized shape:

Vercel AI SDK traces (the AI SDK’s `experimental_telemetry` output).
OpenInference / LangGraph traces.

If you instrument with a supported framework, issue detection works automatically. Manually instrumented Langfuse traces render today but do not yet have full issue-detection parity; that support is in progress. For the current status of each shape, see Good trace vs bad trace.
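To get a feel for what "recognized shape" means, the rough heuristic below looks for the attribute families listed in the appendix of this page: Vercel AI SDK keys under `ai.` and the OpenInference `openinference.span.kind` key. It is a hypothetical illustration for checking your own exports, not Lemma's actual eligibility logic, which this page does not specify.

```typescript
type TraceShape = "vercel-ai-sdk" | "openinference" | "unrecognized";

// Heuristic only: inspects attribute keys across all spans of one trace.
// Key families come from the appendix table; Lemma's real check may differ.
function detectShape(spanAttributes: Record<string, unknown>[]): TraceShape {
  const keys = spanAttributes.flatMap((attrs) => Object.keys(attrs));
  if (keys.some((k) => k.startsWith("ai."))) return "vercel-ai-sdk";
  if (keys.includes("openinference.span.kind")) return "openinference";
  return "unrecognized";
}
```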

Appendix: underlying OpenTelemetry keys

You do not need this section for Langfuse instrumentation. It is here for teams exporting from an existing OpenTelemetry / OpenInference / Vercel AI SDK pipeline, and it documents the literal attribute keys Lemma reads.

Trace root: the earliest span in the trace with no parent. All other spans must be its descendants.
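The root rule above can be expressed as a small function, useful for checking an export from your own pipeline before it reaches Lemma. A sketch under stated assumptions: `TimedSpan` and `selectRoot` are hypothetical names, start times are in milliseconds, and ties on start time keep the first span in input order.

```typescript
// Hypothetical span view with just the fields the root rule needs.
interface TimedSpan {
  spanId: string;
  parentSpanId?: string; // undefined when the span has no parent
  startTimeMs: number;
}

// Picks the root (earliest parentless span) and lists spans that do not descend from it.
function selectRoot(spans: TimedSpan[]): { root?: TimedSpan; nonDescendants: string[] } {
  const parentless = spans.filter((s) => s.parentSpanId === undefined);
  if (parentless.length === 0) {
    return { root: undefined, nonDescendants: spans.map((s) => s.spanId) };
  }
  const root = parentless.reduce((a, b) => (b.startTimeMs < a.startTimeMs ? b : a));

  const byId = new Map(spans.map((s) => [s.spanId, s] as const));
  const nonDescendants: string[] = [];
  for (const span of spans) {
    if (span === root) continue;
    // Walk up the parent chain until reaching a parentless span, a missing parent, or a cycle.
    let cur: TimedSpan | undefined = span;
    const seen = new Set<string>();
    while (cur !== undefined && cur.parentSpanId !== undefined && !seen.has(cur.spanId)) {
      seen.add(cur.spanId);
      const parentId: string = cur.parentSpanId;
      cur = byId.get(parentId);
    }
    // Only a chain that ends exactly at the chosen root counts as a descendant.
    if (cur !== root) nonDescendants.push(span.spanId);
  }
  return { root, nonDescendants };
}
```

A second parentless span (a separate trace that slipped in) and a span whose parent is missing both show up in `nonDescendants`.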

Field	Attribute keys Lemma reads (priority order)
Agent name	`gen_ai.agent.name`, then `ai.agent.name` (on an agent-run root)
Thread id	`lemma.thread_id`
User id	`user.id`, then `enduser.id`
Input	`ai.agent.input` → `ai.prompt` → `ai.prompt.messages` → `gen_ai.prompt` → OpenInference `llm.input_messages.*` → root `input.value`
Output	`ai.response.text` → `ai.response.object` → `gen_ai.completion` → OpenInference `llm.output_messages.*` → root `output.value`
Model	`ai.model.id`, `gen_ai.request.model`, `gen_ai.response.model`, `llm.model_name`
Tokens (in/out)	`ai.usage.inputTokens` / `gen_ai.usage.input_tokens` / `gen_ai.usage.prompt_tokens` / `llm.token_count.prompt` (and output equivalents)
Generation span	`openinference.span.kind="llm"`, a Vercel generation `ai.operationId`, or span name `response`
Tool span	`ai.toolCall.*` (Vercel), or `openinference.span.kind="tool"` + `tool.name`
Tool args / result	`ai.toolCall.args`/`ai.toolCall.input` and `ai.toolCall.result`/`ai.toolCall.output`, else `input.value`/`output.value`
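"Priority order" means the first key present wins. The sketch below mirrors a few rows of the table. It is a hypothetical illustration, not Lemma's implementation: it handles exact keys only (the OpenInference wildcard keys such as `llm.input_messages.*` are omitted), only input-token keys are listed, and the Vercel `ai.operationId` generation check is left out because the table does not name which operation ids qualify. OpenInference span kinds are compared case-insensitively as an added assumption.

```typescript
type Attributes = Record<string, unknown>;

// Priority lists copied from the table above (exact keys only).
const AGENT_NAME_KEYS = ["gen_ai.agent.name", "ai.agent.name"];
const USER_ID_KEYS = ["user.id", "enduser.id"];
const MODEL_KEYS = ["ai.model.id", "gen_ai.request.model", "gen_ai.response.model", "llm.model_name"];
const INPUT_TOKEN_KEYS = [
  "ai.usage.inputTokens",
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.prompt_tokens",
  "llm.token_count.prompt",
];

// Returns the value of the first key that is present, or undefined.
function firstPresent(attrs: Attributes, keys: string[]): unknown {
  for (const key of keys) {
    if (attrs[key] !== undefined) return attrs[key];
  }
  return undefined;
}

type SpanKind = "generation" | "tool" | "span";

// Rough classification following the "Generation span" and "Tool span" rows.
function classifySpan(name: string, attrs: Attributes): SpanKind {
  const oiKind = String(attrs["openinference.span.kind"] ?? "").toLowerCase();
  if (Object.keys(attrs).some((k) => k.startsWith("ai.toolCall."))) return "tool"; // Vercel
  if (oiKind === "tool" && attrs["tool.name"] !== undefined) return "tool"; // OpenInference
  if (oiKind === "llm" || name === "response") return "generation";
  return "span";
}
```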

If any child references a parent span that is not in the same export batch, the trace can be dropped. In short-lived or serverless runtimes, flush before the process exits so the whole trace ships together. See Setup.
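You can detect the risky condition before it costs you a trace. This hypothetical helper (not part of any SDK) lists spans in an export batch whose parent is not in the same batch. In a healthy batch for a complete trace it returns an empty list; a non-empty result usually means the process exited or flushed mid-trace. The remedy itself is to flush or shut down your exporter before exit, as described in Setup.

```typescript
// Hypothetical minimal view of a span in one export batch.
interface ExportedSpan {
  spanId: string;
  parentSpanId?: string; // undefined for the root
}

// Returns ids of spans whose parent is missing from this batch.
function danglingParents(batch: ExportedSpan[]): string[] {
  const ids = new Set(batch.map((s) => s.spanId));
  return batch
    .filter((s) => s.parentSpanId !== undefined && !ids.has(s.parentSpanId))
    .map((s) => s.spanId);
}
```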
