Lemma is an opinionated sink, not a generic OpenTelemetry backend. It does not just store whatever spans you send — it reads a specific trace shape to power input/output display, model and token stats, tool visibility, threads, and automated issue detection. Spans that do not match this contract still arrive, but render as broken or empty and are skipped by issue detection. This page is the canonical contract; every other page builds on it.
The rule that everything else follows:
One agent execution = one trace. The trace has a single root span. LLM calls, tool calls, retrieval, and app logic are child spans of that root, not separate traces.
The product contract
Think in four nouns. This is the vocabulary used across the docs and the Lemma dashboard.
| Concept | What it is | Lemma primitive |
|---|---|---|
| Trace | One end-to-end agent execution, from user input to final response | Root span |
| Span | A unit of work inside the trace (retrieval, ranking, app logic) | Child span |
| Generation | A single LLM call (prompt, completion, model, tokens) | Child span, typed as a generation |
| Tool call | A single tool invocation (name, arguments, result) | Child span, typed as a tool |
A useful trace has:
- A root span with the user input and the final output (or error).
- A stable agent name so traces are groupable by workflow.
- Generation spans carrying model and token usage.
- Tool spans carrying arguments and results.
- A thread id when the execution is part of a multi-turn conversation.
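As a rough illustration of how the four nouns fit together, the sketch below models one trace as a flat list of spans and derives the numbers a dashboard would need. The `TraceSpan` type, the `summarizeTrace` helper, and the sample values are hypothetical and exist only for this example; they are not Lemma's or Langfuse's actual types.

```typescript
// Hypothetical in-memory model of the four nouns (illustration only).
type SpanKind = "root" | "span" | "generation" | "tool";

interface TraceSpan {
  id: string;
  parentId: string | null; // null only for the root span
  kind: SpanKind;
  name: string;
  input?: unknown;
  output?: unknown;
  model?: string; // expected on generation spans
  usage?: { input: number; output: number }; // expected on generation spans
}

interface TraceSummary {
  agentName: string;
  generations: number;
  toolCalls: number;
  totalTokens: number;
  hasInputAndOutput: boolean;
}

// Derive the stats that depend on a well-shaped trace.
function summarizeTrace(agentName: string, spans: TraceSpan[]): TraceSummary {
  const root = spans.find((s) => s.kind === "root");
  const generations = spans.filter((s) => s.kind === "generation");
  const tools = spans.filter((s) => s.kind === "tool");
  const totalTokens = generations.reduce(
    (sum, g) => sum + (g.usage ? g.usage.input + g.usage.output : 0),
    0,
  );
  return {
    agentName,
    generations: generations.length,
    toolCalls: tools.length,
    totalTokens,
    hasInputAndOutput:
      root !== undefined && root.input !== undefined && root.output !== undefined,
  };
}

// Made-up sample: one root, one generation, one tool call.
const exampleTrace: TraceSpan[] = [
  { id: "1", parentId: null, kind: "root", name: "support-agent",
    input: "Where is my order?", output: "It ships tomorrow." },
  { id: "2", parentId: "1", kind: "generation", name: "draft-reply",
    model: "gpt-4o", usage: { input: 120, output: 45 } },
  { id: "3", parentId: "1", kind: "tool", name: "search_docs",
    input: { query: "order status" }, output: ["doc-17"] },
];

const exampleSummary = summarizeTrace("support-agent", exampleTrace);
```

With the sample above, `exampleSummary` reports one generation, one tool call, and 165 total tokens. Dropping the `usage` field from the generation span would leave the token total at zero, which is the kind of gap the checklist is meant to prevent.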
How you satisfy it with Langfuse
Lemma standardizes on Langfuse as the instrumentation library. You write normal Langfuse code — Lemma reads the result. You never touch raw OpenTelemetry attributes.
```typescript
import { propagateAttributes, startActiveObservation } from "@langfuse/tracing";

// userMessage, threadId, query, callModel, and searchDocs are supplied by your application.
await startActiveObservation("support-agent", async (root) => {
  // root span: record the user input as soon as it is known
  root.update({ input: userMessage });

  await propagateAttributes(
    {
      traceName: "support-agent",
      sessionId: threadId, // groups multi-turn conversations
      metadata: { "gen_ai.agent.name": "support-agent" },
    },
    async () => {
      // generation: a single LLM call
      const reply = await startActiveObservation(
        "draft-reply",
        async (gen) => {
          const r = await callModel(userMessage);
          gen.update({
            input: r.messages,
            output: r.text,
            model: "gpt-4o",
            usageDetails: { input: r.usage.inputTokens, output: r.usage.outputTokens },
          });
          return r;
        },
        { asType: "generation" },
      );

      // tool: a single tool invocation
      const docs = await startActiveObservation(
        "search_docs",
        async (tool) => {
          const result = await searchDocs(query);
          tool.update({ input: { query }, output: result });
          return result;
        },
        { asType: "tool" },
      );

      // root span: record the final output once the run completes
      root.update({ output: reply.text });
    },
  );
});
```
Each Langfuse field maps to a part of the contract:
| Contract field | Langfuse field | Set on |
|---|---|---|
| Trace input | .update({ input }) | Root span |
| Trace output | .update({ output }) | Root span |
| Agent name | traceName + metadata["gen_ai.agent.name"] | propagateAttributes |
| Thread id | sessionId (and/or metadata["lemma.thread_id"]) | propagateAttributes |
| User id | userId | propagateAttributes |
| LLM model | .update({ model }) | Generation span |
| Token usage | .update({ usageDetails }) | Generation span |
| Prompt / completion | .update({ input, output }) | Generation span |
| Tool name | observation name | Tool span |
| Tool args / result | .update({ input, output }) | Tool span |
| Error | .update({ level: "ERROR", statusMessage }) | Any span |
You do not set OpenTelemetry attribute keys by hand. Use the Langfuse fields above; Lemma reads the exported observation. See Setup to wire the exporter, then Traces for the full pattern.
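The Error row of the mapping table can be applied with a small wrapper. The following is a minimal sketch, not part of either SDK: `UpdatableObservation` is a hypothetical stand-in for the slice of a Langfuse observation used here, and `recordOutcome` and `toErrorFields` are illustrative helper names. It assumes the observation accepts `level` and `statusMessage` through `.update()`, as the table states.

```typescript
// Hypothetical stand-in for the part of an observation this sketch needs.
interface ObservationUpdate {
  level?: "ERROR";
  statusMessage?: string;
  output?: unknown;
}

interface UpdatableObservation {
  update(fields: ObservationUpdate): void;
}

// Convert any thrown value into the error fields from the mapping table.
function toErrorFields(err: unknown): { level: "ERROR"; statusMessage: string } {
  const statusMessage = err instanceof Error ? err.message : String(err);
  return { level: "ERROR", statusMessage };
}

// Run `work`, recording either its output or its error on the observation.
async function recordOutcome<T>(
  obs: UpdatableObservation,
  work: () => Promise<T>,
): Promise<T> {
  try {
    const result = await work();
    obs.update({ output: result });
    return result;
  } catch (err) {
    obs.update(toErrorFields(err));
    throw err; // rethrow so the caller still sees the failure
  }
}
```

Inside a real `startActiveObservation` callback, the observation argument would take the place of `obs`. Rethrowing keeps application error handling unchanged while still leaving the span with an output or an error, which is what the root span needs to distinguish success from failure.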
Required vs optional
| Field | Required? | Without it |
|---|---|---|
| Single root span per execution | Required | Each call becomes its own trace; no agent view |
| Root input | Required | Traces show timing only |
| Root output or error | Required | You cannot tell success from failure |
| Agent name | Recommended | Traces are hard to group and filter |
| Generation model + usage | Recommended | No cost, token, or model analysis |
| Tool name + args + result | Recommended | Tool calls are invisible or opaque |
| Thread id | Optional | Multi-turn conversations are not grouped |
| User / session / environment | Optional | No per-user or per-environment slicing |
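The table above can be read as a set of lint rules. The sketch below encodes each row as a check over a hypothetical `TraceFacts` summary of one trace; the type, the field names, and the `lintTrace` function are assumptions made for illustration and are not a Lemma API. The severities and consequences are copied from the table.

```typescript
type Severity = "required" | "recommended" | "optional";

interface Finding {
  field: string;
  severity: Severity;
  consequence: string;
}

// Hypothetical flattened facts about one exported trace.
interface TraceFacts {
  rootSpanCount: number;
  hasRootInput: boolean;
  hasRootOutputOrError: boolean;
  agentName?: string;
  generationsHaveModelAndUsage: boolean;
  toolsHaveArgsAndResult: boolean;
  threadId?: string;
  hasUserSessionOrEnvironment: boolean;
}

// Return one finding per table row that the trace does not satisfy.
function lintTrace(t: TraceFacts): Finding[] {
  const rules: Array<{ ok: boolean; finding: Finding }> = [
    { ok: t.rootSpanCount === 1,
      finding: { field: "Single root span per execution", severity: "required",
                 consequence: "Each call becomes its own trace; no agent view" } },
    { ok: t.hasRootInput,
      finding: { field: "Root input", severity: "required",
                 consequence: "Traces show timing only" } },
    { ok: t.hasRootOutputOrError,
      finding: { field: "Root output or error", severity: "required",
                 consequence: "You cannot tell success from failure" } },
    { ok: Boolean(t.agentName),
      finding: { field: "Agent name", severity: "recommended",
                 consequence: "Traces are hard to group and filter" } },
    { ok: t.generationsHaveModelAndUsage,
      finding: { field: "Generation model + usage", severity: "recommended",
                 consequence: "No cost, token, or model analysis" } },
    { ok: t.toolsHaveArgsAndResult,
      finding: { field: "Tool name + args + result", severity: "recommended",
                 consequence: "Tool calls are invisible or opaque" } },
    { ok: Boolean(t.threadId),
      finding: { field: "Thread id", severity: "optional",
                 consequence: "Multi-turn conversations are not grouped" } },
    { ok: t.hasUserSessionOrEnvironment,
      finding: { field: "User / session / environment", severity: "optional",
                 consequence: "No per-user or per-environment slicing" } },
  ];
  return rules.filter((r) => !r.ok).map((r) => r.finding);
}
```

A check like this could run in a test suite against captured spans, failing the build on `required` findings and only warning on the rest.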
Issue detection eligibility
Beyond rendering a trace, Lemma runs automated issue detection (silent failures, bad tool calls, loops). Today this runs for traces that arrive in a recognized shape:
- Vercel AI SDK traces (the AI SDK’s experimental_telemetry output).
- OpenInference / LangGraph traces.
If you instrument with a supported framework, issue detection works automatically. Pure manual Langfuse traces render today and are being brought to full issue-detection parity. For the current status of each shape, see Good trace vs bad trace.
Appendix: underlying OpenTelemetry keys
You do not need this section for Langfuse instrumentation. It is here for teams exporting from an existing OpenTelemetry / OpenInference / Vercel AI SDK pipeline, documenting the literal attribute keys Lemma reads.
Trace root — the earliest span in the trace with no parent. All other spans must be its descendants.
| Field | Attribute keys Lemma reads (priority order) |
|---|---|
| Agent name | gen_ai.agent.name, then ai.agent.name (on an agent-run root) |
| Thread id | lemma.thread_id |
| User id | user.id, then enduser.id |
| Input | ai.agent.input → ai.prompt → ai.prompt.messages → gen_ai.prompt → OpenInference llm.input_messages.* → root input.value |
| Output | ai.response.text → ai.response.object → gen_ai.completion → OpenInference llm.output_messages.* → root output.value |
| Model | ai.model.id, gen_ai.request.model, gen_ai.response.model, llm.model_name |
| Tokens (in/out) | ai.usage.inputTokens / gen_ai.usage.input_tokens / gen_ai.usage.prompt_tokens / llm.token_count.prompt (and output equivalents) |
| Generation span | openinference.span.kind="llm", a Vercel generation ai.operationId, or span name response |
| Tool span | ai.toolCall.* (Vercel), or openinference.span.kind="tool" + tool.name |
| Tool args / result | ai.toolCall.args/ai.toolCall.input and ai.toolCall.result/ai.toolCall.output, else input.value/output.value |
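Priority order means the first key in a list that carries a value wins. The sketch below shows one way to mirror that lookup when checking your own exported attributes. The `firstPresent` helper and the constant names are hypothetical, the key lists are copied from the table, and the treatment of empty strings as missing is an assumption rather than documented Lemma behavior.

```typescript
type Attributes = Record<string, unknown>;

// Return the value of the first key that is present and non-empty.
// Assumption: null, undefined, and "" all count as "not set".
function firstPresent(attrs: Attributes, keys: readonly string[]): unknown {
  for (const key of keys) {
    const value = attrs[key];
    if (value !== undefined && value !== null && value !== "") {
      return value;
    }
  }
  return undefined;
}

// Key lists in the order given by the table above.
const AGENT_NAME_KEYS = ["gen_ai.agent.name", "ai.agent.name"] as const;
const USER_ID_KEYS = ["user.id", "enduser.id"] as const;
const MODEL_KEYS = [
  "ai.model.id",
  "gen_ai.request.model",
  "gen_ai.response.model",
  "llm.model_name",
] as const;
```

For example, a span carrying both `gen_ai.agent.name` and `ai.agent.name` resolves to the `gen_ai.agent.name` value, while a span carrying only `enduser.id` still yields a user id.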
If any child references a parent span that is not in the same export batch, the trace can be dropped. In short-lived or serverless runtimes, flush before the process exits so the whole trace ships together. See Setup.
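A quick pre-export check can catch this case. The sketch below is a minimal example under assumed names: `ExportedSpan` is a hypothetical simplification of an exported span (only the ids), and `findOrphans` and `countRoots` are illustrative helpers, not part of any SDK.

```typescript
// Hypothetical minimal view of a span in one export batch.
interface ExportedSpan {
  spanId: string;
  parentSpanId?: string; // absent on the trace root
}

// Spans whose parent is not present in the same batch.
function findOrphans(batch: ExportedSpan[]): ExportedSpan[] {
  const ids = new Set(batch.map((s) => s.spanId));
  return batch.filter(
    (s) => s.parentSpanId !== undefined && !ids.has(s.parentSpanId),
  );
}

// Number of spans with no parent; a well-formed trace has exactly one.
function countRoots(batch: ExportedSpan[]): number {
  return batch.filter((s) => s.parentSpanId === undefined).length;
}
```

A batch with one root and no orphans is safe to ship together. Any orphan suggests that part of the trace was exported separately, for instance because the process exited before a flush completed.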