A generation is a single LLM call inside a trace. Typing a span as a generation tells Lemma to read its model, token usage, prompt, and completion — powering cost, latency, and model analysis. Create generations inside the trace root callback so they nest under the trace. See Traces for the root.

Record a generation

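import { startActiveObservation } from "@langfuse/tracing";

// callModel and messages stand in for your own model client and prompt.
const completion = await startActiveObservation(
  "draft-reply",
  async (gen) => {
    const response = await callModel(messages);

    // Attach prompt, completion, model, and token usage to the generation.
    gen.update({
      input: messages,
      output: response.text,
      model: "gpt-4o",
      usageDetails: {
        input: response.usage.inputTokens,
        output: response.usage.outputTokens,
      },
    });

    // The callback's return value becomes the value of `completion`.
    return response;
  },
  // Typing the observation as a generation lets Lemma read the fields above.
  { asType: "generation" },
);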
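
If several call sites record generations, building the update payload in one place keeps the fields consistent. The following is a minimal sketch, not part of the SDK: `ModelResponse`, `GenerationUpdate`, and `buildGenerationUpdate` are hypothetical names, and the response shape is assumed to match the `response.text` and `response.usage` fields used in the example above.

```typescript
// Hypothetical shape of the value returned by callModel in the example above.
interface ModelResponse {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

// The fields passed to gen.update(...) in the example above.
interface GenerationUpdate {
  input: unknown;
  output: string;
  model: string;
  usageDetails: { input: number; output: number };
}

// Maps a model response onto the generation fields (hypothetical helper).
function buildGenerationUpdate(
  model: string,
  input: unknown,
  response: ModelResponse,
): GenerationUpdate {
  return {
    input,
    output: response.text,
    model,
    usageDetails: {
      input: response.usage.inputTokens,
      output: response.usage.outputTokens,
    },
  };
}
```

Inside the callback this would be used as `gen.update(buildGenerationUpdate("gpt-4o", messages, response));`.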

What to record

Field                               Why it matters
model                               Groups and compares behavior and cost by model
usageDetails (input/output tokens)  Powers token and cost analysis
input                               The prompt or messages sent to the model
output                              The completion returned by the model

If you instrument LLM calls with a supported framework (Vercel AI SDK, OpenAI Agents, LangChain, …), generation spans, including model and tokens, are produced automatically. Use manual generations when you call a model directly or your framework does not emit them.
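
To see why token counts matter for cost analysis, it helps to work through the arithmetic: cost is input tokens times the input price plus output tokens times the output price. The sketch below is illustrative only. `estimateCostUsd` is a hypothetical helper, the prices are made-up placeholders rather than real model pricing, and this is not a description of how Lemma computes cost.

```typescript
// Token counts in the same shape as usageDetails above.
interface UsageDetails {
  input: number;
  output: number;
}

// Price in USD per 1,000,000 tokens (hypothetical structure).
interface PricePerMillion {
  input: number;
  output: number;
}

// Estimates the cost of one generation from its token usage.
function estimateCostUsd(usage: UsageDetails, price: PricePerMillion): number {
  return (usage.input * price.input + usage.output * price.output) / 1_000_000;
}

// Placeholder prices for illustration, not real pricing.
const examplePrice: PricePerMillion = { input: 2, output: 8 };

// 1,000 input tokens and 500 output tokens:
// (1000 * 2 + 500 * 8) / 1,000,000 = 0.006 USD
const exampleCost = estimateCostUsd({ input: 1_000, output: 500 }, examplePrice);
```

Without `usageDetails`, neither term of this sum is available, which is why a generation that omits token counts cannot contribute to cost analysis.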

Errors

Mark a failed model call so it surfaces in Lemma:

await startActiveObservation(
  "draft-reply",
  async (gen) => {
    try {
      const response = await callModel(messages);
      gen.update({ input: messages, output: response.text, model: "gpt-4o" });
      return response;
    } catch (error) {
      // Flag the generation as failed and record the reason.
      gen.update({
        level: "ERROR",
        statusMessage: error instanceof Error ? error.message : String(error),
      });
      // Rethrow so the caller still sees the failure.
      throw error;
    }
  },
  { asType: "generation" },
);
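
The catch block above can be factored into a small helper so every generation reports failures the same way. This is a sketch under assumptions: `ErrorUpdate` and `toErrorUpdate` are hypothetical names, not SDK exports, and the helper only reproduces the `level` and `statusMessage` fields shown above.

```typescript
// The error fields passed to gen.update(...) in the example above.
interface ErrorUpdate {
  level: "ERROR";
  statusMessage: string;
}

// Converts any thrown value into the error fields (hypothetical helper).
function toErrorUpdate(error: unknown): ErrorUpdate {
  return {
    level: "ERROR",
    // Thrown values are not always Error instances, so fall back to String().
    statusMessage: error instanceof Error ? error.message : String(error),
  };
}
```

In the catch block this would read `gen.update(toErrorUpdate(error));` followed by `throw error;`.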

Next steps

Tool calls

Record tool arguments and results.

Spans

Trace retrieval, ranking, and app logic.