Generations

A generation is a single LLM call inside a trace. Marking a span as a generation tells Lemma to read its model, token usage, prompt, and completion, which power cost, latency, and model analysis. Create generations inside the trace root callback so they nest under the trace. See Traces for how to set up the root.

Record a generation

TypeScript

import { startActiveObservation } from "@langfuse/tracing";

// `callModel` and `messages` stand in for your own model client and prompt.
const completion = await startActiveObservation(
  "draft-reply",
  async (gen) => {
    const response = await callModel(messages);

    // Attach the prompt, completion, model, and token counts to the generation.
    gen.update({
      input: messages,
      output: response.text,
      model: "gpt-4o",
      usageDetails: {
        input: response.usage.inputTokens,
        output: response.usage.outputTokens,
      },
    });

    return response;
  },
  // Type the observation as a generation rather than a plain span.
  { asType: "generation" },
);

Python

from langfuse import get_client

langfuse = get_client()

# `call_model` and `messages` stand in for your own model client and prompt.
with langfuse.start_as_current_generation(name="draft-reply", model="gpt-4o") as gen:
    response = call_model(messages)
    # Attach the prompt, completion, and token counts to the generation.
    gen.update(
        input=messages,
        output=response.text,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
        },
    )

What to record

| Field | Why it matters |
| --- | --- |
| `model` | Groups and compares behavior and cost by model |
| `usageDetails` in TypeScript, `usage_details` in Python (input and output tokens) | Powers token and cost analysis |
| `input` | The prompt or messages sent to the model |
| `output` | The completion returned by the model |
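
The Python example above builds these fields inline. When several call sites need the same mapping, it could be factored into a small helper. The following is a minimal sketch and not part of the Langfuse SDK: `generation_fields` and `Usage` are hypothetical names, and the `input_tokens` / `output_tokens` attributes are assumed from the example above, so your provider's response may name them differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Usage:
    """Hypothetical stand-in for a provider's usage object."""

    input_tokens: int
    output_tokens: int


def generation_fields(messages: Any, text: str, usage: Usage) -> dict[str, Any]:
    """Build the keyword arguments recorded on a generation.

    The keys mirror the fields in the table above; `model` is left out
    because the Python example sets it when the generation is created.
    """
    return {
        "input": messages,
        "output": text,
        "usage_details": {
            "input": usage.input_tokens,
            "output": usage.output_tokens,
        },
    }
```

Inside the `with` block from the earlier example, the result could then be passed along as `gen.update(**generation_fields(messages, response.text, response.usage))`.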

If you instrument LLM calls with a supported framework (Vercel AI SDK, OpenAI Agents, LangChain, …), generation spans, including model and tokens, are produced automatically. Use manual generations when you call a model directly or when your framework does not emit them.

Errors

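Mark a failed model call as an error so it surfaces in Lemma. Set the level to ERROR, attach the error message as the status message, and re-throw so the caller still sees the failure: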

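TypeScript

import { startActiveObservation } from "@langfuse/tracing";

await startActiveObservation(
  "draft-reply",
  async (gen) => {
    try {
      const response = await callModel(messages);
      gen.update({ input: messages, output: response.text, model: "gpt-4o" });
      return response;
    } catch (error) {
      // Flag the generation as failed and keep the error message for debugging.
      gen.update({
        level: "ERROR",
        statusMessage: error instanceof Error ? error.message : String(error),
      });
      // Re-throw so the caller still sees the failure.
      throw error;
    }
  },
  { asType: "generation" },
);

Python

from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_generation(name="draft-reply", model="gpt-4o") as gen:
    try:
        response = call_model(messages)
        gen.update(input=messages, output=response.text)
    except Exception as error:
        # Flag the generation as failed and keep the error message for debugging.
        gen.update(level="ERROR", status_message=str(error))
        # Re-raise so the caller still sees the failure.
        raise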
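
If the same try/except pattern repeats across many call sites, it could be wrapped in one function. The sketch below is an illustration under stated assumptions, not an SDK feature: `call_and_record` and `GenerationLike` are hypothetical names, `gen` is assumed to be the object yielded by `start_as_current_generation` (anything exposing `update(**fields)` works), and `call_model` is assumed to return an object with a `text` attribute, as in the examples above.

```python
from typing import Any, Callable, Protocol


class GenerationLike(Protocol):
    """Anything with an update(**fields) method, such as `gen` above."""

    def update(self, **fields: Any) -> Any: ...


def call_and_record(
    gen: GenerationLike,
    call_model: Callable[[Any], Any],
    messages: Any,
) -> Any:
    """Call the model and record either its output or its failure on `gen`."""
    # Record the prompt first so a failed call still carries its input.
    gen.update(input=messages)
    try:
        response = call_model(messages)
    except Exception as error:
        # Same fields as the error example above.
        gen.update(level="ERROR", status_message=str(error))
        raise  # Re-raise so the caller still sees the failure.
    gen.update(output=response.text)
    return response
```

Unlike the example above, this version records `input` before the call, so the prompt is kept even when the model call fails. Whether that is desirable depends on how much prompt data you want attached to failed generations.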

Next steps

Tool calls

Spans