
# Generations

> Capture LLM calls with model, tokens, prompt, and completion

A **generation** is a single LLM call inside a trace. Typing a span as a generation tells Lemma to read its model, token usage, prompt, and completion, which power cost, latency, and model analysis.

Create generations **inside** the trace root callback so they nest under the trace. See [Traces](/tracing/instrumentation/traces) for the root.

## Record a generation

<Tabs>
  <Tab title="TypeScript">
    ```typescript theme={null}
    import { startActiveObservation } from "@langfuse/tracing";

    // `callModel` and `messages` stand in for your own model client and prompt.
    const completion = await startActiveObservation(
      "draft-reply",
      async (gen) => {
        const response = await callModel(messages);

        // Attach the prompt, completion, model, and token counts to the generation.
        gen.update({
          input: messages,
          output: response.text,
          model: "gpt-4o",
          usageDetails: {
            input: response.usage.inputTokens,
            output: response.usage.outputTokens,
          },
        });

        return response;
      },
      { asType: "generation" }, // type the span as a generation
    );
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    from langfuse import get_client

    langfuse = get_client()

    # `call_model` and `messages` stand in for your own model client and prompt.
    with langfuse.start_as_current_generation(name="draft-reply", model="gpt-4o") as gen:
        response = call_model(messages)
        # Attach the prompt, completion, and token counts to the generation.
        gen.update(
            input=messages,
            output=response.text,
            usage_details={
                "input": response.usage.input_tokens,
                "output": response.usage.output_tokens,
            },
        )
    ```
  </Tab>
</Tabs>

## What to record

| Field                                                  | Why it matters                                 |
| ------------------------------------------------------ | ---------------------------------------------- |
| `model`                                                | Groups and compares behavior and cost by model |
| `usageDetails` / `usage_details` (input/output tokens) | Powers token and cost analysis                 |
| `input`                                                | The prompt or messages sent to the model       |
| `output`                                               | The completion returned by the model           |

<Note>
  If you instrument LLM calls with a [supported framework](/frameworks/vercel-ai-sdk) (Vercel AI SDK, OpenAI Agents, LangChain, …), generation spans, including model and tokens, are produced automatically. Use manual generations when you call a model directly or when your framework does not emit them.
</Note>
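
When you record generations manually, the token counters may not arrive under the `input_tokens`/`output_tokens` names used in the example above; OpenAI-style chat completion responses, for instance, report `prompt_tokens` and `completion_tokens`. The sketch below shows one way to normalize such a payload into the `usage_details` shape. `to_usage_details` is a hypothetical helper, not part of any SDK, and it assumes the usage payload has already been converted to a plain dict.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical helper (not part of any SDK): maps provider-specific token
# counters onto the "input"/"output" keys used for usage_details above.
_INPUT_KEYS = ("input_tokens", "prompt_tokens")
_OUTPUT_KEYS = ("output_tokens", "completion_tokens")


def _first_int(usage: Mapping[str, object], keys: Tuple[str, ...]) -> Optional[int]:
    """Return the first integer value found under any of the given keys."""
    for key in keys:
        value = usage.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    return None


def to_usage_details(usage: Mapping[str, object]) -> Dict[str, int]:
    """Build a usage_details dict, omitting counters the provider did not report."""
    details: Dict[str, int] = {}
    input_tokens = _first_int(usage, _INPUT_KEYS)
    output_tokens = _first_int(usage, _OUTPUT_KEYS)
    if input_tokens is not None:
        details["input"] = input_tokens
    if output_tokens is not None:
        details["output"] = output_tokens
    return details
```

With a helper like this, the update call could read `gen.update(usage_details=to_usage_details(usage_dict))`, where `usage_dict` is whatever dict your client exposes. Missing counters are left out rather than recorded as zero, so an absent value is not mistaken for a real measurement.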

## Errors

Mark a failed model call so it surfaces in Lemma:

<Tabs>
  <Tab title="TypeScript">
    ```typescript theme={null}
    import { startActiveObservation } from "@langfuse/tracing";

    await startActiveObservation(
      "draft-reply",
      async (gen) => {
        try {
          const response = await callModel(messages);
          gen.update({ input: messages, output: response.text, model: "gpt-4o" });
          return response;
        } catch (error) {
          // Flag the generation as failed and keep the error message.
          gen.update({
            level: "ERROR",
            statusMessage: error instanceof Error ? error.message : String(error),
          });
          // Re-throw so the caller still sees the failure.
          throw error;
        }
      },
      { asType: "generation" },
    );
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    from langfuse import get_client

    langfuse = get_client()

    with langfuse.start_as_current_generation(name="draft-reply", model="gpt-4o") as gen:
        try:
            response = call_model(messages)
            gen.update(input=messages, output=response.text)
        except Exception as error:
            # Flag the generation as failed, then re-raise for the caller.
            gen.update(level="ERROR", status_message=str(error))
            raise
    ```
  </Tab>
</Tabs>
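
If several call sites repeat the same `try`/`except`, the pattern above can be factored into a small context manager. This is a minimal sketch, not an SDK feature: `mark_errors` is a hypothetical name, and it assumes only that the generation object exposes the `update(level=..., status_message=...)` call shown in the Python example.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def mark_errors(gen: Any) -> Iterator[Any]:
    """Record any exception on `gen` as an ERROR, then re-raise it.

    Hypothetical helper: relies only on the update(level=..., status_message=...)
    method used in the example above.
    """
    try:
        yield gen
    except Exception as error:
        # Same fields as the manual example: mark the failure and keep the message.
        gen.update(level="ERROR", status_message=str(error))
        raise
```

Inside the `with langfuse.start_as_current_generation(...) as gen:` block, wrapping the model call in `with mark_errors(gen):` would then take the place of the explicit `try`/`except`. The exception is still re-raised, so callers see the failure as before.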

## Next steps

<CardGroup cols={2}>
  <Card title="Tool calls" icon="wrench" href="/tracing/instrumentation/tool-calls">
    Record tool arguments and results.
  </Card>

  <Card title="Spans" icon="box" href="/tracing/instrumentation/spans">
    Trace retrieval, ranking, and app logic.
  </Card>
</CardGroup>
