Use this when your agent pipeline makes multiple LLM calls in sequence — for example, an intent-extraction step followed by a generation step. If you register the matching OpenInference instrumentor for your provider, each SDK call automatically produces its own child span with prompt, response, model, and token data. See Provider instrumentation.
import OpenAI from "openai";
import { registerOTel, agent } from "@uselemma/tracing";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-openai";

// Register Lemma's tracer provider, then attach the OpenAI
// instrumentation to it so each SDK call emits its own child span.
const provider = registerOTel();
registerInstrumentations({
  instrumentations: [new OpenAIInstrumentation()],
  tracerProvider: provider,
});

const client = new OpenAI();

const wrapped = agent("multi-step-agent", async (input: { userMessage: string }) => {
  const userMessage = input.userMessage;

  // Step 1: extract the user's intent with a small, fast model.
  const intentResp = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Extract the user's intent in one sentence." },
      { role: "user", content: userMessage },
    ],
  });
  const intent = intentResp.choices[0].message.content ?? "";

  // Step 2: generate the final answer, conditioned on the extracted intent.
  const finalResp = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `User intent: ${intent}. Respond helpfully.` },
      { role: "user", content: userMessage },
    ],
  });
  const result = finalResp.choices[0].message.content ?? "";

  return result;
});

const { result, runId } = await wrapped({ userMessage: "Help me plan a trip to Tokyo" });
The trace in Lemma shows:
  • One ai.agent.run root span with the user message as input and the final answer as output.
  • Two child spans — one per create call — each with its own prompt, response, model, and token counts.
This recipe is one run with several LLM steps inside it. To link multiple runs (e.g. when each user message is a new invocation of wrapped), pass the same threadId / thread_id at call time. Without provider instrumentation, wrap each LLM call in a manual step span instead. See Manual instrumentation.
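For the multi-run case, the key is generating one thread identifier per conversation and reusing it on every turn. A minimal sketch of that pattern — the options-argument shape (`{ threadId }`) is an assumption here, and a stand-in `wrapped` is defined locally so the snippet runs on its own; in real code you would call the agent built above and confirm the exact call signature in the Lemma tracing API reference:

```typescript
import { randomUUID } from "node:crypto";

// Stand-in for the `wrapped` agent from the snippet above, so this
// sketch is self-contained. HYPOTHETICAL: the second options argument
// carrying `threadId` is an assumed shape — check the Lemma docs.
async function wrapped(
  input: { userMessage: string },
  opts?: { threadId?: string },
) {
  return { result: `echo: ${input.userMessage}`, runId: randomUUID(), ...opts };
}

// One id per conversation, reused on every turn so Lemma links the runs.
const threadId = randomUUID();

const turn1 = await wrapped({ userMessage: "Help me plan a trip to Tokyo" }, { threadId });
const turn2 = await wrapped({ userMessage: "Make it a 5-day itinerary" }, { threadId });
// Both runs carry the same threadId, so they appear as one thread.
```

Each turn is still its own root span with its own runId; only the shared threadId ties them together into a single conversation view.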