Use this when your agent pipeline makes multiple LLM calls in sequence — for example, an intent-extraction step followed by a generation step. If you register the matching OpenInference instrumentor for your provider, each SDK call automatically produces its own child span with prompt, response, model, and token data. See Provider instrumentation.
import OpenAI from "openai";
import { registerOTel, agent } from "@uselemma/tracing";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-openai";

// Register Lemma's tracer provider, then attach the OpenAI
// instrumentation to it so each SDK call emits its own child span.
const provider = registerOTel();
registerInstrumentations({
  instrumentations: [new OpenAIInstrumentation()],
  tracerProvider: provider,
});

const client = new OpenAI();

const wrapped = agent("multi-step-agent", async (input: { userMessage: string }) => {
  const userMessage = input.userMessage;

  // Step 1: extract the user's intent with a small, fast model.
  const intentResp = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Extract the user's intent in one sentence." },
      { role: "user", content: userMessage },
    ],
  });
  const intent = intentResp.choices[0].message.content ?? "";

  // Step 2: generate the final answer, conditioned on the extracted intent.
  const finalResp = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `User intent: ${intent}. Respond helpfully.` },
      { role: "user", content: userMessage },
    ],
  });
  const result = finalResp.choices[0].message.content ?? "";

  return result;
});

const { result, runId } = await wrapped({ userMessage: "Help me plan a trip to Tokyo" });
The trace in Lemma shows:
  • One ai.agent.run root span with the user message as input and the final answer as output.
  • Two child spans — one per create call — each with its own prompt, response, model, and token counts.
This recipe is one run with several LLM steps inside it. To link multiple runs (e.g. when each user message is a new invocation of wrapped), pass the same threadId / thread_id at call time. Without provider instrumentation, wrap each LLM call in a manual step span instead. See Manual instrumentation.
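For the multi-run case, the key is generating one thread identifier per conversation and reusing it on every turn. A minimal sketch of that pattern — the options-argument shape (`{ threadId }`) is an assumption here, and a stand-in `wrapped` is defined locally so the snippet runs on its own; in real code you would call the agent built above and confirm the exact call signature in the Lemma tracing API reference:

```typescript
import { randomUUID } from "node:crypto";

// Stand-in for the `wrapped` agent from the snippet above, so this
// sketch is self-contained. HYPOTHETICAL: the second options argument
// carrying `threadId` is an assumed shape — check the Lemma docs.
async function wrapped(
  input: { userMessage: string },
  opts?: { threadId?: string },
) {
  return { result: `echo: ${input.userMessage}`, runId: randomUUID(), ...opts };
}

// One id per conversation, reused on every turn so Lemma links the runs.
const threadId = randomUUID();

const turn1 = await wrapped({ userMessage: "Help me plan a trip to Tokyo" }, { threadId });
const turn2 = await wrapped({ userMessage: "Make it a 5-day itinerary" }, { threadId });
// Both runs carry the same threadId, so they appear as one thread.
```

Each turn is still its own root span with its own runId; only the shared threadId ties them together into a single conversation view.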