A step is a single LLM request/response pair inside a run. In Lemma custom instrumentation, steps are created as child spans under the wrapAgent span.

Required

Create steps with tracer.startActiveSpan inside your wrapped run:
import { trace } from "@opentelemetry/api";
import { wrapAgent } from "@uselemma/tracing";

const tracer = trace.getTracer("my-agent");

const wrapped = wrapAgent("support-agent", async ({ onComplete }, input) => {
  // Each startActiveSpan call inside the wrapped run becomes one step.
  const answer = await tracer.startActiveSpan("llm.step.generate", async (stepSpan) => {
    const response = await llmCall(input.userMessage);
    stepSpan.setAttribute("llm.prompt", input.userMessage);
    stepSpan.setAttribute("llm.response", response);
    stepSpan.end(); // end the step span before returning
    return response;
  });

  onComplete(answer);
  return answer;
});

Optional step data

Add standardized attributes to support cost and quality analysis:
stepSpan.setAttribute("llm.model.requested", "gpt-4o");
stepSpan.setAttribute("llm.model.used", "gpt-4o-2024-08-06");
stepSpan.setAttribute("llm.tokens.prompt_uncached", 320);
stepSpan.setAttribute("llm.tokens.prompt_cached", 80);
stepSpan.setAttribute("llm.tokens.completion", 140);
stepSpan.setAttribute("llm.cost.usd", 0.0042);
stepSpan.setAttribute("llm.finish_reason", "stop");
If you use OpenInference instrumentation for your provider, these step-level attributes are usually emitted automatically.
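If your provider does not report cost directly, llm.cost.usd can be derived from the token attributes above. A minimal sketch, assuming hypothetical per-million-token rates (the RATES_PER_MILLION values are illustrative placeholders, not real provider pricing):

```typescript
// Hypothetical pricing table: these rates are placeholders, not real prices.
const RATES_PER_MILLION = {
  promptUncached: 2.5,
  promptCached: 1.25,
  completion: 10.0,
};

interface StepUsage {
  promptUncached: number; // llm.tokens.prompt_uncached
  promptCached: number;   // llm.tokens.prompt_cached
  completion: number;     // llm.tokens.completion
}

// Convert per-step token counts into a USD cost figure.
function stepCostUsd(usage: StepUsage): number {
  return (
    (usage.promptUncached * RATES_PER_MILLION.promptUncached +
      usage.promptCached * RATES_PER_MILLION.promptCached +
      usage.completion * RATES_PER_MILLION.completion) /
    1_000_000
  );
}

// With the placeholder rates, the token counts from the example above
// (320 uncached + 80 cached prompt tokens, 140 completion tokens)
// come out to 0.0023 USD.
const cost = stepCostUsd({ promptUncached: 320, promptCached: 80, completion: 140 });
```

The computed value can then be recorded with stepSpan.setAttribute("llm.cost.usd", cost).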

Mark a step as failed

Record the error on the step span before ending it:
await tracer.startActiveSpan("llm.step.generate", async (stepSpan) => {
  try {
    return await llmCall("hello");
  } catch (error) {
    // Attach the exception and a status attribute so the step shows as failed.
    stepSpan.recordException(error as Error);
    stepSpan.setAttribute("step.status", "error");
    throw error; // re-throw so the run-level span also observes the failure
  } finally {
    stepSpan.end(); // the span must end exactly once, success or failure
  }
});
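If you repeat this pattern across many steps, it can be factored into a helper. A minimal sketch: runStep and the SpanLike interface are hypothetical names (SpanLike is a narrow stand-in for the OpenTelemetry Span type, covering only the methods used here), not part of the Lemma or OpenTelemetry APIs:

```typescript
// Minimal stand-in for the subset of the OpenTelemetry Span API used below.
interface SpanLike {
  recordException(err: Error): void;
  setAttribute(key: string, value: string | number): void;
  end(): void;
}

// Run a step body, recording any error on the span and always ending it.
async function runStep<T>(span: SpanLike, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    span.recordException(error as Error);
    span.setAttribute("step.status", "error");
    throw error; // propagate so the caller still sees the failure
  } finally {
    span.end(); // end exactly once, on success or failure
  }
}
```

Inside a wrapped run this would be used as tracer.startActiveSpan("llm.step.generate", (stepSpan) => runStep(stepSpan, () => llmCall("hello"))).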

Dashboard outcome

Each step appears nested under the run so you can inspect:
  • per-call latency
  • model and token usage
  • finish reason
  • where failures occurred in the reasoning chain

Next Steps