Use this when your agent streams response chunks back to a caller in real time while keeping the span open for the full duration of the stream. Pass { streaming: true } (TypeScript) or streaming=True (Python) to tell the wrapper not to auto-close the span when the function returns. Then call ctx.complete(output) once the full output is assembled.
The cleanest pattern with the Vercel AI SDK is to call ctx.complete() inside onFinish and return the data stream response:
import { registerOTel, agent } from "@uselemma/tracing";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

registerOTel();

const wrapped = agent("my-agent", async (input: string, ctx) => {
  const result = await streamText({
    model: openai("gpt-4o"),
    prompt: input,
    experimental_telemetry: { isEnabled: true },
    onFinish({ text }) {
      ctx.complete(text); // closes the span with the full assembled output
    },
  });
  return result.toDataStreamResponse();
}, { streaming: true });

export async function handleRequest(userMessage: string) {
  const { result } = await wrapped(userMessage);
  return result; // a Response that streams chunks to the client
}
Key points:
  • Pass { streaming: true } / streaming=True so the wrapper knows not to auto-close when the function returns.
  • Call ctx.complete(output) with the assembled output once the stream ends.
  • Wait for the wrapped invocation to finish before relying on runId or assuming the span is closed.
  • If you stream manually (for example via a queue or a ReadableStream) instead of using the AI SDK, apply the same pattern with whatever your framework uses to write SSE events.
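For the manual-streaming case, the essential move is the same: accumulate chunks as you forward them and call `ctx.complete()` only once the stream is exhausted. Below is a minimal sketch using a plain ReadableStream; `streamWithCompletion` is a hypothetical helper for illustration, and the `Ctx` type stands in for the real `ctx` object the `agent()` wrapper provides.

```typescript
// Minimal sketch: stream chunks to a caller while assembling the full
// output, then invoke complete() once the stream ends. The Ctx type
// below is an assumption standing in for the wrapper-provided ctx.
type Ctx = { complete: (output: string) => void };

function streamWithCompletion(
  chunks: string[],
  ctx: Ctx
): ReadableStream<string> {
  let assembled = "";
  let i = 0;
  return new ReadableStream<string>({
    pull(controller) {
      if (i < chunks.length) {
        const chunk = chunks[i++];
        assembled += chunk;        // accumulate the full output
        controller.enqueue(chunk); // forward this chunk to the caller
      } else {
        ctx.complete(assembled);   // close the span with the assembled output
        controller.close();        // then end the stream
      }
    },
  });
}
```

Note that `complete()` fires before the stream signals `done`, so a consumer that awaits the final read can rely on the span having been closed.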