OpenAI Agents SDK

The OpenAI Agents SDK runs multi-step agents with tools and handoffs. Instrument it with OpenInference (the approach in the Langfuse integration guide) to emit spans, export those spans to Lemma over OTLP, and wrap each run in one Langfuse root span so the whole execution is a single nested trace.

One agent execution = one trace. Wrap the run in a single root span so every model and tool call nests under it. See the trace contract.

OpenAI Agents traces render fully in Lemma today. Automated issue detection is being expanded to this shape — see Good trace vs bad trace for current status.

Recipe

Install

pip install openai-agents langfuse openinference-instrumentation-openai-agents opentelemetry-sdk opentelemetry-exporter-otlp

# instrumentation.py: import this module first, before your app code
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Batch spans and send them to Lemma's OTLP/HTTP traces endpoint.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint=os.environ["LEMMA_BASE_URL"],  # full URL, including /v1/traces
            headers={
                "Authorization": f"Bearer {os.environ['LEMMA_API_KEY']}",
                "X-Lemma-Project-ID": os.environ["LEMMA_PROJECT_ID"],
            },
        )
    )
)
# Register it globally so instrumentation that uses the global provider exports through it.
trace.set_tracer_provider(provider)

Set the environment variables. Lemma-only export needs no LANGFUSE_* credentials.

export LEMMA_BASE_URL="https://api.uselemma.ai/otel/v1/traces"
export LEMMA_API_KEY="lma_..."
export LEMMA_PROJECT_ID="proj_..."
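If a variable is missing, instrumentation.py fails with a bare KeyError. To get a readable error instead, you can validate all three variables before building the exporter. The following is a minimal sketch. load_lemma_config is a hypothetical helper and is not part of Lemma or OpenTelemetry. It returns the same endpoint and headers that instrumentation.py passes to OTLPSpanExporter.

```python
from collections.abc import Mapping
import os

REQUIRED_VARS = ("LEMMA_BASE_URL", "LEMMA_API_KEY", "LEMMA_PROJECT_ID")


def load_lemma_config(env: Mapping[str, str] = os.environ) -> tuple[str, dict[str, str]]:
    """Return (endpoint, headers) for the OTLP exporter, or raise if any variable is unset.

    Hypothetical helper for illustration; not a Lemma or OpenTelemetry API.
    """
    # Collect every missing or empty variable so one error names all of them.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing Lemma environment variables: {', '.join(missing)}")

    endpoint = env["LEMMA_BASE_URL"]
    headers = {
        "Authorization": f"Bearer {env['LEMMA_API_KEY']}",
        "X-Lemma-Project-ID": env["LEMMA_PROJECT_ID"],
    }
    return endpoint, headers
```

In instrumentation.py you would then call endpoint, headers = load_lemma_config() and pass both values to OTLPSpanExporter(endpoint=endpoint, headers=headers).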

Enable Agents SDK instrumentation

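Enable the OpenInference instrumentation for the Agents SDK so model calls, tool calls, and handoffs are captured as spans. Call it once at startup, after the tracer provider is configured (for example, at the end of instrumentation.py). Follow the Langfuse OpenAI Agents guide for details.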

from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

OpenAIAgentsInstrumentor().instrument()

Wrap the whole run in one root span

Wrap Runner.run in a single Langfuse root span so every step of the agent nests under one trace. Record the input and the final output on the root, and set a stable agent name.

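import instrumentation  # noqa: F401  (sets up the exporter before anything else)
from agents import Agent, Runner, function_tool
from langfuse import get_client

langfuse = get_client()


@function_tool
def search_docs(query: str) -> str:
    """Hypothetical placeholder tool; replace with your real tools."""
    return f"No documentation found for {query!r}."


agent = Agent(name="support-agent", instructions="Help the user.", tools=[search_docs])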

async def run_support_agent(user_message: str, thread_id: str) -> str:
    with langfuse.start_as_current_span(name="support-agent") as root:
        root.update(input=user_message)
        langfuse.update_current_trace(
            name="support-agent",
            session_id=thread_id,
            metadata={"gen_ai.agent.name": "support-agent"},
        )

        result = await Runner.run(agent, user_message)

        root.update(output=result.final_output)
        return result.final_output

Every span the Agents SDK emits inside the with block becomes a child of the root, producing one nested trace:

support-agent              ← trace root (input, output)
├─ response                ← generation (model, tokens)
├─ search_docs             ← tool call (args, result)
└─ response                ← generation (final answer)
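The nesting comes entirely from parent links. Each child span records its parent's span id, and a trace viewer groups spans by that link. The following minimal sketch rebuilds the tree above from a flat list. The (span_id, parent_id, name) tuple layout is a hypothetical simplification of exported span data; it is not Lemma's storage format.

```python
from collections import defaultdict
from typing import Optional


def render_tree(spans: list[tuple[str, Optional[str], str]]) -> str:
    """Render (span_id, parent_id, name) tuples as an indented tree using only parent links."""
    children: dict[str, list[tuple[str, str]]] = defaultdict(list)
    roots: list[tuple[str, str]] = []
    for span_id, parent_id, name in spans:
        # A span with no parent starts its own tree, which means its own trace.
        if parent_id is None:
            roots.append((span_id, name))
        else:
            children[parent_id].append((span_id, name))

    lines: list[str] = []

    def walk(span_id: str, prefix: str) -> None:
        kids = children[span_id]
        for i, (kid_id, kid_name) in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(f"{prefix}{'└─' if last else '├─'} {kid_name}")
            walk(kid_id, prefix + ("   " if last else "│  "))

    for root_id, root_name in roots:
        lines.append(root_name)
        walk(root_id, "")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree([
        ("root", None, "support-agent"),
        ("g1", "root", "response"),
        ("t1", "root", "search_docs"),
        ("g2", "root", "response"),
    ]))
```

A step whose parent_id is None is rendered as a second top-level line. This is how a step that escaped the root's context appears as a separate trace.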

Flush before the process exits

In short-lived runtimes such as serverless functions and scripts, flush before the process exits so buffered spans are exported rather than lost.

from langfuse import get_client

get_client().flush()
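In serverless handlers, it helps to flush even when the run raises, so that a failed run still reaches Lemma. The following is a minimal sketch. flush_after is a hypothetical decorator that takes the flush function as a parameter (for example, get_client().flush), so the pattern does not depend on a particular client.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def flush_after(flush: Callable[[], Any]):
    """Decorate an async entrypoint so `flush` runs after it returns or raises.

    Hypothetical helper for illustration; not part of Langfuse or Lemma.
    """

    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await fn(*args, **kwargs)
            finally:
                # Runs on success and on error, so spans from a failed run are exported too.
                # flush() is synchronous and briefly blocks the event loop, which is
                # acceptable at the end of a short-lived invocation.
                flush()

        return wrapper

    return decorator


if __name__ == "__main__":
    # Stand-in flush for demonstration; in practice pass get_client().flush.
    @flush_after(lambda: print("flushed"))
    async def handler(message: str) -> str:
        return f"echo: {message}"

    print(asyncio.run(handler("hi")))
```

For example, if you decorate the function that calls run_support_agent with @flush_after(get_client().flush), a flush runs at the end of every invocation.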

If steps show up as separate traces, the agent ran outside the root span’s active context. Keep Runner.run inside the start_as_current_span block. See Troubleshooting.
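This failure mode comes from how the active span is tracked. OpenTelemetry's Python SDK keeps the current span in a contextvars context, and each new span takes its parent from whatever span is current. The toy model below uses only contextvars; it is not the OpenTelemetry API. It shows that a step run in the root's context, or in a copy of that context, nests under the root. The same step run in a fresh, empty context starts a new trace. In real code, one example of the second case is work handed to a thread pool that does not copy the current context. The names ToySpan and start_span are hypothetical.

```python
import contextvars
import itertools
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional

_ids = itertools.count(1)


@dataclass
class ToySpan:
    name: str
    trace_id: int
    span_id: int
    parent_id: Optional[int]


# Holds the active ToySpan (or None), mimicking a context-based "current span".
_current = contextvars.ContextVar("current_span", default=None)
finished: list[ToySpan] = []


@contextmanager
def start_span(name: str) -> Iterator[ToySpan]:
    parent = _current.get()
    span_id = next(_ids)
    # No active parent means this span becomes the root of a brand-new trace.
    trace_id = parent.trace_id if parent else span_id
    span = ToySpan(name, trace_id, span_id, parent.span_id if parent else None)
    token = _current.set(span)
    try:
        yield span
    finally:
        _current.reset(token)
        finished.append(span)


def agent_step() -> None:
    with start_span("response"):
        pass


with start_span("support-agent"):
    agent_step()  # same context: nests under the root
    contextvars.copy_context().run(agent_step)  # copied context: still nests
    contextvars.Context().run(agent_step)  # empty context: becomes its own trace
```

The stdlib way to carry the context explicitly into other code is contextvars.copy_context().run(...).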

Verify in Lemma

Open the Lemma dashboard → Traces and confirm:

One trace per run — a full agent run is one trace, not one per model call.
Root has input and output — the root span shows the user message and the final output.
Generations are nested — each model call appears as a child generation with model and token usage.
Tools are nested — each tool invocation appears as a child tool span with arguments and result.
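If your tests capture the spans from one run, you can automate the checks above. The following is a minimal sketch. The span dictionaries and their keys (trace_id, span_id, parent_id, kind, input, output) are a hypothetical simplified shape chosen for illustration; they are not Lemma's or OpenTelemetry's export format. check_run_trace is likewise a hypothetical helper.

```python
from typing import Any


def check_run_trace(spans: list[dict[str, Any]]) -> list[str]:
    """Return problems found in one agent run's spans; an empty list means it passes."""
    if not spans:
        return ["no spans exported"]
    problems: list[str] = []

    # One trace per run: every span must share the same trace id.
    trace_ids = {s["trace_id"] for s in spans}
    if len(trace_ids) != 1:
        problems.append(f"expected 1 trace, found {len(trace_ids)}")

    # Exactly one root span, and it carries the run's input and output.
    roots = [s for s in spans if s.get("parent_id") is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root span, found {len(roots)}")
    else:
        for field in ("input", "output"):
            if roots[0].get(field) in (None, ""):
                problems.append(f"root span is missing {field}")

    # Generations and tool calls must hang off another span, never sit at the top level.
    for s in spans:
        if s.get("kind") in ("generation", "tool") and s.get("parent_id") is None:
            problems.append(f"{s['kind']} span {s['span_id']} is not nested")
    return problems
```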

Next steps

Trace contract

The exact shape Lemma reads.

Setup

Wire the Langfuse → Lemma exporter.

Threads and sessions

Group multi-turn conversations with a thread id.

Good vs bad traces

What issue detection looks for, per shape.
