OpenAI Agents SDK

Trace the OpenAI Agents SDK in Muster — multi-agent handoffs, tool calls, and grouped runs.

This guide demonstrates how to integrate Muster with your OpenAI Agents workflow to monitor, debug, and evaluate your AI agents.

What is the OpenAI Agents SDK? The OpenAI Agents SDK is "a lightweight, open-source framework that lets developers build AI agents and orchestrate multi-agent workflows."

What is Muster? Muster is an open-source observability platform for AI agents, built on top of the Langfuse core.

1. Install Dependencies

%pip install openai-agents langfuse nest_asyncio openinference-instrumentation-openai-agents

2. Configure Environment & Credentials

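Set up your Muster API keys (from your project's Settings → API Keys) and your OpenAI key. The variable names keep the LANGFUSE_ prefix because the client used throughout this guide is the langfuse SDK.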

import os

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"  # or your self-hosted URL

# Your OpenAI key
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
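Missing or blank credentials are a common reason for a failed authentication check later on. As a minimal sketch, you could verify that the four variables set above are present before initializing anything. The helper name `missing_credentials` is hypothetical and not part of any SDK.

```python
import os

# Variable names taken from the configuration step above.
REQUIRED_VARS = (
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_BASE_URL",
    "OPENAI_API_KEY",
)

def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real environment.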

3. Instrumenting the Agent

import nest_asyncio

# Notebooks already run an event loop; this patch allows nested asyncio calls inside it.
nest_asyncio.apply()

Initialize the OpenInference OpenAI Agents instrumentation:

from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

OpenAIAgentsInstrumentor().instrument()

Initialize the SDK client:

from langfuse import get_client

langfuse = get_client()

# Verify connection
if langfuse.auth_check():
    print("Muster client is authenticated and ready!")
else:
    print("Authentication failed. Please check your credentials and host.")

4. Hello World Example

Create an OpenAI Agent that responds in haiku form:

import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
    )

    result = await Runner.run(agent, "Tell me about recursion in programming.")
    print(result.final_output)

# Top-level await works here because a notebook already has a running event loop.
loop = asyncio.get_running_loop()
await loop.create_task(main())

Example trace in Muster

View all sub-spans, token usage, latencies, etc., for debugging or optimization.
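The `await loop.create_task(main())` pattern used above relies on an event loop that is already running, as in a Jupyter notebook. A standalone script has no running loop, so a common alternative there is `asyncio.run(main())`. The sketch below shows only that entry-point pattern; `fake_agent_run` is a hypothetical stand-in for `Runner.run(...)` so that the example runs without an API key.

```python
import asyncio

async def fake_agent_run(prompt: str) -> str:
    """Hypothetical stand-in for `await Runner.run(agent, prompt)`."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return f"(agent output for: {prompt})"

async def main() -> str:
    output = await fake_agent_run("Tell me about recursion in programming.")
    print(output)
    return output

# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
```

In a real script you would likely also call `langfuse.flush()` before the process exits, so that buffered spans are sent.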

5. Multi-agent Handoff Example

Create multiple agents with handoff capabilities:

from agents import Agent, Runner
import asyncio

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Handoff to the appropriate agent based on the language of the request.",
    handoffs=[spanish_agent, english_agent],
)

result = await Runner.run(triage_agent, input="Hola, ¿cómo estás?")
print(result.final_output)

Multi-agent handoff trace

6. Functions Example

The OpenAI Agents SDK allows agents to call Python functions. With Muster instrumentation, you can see which functions are called, their arguments, and return values.

import asyncio
from agents import Agent, Runner, function_tool

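# Example function tool. The docstring is used as the tool description shown to the model.
@function_tool
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city."""
    return f"The weather in {city} is sunny."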

agent = Agent(
    name="Hello world",
    instructions="You are a helpful agent.",
    tools=[get_weather],
)

async def main():
    result = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result.final_output)

loop = asyncio.get_running_loop()
await loop.create_task(main())

Function-call trace

7. Grouping Agent Runs

Group multiple calls into a single trace using the trace(...) context manager:

from agents import Agent, Runner, trace
import asyncio

async def main():
    agent = Agent(name="Joke generator", instructions="Tell funny jokes.")

    with trace("Joke workflow"):
        first_result = await Runner.run(agent, "Tell me a joke")
        second_result = await Runner.run(agent, f"Rate this joke: {first_result.final_output}")
        print(f"Joke: {first_result.final_output}")
        print(f"Rating: {second_result.final_output}")

loop = asyncio.get_running_loop()
await loop.create_task(main())

Grouped runs trace

Link Muster Prompts

If you manage your prompts with Muster Prompt Management, link the prompt used to the trace by setting up an OTel span processor.

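Limitation: This method links the prompt to every generation span in the trace whose name starts with the configured string ("response" in the processor below), so unrelated generations in the same trace are linked to the prompt as well.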

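from contextvars import ContextVar
from typing import Optional

# Note: this `trace` is the OpenTelemetry module; it shadows `agents.trace`
# if both are imported in the same session.
from opentelemetry import context as context_api, trace
from opentelemetry.sdk.trace import Span, SpanProcessor

# Holds the prompt name and version for the current execution context.
prompt_info_var = ContextVar("prompt_info", default=None)

class LangfuseProcessor(SpanProcessor):
    """Attach the linked prompt's name and version to generation spans."""

    def on_start(self, span: "Span", parent_context: Optional[context_api.Context] = None) -> None:
        # Only spans whose name starts with "response" are treated as generations.
        if span.name.startswith("response"):
            prompt_info = prompt_info_var.get()
            if prompt_info:
                span.set_attribute("langfuse.prompt.name", prompt_info.get("name"))
                span.set_attribute("langfuse.prompt.version", prompt_info.get("version"))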


from langfuse import get_client
langfuse = get_client()

trace.get_tracer_provider().add_span_processor(LangfuseProcessor())

from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

OpenAIAgentsInstrumentor().instrument()

# Fetch the prompt from Muster Prompt Management
langfuse_prompt = langfuse.get_prompt("movie-critic")
system_prompt = langfuse_prompt.compile(criticlevel="expert", movie="Dune 2")

# Pass the prompt to the SpanProcessor
prompt_info_var.set({
    "name": langfuse_prompt.name,
    "version": langfuse_prompt.version,
})

# Run the agent ...
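Because `prompt_info_var.set(...)` stays in effect for the rest of the current context, later agent runs in the same context would also be linked to the same prompt. One way to limit this, sketched here with the standard library only, is to reset the variable once the run has finished. The `linked_prompt` helper is hypothetical and not part of the Muster or langfuse SDK, and the version number is illustrative.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Same shape as the variable read by the span processor above.
prompt_info_var = ContextVar("prompt_info", default=None)

@contextmanager
def linked_prompt(name, version):
    """Expose prompt info to the span processor only inside the with-block."""
    token = prompt_info_var.set({"name": name, "version": version})
    try:
        yield
    finally:
        # Restore whatever value was set before entering the block.
        prompt_info_var.reset(token)

# 3 is an illustrative version number, not a value from a real project.
with linked_prompt("movie-critic", 3):
    inside = prompt_info_var.get()  # visible while the agent would be running

outside = prompt_info_var.get()  # back to the default once the block exits
```

With this pattern, the agent run goes inside the `with` block, and generations created after the block are no longer tagged with the prompt.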

Interoperability with the Python SDK

Use this integration together with the Muster (langfuse) SDK to attach extra attributes, such as user ID, session ID, tags, and metadata, to your observations.

Decorator

The @observe() decorator automatically wraps your instrumented code:

from langfuse import observe, propagate_attributes, get_client

langfuse = get_client()

@observe()
def my_llm_pipeline(user_input):
    with propagate_attributes(
        user_id="user_123",
        session_id="session_abc",
        tags=["agent", "my-observation"],
        metadata={"email": "user@example.com"},
        version="1.0.0"
    ):
        # call_llm is a placeholder for your own instrumented code.
        result = call_llm(user_input)
        return result

my_llm_pipeline("Hi")

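Context Manager

The start_as_current_observation(...) context manager wraps your instrumented code in an explicitly named observation: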

from langfuse import get_client, propagate_attributes

langfuse = get_client()

with langfuse.start_as_current_observation(
    as_type="span",
    name="my-observation",
    trace_context={"trace_id": "abcdef1234567890abcdef1234567890"},
) as observation:
    with propagate_attributes(
        user_id="user_123",
        session_id="session_abc",
        metadata={"experiment": "variant_a", "env": "prod"},
        version="1.0",
    ):
        result = call_llm("some input")

langfuse.flush()
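The `trace_id` in the example above is a 32-character lowercase hexadecimal string, which matches the W3C Trace Context format used by OpenTelemetry. If you want the same external identifier (for example, a request ID from your own system) to always map to the same trace, one option is to derive the ID by hashing. This is a standard-library sketch under that assumption; `trace_id_from_seed` is a hypothetical helper, and the SDK may provide its own equivalent.

```python
import hashlib

def trace_id_from_seed(seed: str) -> str:
    """Derive a deterministic 32-character lowercase hex trace ID from a seed."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]

# "request-42" is an illustrative seed, not a value from the guide.
trace_id = trace_id_from_seed("request-42")
trace_context = {"trace_id": trace_id}
```

The resulting dictionary has the same shape as the `trace_context` argument shown in the context manager example.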

Troubleshooting

No observations appearing

Enable debug mode in the Python SDK:

export LANGFUSE_DEBUG="True"

Then check the debug logs:

If OTel observations appear in the logs but not in Muster: call langfuse.flush() at the end of your application, and verify your API keys and base URL.
If no OTel spans appear in the logs: the instrumentation is not running before your application code. Make sure OpenAIAgentsInstrumentor().instrument() is called before any agent is run.

Unwanted observations

Other libraries may emit OTel spans that are not relevant. They count toward billable units, so filter them out via OTel span processors.
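The filtering decision itself can be as simple as a prefix check on a span's instrumentation scope name. The sketch below shows only that decision logic on plain strings; how you wire it into an OTel span processor or exporter depends on your setup. `ALLOWED_SCOPE_PREFIXES` and `should_export` are hypothetical names, and the scope names are assumed examples rather than a verified list.

```python
# Assumed scope-name prefixes for spans worth keeping; adjust to your own setup.
ALLOWED_SCOPE_PREFIXES = (
    "openinference.instrumentation.openai_agents",
    "langfuse",
)

def should_export(scope_name: str) -> bool:
    """Return True if a span from this instrumentation scope should be kept."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return scope_name.startswith(ALLOWED_SCOPE_PREFIXES)

# Example scope names, as different libraries might report them.
examples = ["openinference.instrumentation.openai_agents", "sqlalchemy", "langfuse-sdk"]
kept = [name for name in examples if should_export(name)]
```

An allow-list like this is usually safer than a block-list, because spans from a newly added library are dropped by default instead of silently counting toward billable units.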

Missing attributes

Some attributes are stored in metadata rather than mapped to the data model. If a mapping does not work as expected, raise an issue with your operator.