Muster by Elitery

OpenAI SDK (Python)

This integration is a drop-in replacement for the OpenAI Python SDK. Changing only the import statement adds automatic tracking of prompts, completions, latencies, errors, and token usage in Muster.

Key Features

The wrapper automatically captures:

  • All prompts and completions, including streaming, async, and function-calling requests
  • Latencies and API errors
  • Model usage (tokens) and cost in USD

Installation & Setup

Requirements

The integration requires OpenAI SDK version >=0.27.8. Async and streaming support require version >=1.0.0.

pip install langfuse openai

Configuration

Option 1: Environment Variables

LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_PUBLIC_KEY="pk-lf-..."
LANGFUSE_BASE_URL="https://app.getmuster.io"   # or your self-hosted Muster URL
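Because the wrapper reads its credentials from the environment, an unset variable is easy to miss until traces fail to appear. The following is a minimal sketch of a startup check. The helper name `missing_muster_env` is hypothetical and not part of the SDK, and treating the base URL as optional is an assumption.

```python
import os
from typing import Mapping

# Credential variables from the configuration above. Treating the base URL
# as optional is an assumption; add it here if your deployment requires it.
REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY")


def missing_muster_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_muster_env()` before the first OpenAI request and logging any returned names gives an early, readable failure instead of silently missing traces.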

Option 2: Direct Attributes

from langfuse.openai import openai  # the instrumented module, not plain `import openai`

openai.langfuse_public_key = "pk-lf-..."
openai.langfuse_secret_key = "sk-lf-..."
openai.langfuse_enabled = True

Basic Usage

Simply change the import:

from langfuse.openai import openai

completion = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story about a dog."}],
)

The OpenAI, AsyncOpenAI, AzureOpenAI, and AsyncAzureOpenAI client classes can be imported from langfuse.openai in the same way, for example: from langfuse.openai import OpenAI.

Advanced Features

Custom Trace Properties

You can add name, metadata, trace_id, and parent_observation_id to OpenAI method calls. Trace attributes like session_id, user_id, and tags can be set via metadata or enclosing spans.
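As a rough sketch of how these properties might be attached, the helper below builds the extra keyword arguments for a single call. The helper `traced_call_kwargs` is hypothetical, and the metadata key names it uses are an assumption for illustration; check the SDK version you have installed for the keys it actually recognizes.

```python
from typing import Any, Optional


def traced_call_kwargs(
    name: str,
    session_id: Optional[str] = None,
    user_id: Optional[str] = None,
    tags: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build extra keyword arguments for a traced OpenAI call (hypothetical helper)."""
    metadata: dict[str, Any] = {}
    # ASSUMPTION: these metadata key names are illustrative, not confirmed
    # by this page. Verify them against your installed SDK version.
    if session_id is not None:
        metadata["langfuse_session_id"] = session_id
    if user_id is not None:
        metadata["langfuse_user_id"] = user_id
    if tags:
        metadata["langfuse_tags"] = list(tags)
    return {"name": name, "metadata": metadata}
```

The result can be unpacked into a wrapped call, for example `openai.chat.completions.create(model="gpt-4o", messages=messages, **traced_call_kwargs("dog-story", session_id="s1"))`.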

Structured Output Support

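For OpenAI SDK versions >=1.92.0, pass a Pydantic model as response_format to chat.completions.parse:

from pydantic import BaseModel

# Illustrative schema; the field names are an assumption, not part of the integration.
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = openai.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[...],
    response_format=CalendarEvent,
    name="extract-calendar-event",
)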

Streaming with Token Usage

When using streaming responses with include_usage=True, OpenAI returns token usage information in a final chunk that has an empty choices list.

stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to ten."}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    print(chunk)

Configuration & Troubleshooting

Flushing Events

For short-lived applications, flush events before exit:

from langfuse import get_client

langfuse = get_client()
langfuse.flush()

Debug Mode

from langfuse import Langfuse

langfuse = Langfuse(debug=True)
# OR via env var:
# export LANGFUSE_DEBUG=true

Sampling

Control trace volume with sampling configuration:

from langfuse import Langfuse

langfuse = Langfuse(sample_rate=0.1)
# OR:
# export LANGFUSE_SAMPLE_RATE=0.1
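To get a feel for the effect of a given rate, the arithmetic can be sketched as below. This assumes the sample rate is the fraction of traces kept, between 0 and 1; the helper `expected_traces` is hypothetical and says nothing about how the SDK selects individual traces.

```python
def expected_traces(total_requests: int, sample_rate: float) -> int:
    """Estimate how many traces are kept at a given sample rate."""
    # ASSUMPTION: sample_rate is the fraction of traces kept, from 0 to 1.
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return round(total_requests * sample_rate)
```

Under that assumption, 10,000 requests at a rate of 0.1 would leave roughly 1,000 traces.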

Disable Tracing

from langfuse import Langfuse

langfuse = Langfuse(tracing_enabled=False)
# OR:
# export LANGFUSE_TRACING_ENABLED=false

Limitations

The integration does not support the OpenAI Assistants API, because that API keeps its state on the server side. A separate notebook demonstrates manual tracking approaches for this use case; see the upstream guide.
