vLLM (self-hosted inference)

Trace vLLM-served local models in Muster via vLLM's built-in OpenTelemetry exporter.

vLLM is a high-throughput LLM inference engine. It ships with built-in OpenTelemetry support: point its OTLP exporter at Muster's /api/public/otel/v1/traces endpoint and each generation request is exported to Muster as a trace.
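The traces endpoint is the Muster base URL followed by the fixed path /api/public/otel/v1/traces. Here is a minimal sketch of deriving it, assuming the base URL may or may not end in a slash. `traces_endpoint` is a hypothetical helper, not part of vLLM or the Muster SDK.

```python
def traces_endpoint(base_url: str) -> str:
    """Join a Muster base URL with the OTLP traces path.

    Hypothetical helper: trailing slashes are stripped so that
    "https://app.getmuster.io/" does not produce a double slash.
    """
    return base_url.rstrip("/") + "/api/public/otel/v1/traces"


print(traces_endpoint("https://app.getmuster.io/"))
# https://app.getmuster.io/api/public/otel/v1/traces
```

The result is the value passed to vLLM as `otlp_traces_endpoint`.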

Setup

%pip install vllm langfuse -q

import os

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"

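# vLLM chooses its OTLP exporter from this variable (gRPC by default);
# the Muster endpoint expects OTLP over HTTP.
os.environ["OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"] = "http/protobuf"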
os.environ["OTEL_SERVICE_NAME"] = "vllm"
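vLLM only reads the standard OTEL_* variables, so a missing protocol or Authorization header typically shows up as traces that never arrive rather than as an exception in your script. Below is a minimal preflight sketch, assuming the variable names used on this page. `missing_otel_settings` is a hypothetical helper that only inspects the environment and does not contact Muster.

```python
import os
from typing import List, Mapping


def missing_otel_settings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the tracing environment.

    Hypothetical preflight check: it inspects variable values only.
    """
    problems = []
    for key in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    # The HTTP endpoint needs the http/protobuf exporter, not gRPC.
    if env.get("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL") != "http/protobuf":
        problems.append("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL should be 'http/protobuf'")
    # Accept either the signal-specific or the generic OTel header variable.
    headers = env.get("OTEL_EXPORTER_OTLP_TRACES_HEADERS") or env.get(
        "OTEL_EXPORTER_OTLP_HEADERS", ""
    )
    if "Authorization=" not in headers:
        problems.append("no Authorization entry in the OTLP headers variable")
    return problems


for problem in missing_otel_settings(os.environ):
    print("warning:", problem)
```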

Initialize vLLM with OpenTelemetry

from vllm import LLM, SamplingParams

langfuse_host = os.environ["LANGFUSE_BASE_URL"]
otlp_traces_endpoint = f"{langfuse_host}/api/public/otel/v1/traces"

llm = LLM(
    model="facebook/opt-125m",
    otlp_traces_endpoint=otlp_traces_endpoint,
    disable_log_stats=False,
)

Initialize the Muster client

from langfuse import get_client

langfuse = get_client()

if langfuse.auth_check():
    print("Muster client is authenticated and ready!")

Generate text

out = llm.generate(
    ["Write one sentence about Berlin."],
    SamplingParams(max_tokens=32),
)
print(out[0].outputs[0].text)

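The OTLP endpoint requires HTTP Basic Auth. vLLM's exporter does not read the LANGFUSE_* variables; it only uses the standard OpenTelemetry environment variables. Set OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<base64(pk:sk)>" in the process environment before creating the LLM instance, otherwise the exported spans are not authenticated.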
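A minimal sketch of building that header from the two keys, assuming they are already set in LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY. `otlp_basic_auth_header` is a hypothetical helper name. The space after "Basic" is written as %20 because the OpenTelemetry specification treats values in this variable as URL-encoded.

```python
import base64
import os


def otlp_basic_auth_header(public_key: str, secret_key: str) -> str:
    """Return an OTEL_EXPORTER_OTLP_HEADERS value for Basic Auth.

    Hypothetical helper: base64-encodes "public_key:secret_key".
    """
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8")).decode("ascii")
    return f"Authorization=Basic%20{token}"


# Run this before LLM(...) is constructed: the exporter reads the
# header variable when it is created.
pk = os.environ.get("LANGFUSE_PUBLIC_KEY")
sk = os.environ.get("LANGFUSE_SECRET_KEY")
if pk and sk:
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = otlp_basic_auth_header(pk, sk)
```

For example, `otlp_basic_auth_header("pk", "sk")` yields `Authorization=Basic%20cGs6c2s=`.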
