vLLM (self-hosted inference)
Trace vLLM-served local models in Muster via vLLM's built-in OpenTelemetry exporter.
vLLM is a high-throughput LLM inference engine with built-in OpenTelemetry support. Point its OTLP exporter at Muster's /api/public/otel/v1/traces endpoint and each generation is captured as a trace.
Setup
%pip install vllm langfuse -q
import os
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"
os.environ["OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_SERVICE_NAME"] = "vllm"
Initialize vLLM with OpenTelemetry
from vllm import LLM, SamplingParams
langfuse_host = os.environ["LANGFUSE_BASE_URL"]
otlp_traces_endpoint = f"{langfuse_host}/api/public/otel/v1/traces"
llm = LLM(
    model="facebook/opt-125m",
    otlp_traces_endpoint=otlp_traces_endpoint,
    disable_log_stats=False,
)
Initialize the Muster client
from langfuse import get_client
langfuse = get_client()
if langfuse.auth_check():
    print("Muster client is authenticated and ready!")
Generate text
out = llm.generate(
    ["Write one sentence about Berlin."],
    SamplingParams(max_tokens=32),
)
print(out[0].outputs[0].text)
The OTLP endpoint requires Basic Auth. If vLLM does not pick up the LANGFUSE_* keys directly, set OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(pk:sk)>" in the process environment, where pk and sk are the public and secret keys.
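The Basic Auth header described above can be built from the two keys with the standard library. This is a minimal sketch, not an official helper: the function name build_otlp_auth_header and the placeholder fallback keys are hypothetical, and it assumes the environment variable is set before the vLLM engine is created so the exporter can read it at startup.

```python
import base64
import os


def build_otlp_auth_header(public_key: str, secret_key: str) -> str:
    """Return an OTEL_EXPORTER_OTLP_HEADERS value carrying Basic Auth.

    Hypothetical helper: joins the keys as "pk:sk", base64-encodes the
    result, and prefixes it with "Authorization=Basic ".
    """
    credentials = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {token}"


# Placeholder fallbacks are for illustration only; use real keys in practice.
header = build_otlp_auth_header(
    os.environ.get("LANGFUSE_PUBLIC_KEY", "pk-lf-example"),
    os.environ.get("LANGFUSE_SECRET_KEY", "sk-lf-example"),
)

# Assumption: set this before constructing LLM(...) so the exporter sees it.
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = header
```

Building the value in code avoids hand-encoding mistakes such as a trailing newline inside the base64 payload, which is a common result of piping echo output through a shell base64 command.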