Hugging Face Inference

Hugging Face Inference exposes hosted models, including Llama, Mistral, Qwen, and Falcon, through an OpenAI-compatible endpoint. To trace calls to these models in Muster, point the Muster OpenAI wrapper at the Hugging Face endpoint.

Setup

%pip install langfuse openai --upgrade

import os

os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_..."

from langfuse.openai import OpenAI
from langfuse import observe

client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1/",
    api_key=os.environ["HUGGINGFACE_ACCESS_TOKEN"],
)
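The setup above hard-codes the model into `base_url` and assumes every environment variable is already set. The following is a minimal sketch of two small helpers that make both assumptions explicit. The names `hf_base_url`, `missing_env_vars`, `HF_MODELS_ROOT`, and `REQUIRED_VARS` are hypothetical and not part of any library; the URL pattern is simply copied from the setup above.

```python
import os

# Root of the endpoint, copied from the base_url used in the setup above.
HF_MODELS_ROOT = "https://api-inference.huggingface.co/models"

# Environment variables the setup above relies on.
REQUIRED_VARS = (
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_BASE_URL",
    "HUGGINGFACE_ACCESS_TOKEN",
)


def hf_base_url(model_id):
    """Build the base URL for one hosted model (hypothetical helper)."""
    model_id = model_id.strip("/")
    if not model_id:
        raise ValueError("model_id must not be empty")
    return f"{HF_MODELS_ROOT}/{model_id}/v1/"


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

With these in place, the client could be created as `OpenAI(base_url=hf_base_url("meta-llama/Meta-Llama-3-8B-Instruct"), api_key=os.environ["HUGGINGFACE_ACCESS_TOKEN"])` after checking that `missing_env_vars()` returns an empty list.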

Examples

Chat completion

completion = client.chat.completions.create(
    model="model-name",  # placeholder value; the base_url above already names the model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about language models"},
    ],
)
print(completion.choices[0].message.content)
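The `print` call above assumes the response contains at least one choice with text content. In the OpenAI response shape, `content` can be `None` (for example when the model returns no text), so a slightly more defensive read may be useful. The sketch below uses a hypothetical helper, `first_message_text`, and a stand-in object in place of a real response so that it runs without network access.

```python
from types import SimpleNamespace


def first_message_text(completion, default=""):
    """Return the text of the first choice, or `default` if there is none."""
    choices = getattr(completion, "choices", None) or []
    if not choices:
        return default
    content = choices[0].message.content
    return default if content is None else content


# Stand-in with the same attribute shape as a chat completion (illustration only).
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello"))]
)
print(first_message_text(fake))  # prints "Hello"
```

With a real response, `first_message_text(completion)` would replace the direct `completion.choices[0].message.content` access.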

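Wrap the call with `@observe()`

The `@observe()` decorator records the decorated function as a trace and nests the wrapped completion call inside it. The `name` and `metadata` arguments are handled by the Muster wrapper and attached to the recorded generation.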

@observe()
def generate_rap():
    completion = client.chat.completions.create(
        name="rap-generator",
        model="tgi",
        messages=[
            {"role": "system", "content": "You are a poet."},
            {"role": "user", "content": "Compose a rap about Muster."},
        ],
        metadata={"category": "rap"},
    )
    return completion.choices[0].message.content

rap = generate_rap()
print(rap)
