Weco

Optimise LLM applications with Weco using Muster as the evaluation backend.

Weco is a code optimisation platform that automatically iterates on LLM applications. With Muster (langfuse) wired in as the evaluation backend, each Weco iteration becomes a tracked experiment in Muster, so variants are easy to compare and roll back.

How it works

Each Weco optimisation cycle:

  1. Edits your source code.
  2. Runs the modified version against a Muster dataset.
  3. Collects evaluation scores from Muster.
  4. Retains the highest-performing variant.

Each iteration produces a new experiment run in Muster.
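
The cycle above can be sketched as a simple greedy loop. This is only an illustration of the four steps, not Weco's actual search strategy, which is not described here. The names `optimise`, `propose`, and `evaluate` are hypothetical: `propose` stands in for Weco editing the source, and `evaluate` stands in for running a variant against a Muster dataset and collecting its scores.

```python
from typing import Callable

def optimise(
    baseline: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    steps: int,
) -> tuple[str, float, list[tuple[str, float]]]:
    """Greedy search: keep a candidate only if it beats the best score so far."""
    best, best_score = baseline, evaluate(baseline)
    history = [(baseline, best_score)]  # one entry per experiment run
    for _ in range(steps):
        candidate = propose(best)       # step 1: edit the source
        score = evaluate(candidate)     # steps 2-3: run on the dataset, collect scores
        history.append((candidate, score))
        if score > best_score:          # step 4: retain the highest-performing variant
            best, best_score = candidate, score
    return best, best_score, history

# Toy stand-ins so the sketch runs without Weco or Muster.
def toy_propose(source: str) -> str:
    return source + "!"

def toy_evaluate(source: str) -> float:
    return -abs(len(source) - 5)  # best possible score is 0, at length 5

best, best_score, history = optimise("abc", toy_propose, toy_evaluate, steps=4)
```

The `history` list plays the role of the experiment runs recorded in Muster: every variant is kept with its score, even when it is not retained as the best.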

Setup

!pip install "weco[langfuse]" langfuse openai -q
!weco login

import os

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-***"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-***"
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"
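
Before starting a run, it can help to confirm that the three variables above are actually set. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and is not part of Weco or the Langfuse SDK.

```python
import os

REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL")

def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_missing = missing_env_vars({"LANGFUSE_PUBLIC_KEY": "pk-lf-***"})
```

Calling `missing_env_vars()` with no argument checks the real process environment; an empty list means the setup is complete.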

Workflow

  1. Create a dataset in Muster with test inputs and expected outputs.

  2. Write a target function for Weco to optimise:

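    def answer_question(inputs: dict) -> dict:
        # Read the question from the dataset item's input.
        question = inputs.get("question", "")
        # Your LLM logic here; this placeholder keeps the function runnable.
        response = ""
        return {"answer": response}

  3. Define evaluators that score outputs and a metric function combining them.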

  4. Run optimisation via the Weco CLI, specifying the dataset, target function, and evaluators.
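
Step 1 can also be scripted rather than done in the UI. The sketch below assumes Muster accepts the standard Langfuse Python SDK, whose client provides `create_dataset` and `create_dataset_item`. The helper `seed_dataset`, the dataset name `qa-eval`, and the example rows are all hypothetical.

```python
# Hypothetical test cases: (input, expected_output) pairs.
ROWS = [
    ({"question": "What is the capital of France?"}, {"answer": "Paris"}),
    ({"question": "How many days are in a leap year?"}, {"answer": "366"}),
]

def seed_dataset(client, name, rows):
    """Create a dataset and add one item per (input, expected_output) pair."""
    client.create_dataset(name=name)
    for inputs, expected in rows:
        client.create_dataset_item(
            dataset_name=name,
            input=inputs,
            expected_output=expected,
        )
    return len(rows)

# With credentials set in the environment (assumption: Langfuse SDK client):
# from langfuse import Langfuse
# seed_dataset(Langfuse(), "qa-eval", ROWS)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real project.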

Weco iteratively refines prompts and logic; Muster tracks every variant side by side for comparison.
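
For step 3, evaluators and a combining metric might look like the sketch below. The function signatures, evaluator names, and weights are assumptions chosen for illustration; the exact interface Weco expects for evaluators is not specified here.

```python
def exact_match(output: dict, expected: dict) -> float:
    """1.0 when the answers match after trimming and lowercasing, else 0.0."""
    got = str(output.get("answer", "")).strip().lower()
    want = str(expected.get("answer", "")).strip().lower()
    return 1.0 if got == want else 0.0

def conciseness(output: dict, expected: dict, max_words: int = 50) -> float:
    """1.0 up to max_words, then a linear decay reaching 0.0 at 2 * max_words."""
    words = len(str(output.get("answer", "")).split())
    if words <= max_words:
        return 1.0
    return max(0.0, 1.0 - (words - max_words) / max_words)

EVALUATORS = {"exact_match": exact_match, "conciseness": conciseness}
WEIGHTS = {"exact_match": 0.8, "conciseness": 0.2}  # hypothetical weights, sum to 1

def metric(output: dict, expected: dict) -> float:
    """Weighted sum of all evaluator scores, in the range [0, 1]."""
    return sum(WEIGHTS[name] * fn(output, expected) for name, fn in EVALUATORS.items())
```

Collapsing the evaluators into a single number gives the optimiser one value to compare across variants, while the individual scores remain available for inspection.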

See also