Weco

Optimise LLM applications with Weco using Muster as the evaluation backend.

Weco is a code optimisation platform that automatically iterates on LLM applications. With Muster (langfuse) wired in as the evaluation backend, each Weco iteration becomes a tracked experiment in Muster, so variants are easy to compare and roll back.

How it works

Each Weco optimisation cycle:

Edits your source code.
Runs the modified version against a Muster dataset.
Collects evaluation scores from Muster.
Retains the highest-performing variant.

Each iteration produces a new experiment run in Muster.
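
The cycle above amounts to a keep-the-best search loop. The sketch below is a minimal illustration under stated assumptions, not Weco's actual implementation: the candidate functions stand in for successive code edits, an in-memory list stands in for a Muster dataset, and `evaluate`, `optimise`, and `exact_match` are hypothetical names chosen for this example.

```python
from typing import Callable

# Hypothetical stand-ins: a target maps inputs to outputs, a metric scores one output.
Target = Callable[[dict], dict]
Metric = Callable[[dict, dict], float]


def evaluate(target: Target, dataset: list[dict], metric: Metric) -> float:
    """Run the target on every dataset item and return the mean score."""
    scores = [metric(target(item["input"]), item["expected_output"]) for item in dataset]
    return sum(scores) / len(scores)


def optimise(candidates: list[Target], dataset: list[dict], metric: Metric):
    """Score each candidate variant and keep the highest-scoring one."""
    best, best_score = None, float("-inf")
    history = []  # one entry per iteration, analogous to one experiment run each
    for iteration, candidate in enumerate(candidates):
        score = evaluate(candidate, dataset, metric)
        history.append((iteration, score))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, history


def exact_match(output: dict, expected: dict) -> float:
    """Return 1.0 when the answers are identical, otherwise 0.0."""
    return 1.0 if output["answer"] == expected["answer"] else 0.0


dataset = [
    {"input": {"question": "2 + 2"}, "expected_output": {"answer": "4"}},
    {"input": {"question": "3 + 3"}, "expected_output": {"answer": "6"}},
]


# Two toy variants standing in for successive edits of the source code.
def variant_a(inputs: dict) -> dict:
    return {"answer": "4"}  # ignores the question and always answers "4"


def variant_b(inputs: dict) -> dict:
    left, right = inputs["question"].split(" + ")
    return {"answer": str(int(left) + int(right))}


best, best_score, history = optimise([variant_a, variant_b], dataset, exact_match)
print(best.__name__, best_score, history)
```

In this toy run the second variant wins because it answers both items correctly, while the first only matches one of them. The `history` list plays the role of the per-iteration experiment runs that Muster records.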

Setup

!pip install "weco[langfuse]" langfuse openai -q
!weco login

import os

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-***"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-***"
os.environ["LANGFUSE_BASE_URL"] = "https://app.getmuster.io"

Workflow

Create a dataset in Muster with test inputs and expected outputs.

Write a target function for Weco to optimise:

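def answer_question(inputs: dict) -> dict:
    question = inputs.get("question", "")
    # Your LLM logic here; this placeholder keeps the function runnable
    # until you replace it with a real model call.
    response = f"TODO: answer for {question!r}"
    return {"answer": response}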

Define evaluators that score outputs and a metric function combining them.
Run optimisation via the Weco CLI, specifying the dataset, target function, and evaluators.
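
The first step, creating the dataset, can also be scripted rather than done in the UI. The sketch below assumes the `langfuse` Python SDK installed in Setup provides `create_dataset` and `create_dataset_item` with these keyword arguments; check this against your SDK version. The helper `upload_dataset`, the dataset name `qa-benchmark`, and the sample items are hypothetical.

```python
def upload_dataset(client, name: str, items: list[dict]) -> int:
    """Create a dataset and add one item per test case; return the item count."""
    client.create_dataset(name=name)
    for item in items:
        client.create_dataset_item(
            dataset_name=name,
            input=item["input"],
            expected_output=item["expected_output"],
        )
    return len(items)


# Hypothetical test cases: each pairs a test input with its expected output.
qa_items = [
    {"input": {"question": "What is the capital of France?"},
     "expected_output": {"answer": "Paris"}},
    {"input": {"question": "What is 2 + 2?"},
     "expected_output": {"answer": "4"}},
]

# Usage (needs the credentials from Setup; commented out so the sketch has no side effects):
# from langfuse import Langfuse
# upload_dataset(Langfuse(), "qa-benchmark", qa_items)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real Muster project.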

Weco iteratively refines prompts and logic; Muster tracks every variant side by side for comparison.
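
For the evaluator step, the following is one possible shape for two evaluators and a metric that combines them. The function names, the `(output, expected)` signature, the optional `keywords` field, and the weights are all assumptions made for illustration; the interface that Weco and Muster actually expect may differ.

```python
def exact_match(output: dict, expected: dict) -> float:
    """1.0 if the answers match after trimming whitespace and lowercasing."""
    got = output.get("answer", "").strip().lower()
    want = expected.get("answer", "").strip().lower()
    return 1.0 if got == want else 0.0


def keyword_coverage(output: dict, expected: dict) -> float:
    """Fraction of expected keywords that appear in the answer."""
    keywords = expected.get("keywords", [])
    if not keywords:
        return 1.0  # nothing to check, so do not penalise the answer
    answer = output.get("answer", "").lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in answer)
    return hits / len(keywords)


def combined_metric(output: dict, expected: dict, weights=(0.7, 0.3)) -> float:
    """Weighted sum of the two evaluators; the weights are illustrative."""
    w_exact, w_keywords = weights
    return (w_exact * exact_match(output, expected)
            + w_keywords * keyword_coverage(output, expected))


expected = {"answer": "Paris", "keywords": ["paris", "france"]}
print(combined_metric({"answer": " paris "}, expected))
```

Collapsing the evaluators into a single number gives the optimiser one value to maximise, while the individual scores remain available for inspecting why a variant won or lost.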
