Welcome
Documentation hub for Muster — AI agent observability and governance built on Langfuse.
Muster gives you visibility and control over the AI agents running across your organization — where they live, what they cost, when they go wrong, and whether they should be running at all.
Start here
If you're new to Muster, work through these pages in order:
- Get Started — sign up, create a project, send your first trace (a minimal example follows this list).
- Concepts — the core data model: trace, observation, score, session, dataset.
- Self-hosting — run Muster on your own infrastructure with Docker.
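To give a feel for what "send your first trace" looks like before you open Get Started, here is a minimal sketch assuming Muster accepts the standard Langfuse Python SDK (v2-style API), since Muster is built on Langfuse. The host URL, keys, agent name, and model are placeholders; the Get Started page has the real values.

```python
# Minimal sketch: send one trace containing a single LLM generation.
# Assumes the standard Langfuse Python SDK (v2-style API) pointed at your
# Muster host; keys, host, and model name below are placeholders.
from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-...",                 # from your Muster project settings (placeholder)
    secret_key="sk-...",                 # from your Muster project settings (placeholder)
    host="https://muster.example.com",   # your Muster instance (placeholder)
)

trace = langfuse.trace(name="first-trace", metadata={"agent": "demo-agent"})
trace.generation(
    name="hello-llm",
    model="gpt-4o-mini",
    input="Say hello",
    output="Hello!",
)

langfuse.flush()  # make sure the trace is delivered before the script exits
```

Once this runs without errors, the trace should appear in your project within a few seconds, and the Concepts page explains how observations, scores, and sessions hang off it.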
Operate it
Once tracing works, these pages cover the observability layer:
- Agent Inventory — the lifecycle every agent moves through, from discovery to retirement.
- Discovery Engine — find shadow agents already running in your AWS account via DNS query logs.
- Cost Aggregation — track LLM and cloud infrastructure spend per agent.
- Anomaly Detection — spot cost spikes, error surges, and accuracy degradation (a rough sketch of a spike check follows this list).
- Hallucination Detection — flag arithmetic errors, broken references, and unstable outputs.
- Auto-Instrumentation — let Muster suggest threshold tuning based on your traffic.
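As a rough mental model for the cost-spike checks described on the Anomaly Detection page, a spike can be flagged when today's spend sits far above a rolling baseline. The sketch below is illustrative only; the z-score threshold and the fallback ratio are assumptions, not Muster's actual detection logic.

```python
# Illustrative only: flag a cost spike when today's spend is far above the
# recent baseline. Not Muster's actual algorithm; thresholds are assumptions.
from statistics import mean, stdev

def is_cost_spike(daily_costs: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Return True if `today` deviates from the baseline by more than z_threshold sigmas."""
    if len(daily_costs) < 7:               # not enough history to judge
        return False
    baseline, spread = mean(daily_costs), stdev(daily_costs)
    if spread == 0:
        return today > baseline * 1.5      # flat history: fall back to a simple ratio check
    return (today - baseline) / spread > z_threshold

# Example: a week of ~$12/day followed by a $48 day trips the check.
history = [11.8, 12.1, 12.4, 11.9, 12.0, 12.3, 12.2]
print(is_cost_spike(history, 48.0))        # True
```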
Govern it
The governance layer turns observability into action:
- Business Rules — project-level policies that score every trace as it lands.
- Risk Scoring — a daily composite score per agent that gates approval workflows (illustrated after this list).
- Weekly Report — Monday email summarizing costs, anomalies, and risk movers.
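To make "daily composite score" concrete, here is a toy illustration of how several per-agent signals might be combined into a single 0–100 number. The signal names and weights are assumptions made for this example; the Risk Scoring page defines the real inputs and how the score feeds approval workflows.

```python
# Toy illustration of a composite risk score. Signal names and weights are
# assumptions for this example, not Muster's actual formula.

def risk_score(error_rate: float, cost_anomaly_rate: float, hallucination_rate: float) -> float:
    """Combine per-agent daily signals (each in 0..1) into a 0-100 risk score."""
    weights = {"errors": 0.4, "cost": 0.3, "hallucinations": 0.3}  # assumed weights
    score = (
        weights["errors"] * error_rate
        + weights["cost"] * cost_anomaly_rate
        + weights["hallucinations"] * hallucination_rate
    )
    return round(100 * score, 1)

# An agent with 10% errored traces, one costly day out of ten, and 5% flagged outputs:
print(risk_score(error_rate=0.10, cost_anomaly_rate=0.10, hallucination_rate=0.05))  # 8.5
```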
What's not yet here
Still to come:
- API reference auto-generated from the OpenAPI spec (the infrastructure is wired up; see scripts/generate-api-docs.mjs, pending a performance fix for dev compile time)
- SDK reference (Python, TypeScript, OpenTelemetry)
In the meantime, the engineering team's runbooks live in the docs/ folder of the repository; they're internal but accurate.