Anomaly Detection
Spot cost spikes, error surges, and accuracy degradation against rolling baselines.
Anomaly detection runs hourly across every approved agent in your project, comparing recent metrics against a rolling 7-day baseline. When something deviates far enough, Muster files an anomaly event — and a separate alerter notifies the people on the project's contact list.
What it watches
| Type | What it detects | Default thresholds |
|---|---|---|
COST_SPIKE | Recent 3-day average cost vs. days 8-30 baseline | Medium ≥ 1.5×, High ≥ 2.5× |
ERROR_SURGE | Error rate above baseline + 2σ and > 5% absolute | 2 standard deviations |
ACCURACY_DEGRADATION | Pass rate on scoring rules drops over 7 days | Medium ≥ 15%, High ≥ 25% |
Thresholds are tunable per project — see Auto-Instrumentation for how Muster's own analyzer can suggest tighter or looser values based on your traffic.
How a detection becomes an alert
hourly anomalyDetection worker
└── compares recent vs. baseline metrics
└── on threshold breach: insert MusterAnomalyEvent
↓ (cron, every 30 min)
anomalyAlerter worker
└── reads new MusterAnomalyEvent rows
└── emails every active MusterAlertContact for the project
(Slack delivery is the same pattern, separately wired)The split is deliberate: detection is a fast read-only sweep; alerting is the slow, side-effecting hop. Alerter failures won't cause the detector to retry and double-write events.
Configuring contacts
In Settings → Alerts:
- Add one or more
MusterAlertContactrows withemailand a name. - Each row has an
activeflag — flip to disable without deleting. - Anomalies for the project will email every active contact whenever the alerter sweeps.
Contacts are project-scoped. To page the same team across multiple projects, register them in each.
Working with anomalies in the UI
The anomalies page lists every event with severity, type, agent name,
and a short description. The evidence JSON column carries the
underlying numbers — for a cost spike, you'll see baseline vs. observed
USD, the ratio, and the time window.
Each event has two terminal states:
- Acknowledge — you've seen it and either fixed it or accepted the
new normal. Stamps
acknowledgedByandacknowledgedAt. - Mute — silence repeats from the same agent + type for a period. Useful when you're already fixing something and don't need a stream of identical pages.
What's not yet documented
- Slack integration for alerter delivery (separate config block).
- Custom anomaly types beyond the three built-in ones.
- Per-agent threshold overrides via
MusterAlertContactfilters.
These will land alongside the upcoming Alert Routing guide.