Muster Docs

Anomaly Detection

Spot cost spikes, error surges, and accuracy degradation against rolling baselines.

Anomaly detection runs hourly across every approved agent in your project, comparing recent metrics against a rolling 7-day baseline. When something deviates far enough, Muster files an anomaly event — and a separate alerter notifies the people on the project's contact list.

What it watches

Type                 | What it detects                                  | Default thresholds
COST_SPIKE           | Recent 3-day average cost vs. days 8-30 baseline | Medium ≥ 1.5×, High ≥ 2.5×
ERROR_SURGE          | Error rate above baseline + 2σ and > 5% absolute | 2 standard deviations
ACCURACY_DEGRADATION | Pass rate on scoring rules drops over 7 days     | Medium ≥ 15%, High ≥ 25%

Thresholds are tunable per project — see Auto-Instrumentation for how Muster's own analyzer can suggest tighter or looser values based on your traffic.
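
As a rough illustration of how the default thresholds in the table could be applied, here is a minimal Python sketch. It is not Muster's implementation. The `Thresholds` class and all function names are hypothetical, the sketch assumes σ is the population standard deviation of the baseline error rates, and it assumes the accuracy thresholds are percentage-point drops in pass rate. The table does not state either detail, and it gives no severity levels for ERROR_SURGE, so that check returns a plain boolean.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the defaults in the table; a project could override any field."""
    cost_medium_ratio: float = 1.5
    cost_high_ratio: float = 2.5
    error_sigmas: float = 2.0
    error_min_rate: float = 0.05        # the "> 5% absolute" floor
    accuracy_medium_drop: float = 0.15  # assumed to be percentage points
    accuracy_high_drop: float = 0.25


DEFAULTS = Thresholds()


def cost_spike(recent_avg: float, baseline_avg: float,
               t: Thresholds = DEFAULTS) -> Optional[str]:
    """Compare the recent average cost with the baseline average as a ratio."""
    if baseline_avg <= 0:
        return None  # no usable baseline, so no ratio to compare
    ratio = recent_avg / baseline_avg
    if ratio >= t.cost_high_ratio:
        return "HIGH"
    if ratio >= t.cost_medium_ratio:
        return "MEDIUM"
    return None


def error_surge(recent_rate: float, baseline_rates: Sequence[float],
                t: Thresholds = DEFAULTS) -> bool:
    """True only when both conditions hold: above mean + N sigma AND above the absolute floor."""
    limit = mean(baseline_rates) + t.error_sigmas * pstdev(baseline_rates)
    return recent_rate > limit and recent_rate > t.error_min_rate


def accuracy_degradation(baseline_pass_rate: float, recent_pass_rate: float,
                         t: Thresholds = DEFAULTS) -> Optional[str]:
    """Classify the drop in pass rate between the baseline and the recent window."""
    drop = baseline_pass_rate - recent_pass_rate
    if drop >= t.accuracy_high_drop:
        return "HIGH"
    if drop >= t.accuracy_medium_drop:
        return "MEDIUM"
    return None
```

Passing a different `Thresholds` instance, for example `Thresholds(cost_medium_ratio=2.0)`, is one way per-project tuning could be modeled in this sketch.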

How a detection becomes an alert

hourly anomalyDetection worker
  └── compares recent vs. baseline metrics
        └── on threshold breach: insert MusterAnomalyEvent
              ↓ (cron, every 30 min)
        anomalyAlerter worker
          └── reads new MusterAnomalyEvent rows
                └── emails every active MusterAlertContact for the project
                      (Slack delivery is the same pattern, separately wired)

The split is deliberate: detection is a fast read-only sweep; alerting is the slow, side-effecting hop. Alerter failures won't cause the detector to retry and double-write events.
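
The separation described above can be sketched in a few lines of Python. This is an illustrative model only: the dataclasses stand in for MusterAnomalyEvent and MusterAlertContact rows, and the `project_id` and `notified_at` fields, along with `run_alerter` and `send_email`, are assumptions made for the example. In particular, how the real alerter tracks which events are "new" is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class AnomalyEvent:
    """Stand-in for a MusterAnomalyEvent row (field names are assumptions)."""
    id: int
    project_id: str
    type: str
    severity: str
    notified_at: Optional[datetime] = None  # hypothetical "already alerted" marker


@dataclass
class AlertContact:
    """Stand-in for a MusterAlertContact row (field names are assumptions)."""
    project_id: str
    email: str
    name: str
    active: bool = True


def run_alerter(events: List[AnomalyEvent],
                contacts: List[AlertContact],
                send_email: Callable[[str, AnomalyEvent], None]) -> int:
    """One alerter sweep: email active project contacts for each un-notified event.

    Returns the number of emails sent. The sweep only reads events and stamps
    notified_at; it never inserts events, so a failure here cannot duplicate them.
    """
    sent = 0
    for event in events:
        if event.notified_at is not None:
            continue  # already handled by an earlier sweep
        recipients = [c for c in contacts
                      if c.active and c.project_id == event.project_id]
        try:
            for contact in recipients:
                send_email(contact.email, event)
                sent += 1
        except Exception:
            # Delivery failed: leave the event un-notified so the next sweep retries it.
            continue
        event.notified_at = datetime.now(timezone.utc)
    return sent
```

In this sketch a failed sweep leaves the event list untouched, and the next sweep simply tries again. A retry may re-email contacts who were reached before the failure; deduplicating deliveries is outside the scope of the example.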

Configuring contacts

In Settings → Alerts:

  1. Add one or more MusterAlertContact rows, each with an email address and a name.
  2. Each row has an active flag; switch it off to disable a contact without deleting it.
  3. On each alerter sweep, new anomalies for the project are emailed to every active contact.

Contacts are project-scoped. To page the same team across multiple projects, register them in each.
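
Because contacts are scoped to a project, covering several projects with one team means creating one row per person per project. The Python sketch below models that expansion. The `AlertContact` dataclass, its `project_id` field, and both helper functions are hypothetical; in practice the rows are added through Settings → Alerts in each project.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AlertContact:
    """Stand-in for a MusterAlertContact row (field names are assumptions)."""
    project_id: str
    email: str
    name: str
    active: bool = True


def contacts_for_projects(team: Iterable[Tuple[str, str]],
                          project_ids: Iterable[str]) -> List[AlertContact]:
    """Expand (name, email) pairs into one contact row per project."""
    members = list(team)  # materialize so the team can be reused for every project
    return [AlertContact(project_id=pid, email=email, name=name)
            for pid in project_ids
            for name, email in members]


def active_recipients(contacts: Iterable[AlertContact], project_id: str) -> List[str]:
    """Emails that would be alerted for one project: same project, active flag set."""
    return [c.email for c in contacts
            if c.project_id == project_id and c.active]
```

A project with no registered rows yields an empty recipient list in this model, which mirrors the point above: a team that is registered in one project is not paged for another.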

Working with anomalies in the UI

The anomalies page lists every event with severity, type, agent name, and a short description. The evidence JSON column carries the underlying numbers — for a cost spike, you'll see baseline vs. observed USD, the ratio, and the time window.
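
To make the evidence column concrete, here is a hypothetical cost-spike payload and a small Python helper that turns it into a one-line summary. The key names (`baselineUsd`, `observedUsd`, `ratio`, `window`) and all of the values are invented for illustration; check an actual event for the real schema.

```python
import json

# Illustrative payload only: key names and numbers are assumptions, not Muster's schema.
evidence_json = """
{
  "baselineUsd": 4.00,
  "observedUsd": 11.00,
  "ratio": 2.75,
  "window": {"start": "2024-01-01T00:00:00Z", "end": "2024-01-04T00:00:00Z"}
}
"""


def summarize_cost_evidence(raw: str) -> str:
    """Render baseline vs. observed cost, the ratio, and the time window as one line."""
    ev = json.loads(raw)
    return (
        f"observed ${ev['observedUsd']:.2f} vs. baseline ${ev['baselineUsd']:.2f} "
        f"({ev['ratio']:.2f}x) from {ev['window']['start']} to {ev['window']['end']}"
    )
```

With the payload above, the ratio of 2.75 would fall in the High band for COST_SPIKE under the default thresholds.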

Each event has two terminal states:

  • Acknowledge — you've seen it and either fixed it or accepted the new normal. Stamps acknowledgedBy and acknowledgedAt.
  • Mute — silence repeats from the same agent + type for a period. Useful when you're already fixing something and don't need a stream of identical pages.
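
The two actions can be modeled roughly as follows. This Python sketch is an assumption-laden illustration: the `Mute` record, `is_muted`, and `acknowledge` are hypothetical names, and only the acknowledgedBy and acknowledgedAt fields come from the description above. It shows a mute matching on agent plus anomaly type until an expiry time, and an acknowledgement stamping who and when.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class Mute:
    """Hypothetical mute record: silences one agent + anomaly type until `until`."""
    agent_id: str
    anomaly_type: str
    until: datetime


def is_muted(agent_id: str, anomaly_type: str, now: datetime,
             mutes: Iterable[Mute]) -> bool:
    """True if an unexpired mute matches both the agent and the anomaly type."""
    return any(
        m.agent_id == agent_id and m.anomaly_type == anomaly_type and now < m.until
        for m in mutes
    )


def acknowledge(event: dict, user: str, now: datetime) -> dict:
    """Return a copy of the event stamped with acknowledgedBy and acknowledgedAt."""
    return {**event, "acknowledgedBy": user, "acknowledgedAt": now.isoformat()}


# Usage: mute COST_SPIKE for agent-1 for six hours while a fix is in progress.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
active_mutes = [Mute("agent-1", "COST_SPIKE", start + timedelta(hours=6))]
```

Matching on both agent and type means a muted cost spike does not hide a new error surge from the same agent, and the expiry lets alerts resume without anyone having to remember to unmute.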

What's not yet documented

  • Slack integration for alerter delivery (separate config block).
  • Custom anomaly types beyond the three built-in ones.
  • Per-agent threshold overrides via MusterAlertContact filters.

These will land alongside the upcoming Alert Routing guide.