Auto-Instrumentation
Let Muster suggest threshold tuning and config improvements based on observed traces.
Auto-instrumentation (internally, the "auto-tune" feature) analyzes your own traffic and proposes changes to detection thresholds, business rules, and other configuration that ships with sensible defaults but whose optimal values depend on your fleet. It is admin-controlled and gated by explicit consent: nothing is applied until you have reviewed it.
What it can tune
| Module | Examples |
|---|---|
| Risk Scoring | Severity weights, anomaly history penalties, accuracy thresholds |
| Business Rules | Enable/disable individual rules, adjust severity |
| Anomaly Detection | Cost spike, error surge, accuracy drop thresholds |
| Hallucination Detection | Arithmetic drift bounds, similarity threshold |
| Duplicate Detection | Trace similarity threshold |
| Discovery Engine | Confidence scores, volume bonuses |
Workers read tuned values at runtime via TuningCache.get() with a
hardcoded fallback, so a missing or invalid tuning never breaks
behavior — it just means defaults are used.
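The fallback pattern can be sketched as follows. This is a hypothetical illustration of the behavior described above, not the actual TuningCache API; the key names and default values are made up for the example.

```typescript
// Illustrative sketch: tuned values are looked up by key, and a hardcoded
// default is returned whenever the tuning is missing or invalid, so workers
// never break when no tuning exists. Keys and defaults are hypothetical.
type TuningKey = "anomaly.costSpikeThreshold" | "duplicate.similarityThreshold";

const DEFAULTS: Record<TuningKey, number> = {
  "anomaly.costSpikeThreshold": 2.5,
  "duplicate.similarityThreshold": 0.92,
};

class TuningCache {
  private tuned = new Map<TuningKey, number>();

  set(key: TuningKey, value: number): void {
    this.tuned.set(key, value);
  }

  // Returns the tuned value if present and valid, else the hardcoded default.
  get(key: TuningKey): number {
    const value = this.tuned.get(key);
    if (value === undefined || Number.isNaN(value) || value <= 0) {
      return DEFAULTS[key]; // missing or invalid tuning falls back to default
    }
    return value;
  }
}

const cache = new TuningCache();
console.log(cache.get("anomaly.costSpikeThreshold")); // prints 2.5 (default)
cache.set("anomaly.costSpikeThreshold", 3.0);
console.log(cache.get("anomaly.costSpikeThreshold")); // prints 3 (tuned)
```

Because the fallback lives at the read site, a dismissed or never-generated recommendation simply means the default keeps being used.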
How a run works
admin clicks "Analyze" in the Auto-Tune panel
└── pick scope: Balanced / Cost Focus / Quality Focus
└── pick window: default last 7 days
↓ enqueues MusterAutoTuneRun (status PENDING)
worker (autoTune/processor)
└── metricsCollector aggregates fleet stats (NOT raw content)
└── LangGraph orchestrator fans out 6 parallel sub-agents
(one per module above)
└── each agent writes 0+ MusterAutoTuneRecommendation rows
(status PENDING, with reasoning + impact estimate)
↓ run status COMPLETED, with token usage + cost recorded
admin reviews recommendations, applies or dismisses each
    └── apply → MusterProjectTuning updated → next worker run uses new values

The collector pulls only aggregated metrics (totals, averages, counts by category), never raw trace content, so the LLM sub-agents see a fleet snapshot, not your customer data.
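The aggregation step can be sketched like this. The field and function names are illustrative assumptions, not the real metricsCollector interface; the point is that traces are reduced to totals, averages, and per-category counts before anything reaches an LLM.

```typescript
// Hypothetical sketch of the metrics collection step: raw traces go in,
// only aggregates come out. No prompt/response content appears in the output.
interface Trace {
  category: string;
  costUsd: number;
  isError: boolean;
}

interface FleetSnapshot {
  totalTraces: number;
  avgCostUsd: number;
  errorRate: number;
  countsByCategory: Record<string, number>;
}

function collectMetrics(traces: Trace[]): FleetSnapshot {
  const countsByCategory: Record<string, number> = {};
  let totalCost = 0;
  let errors = 0;
  for (const t of traces) {
    countsByCategory[t.category] = (countsByCategory[t.category] ?? 0) + 1;
    totalCost += t.costUsd;
    if (t.isError) errors++;
  }
  const n = traces.length;
  return {
    totalTraces: n,
    avgCostUsd: n ? totalCost / n : 0,
    errorRate: n ? errors / n : 0,
    countsByCategory,
  };
}
```

The sub-agents then reason over this snapshot shape when drafting recommendations.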
Reviewing recommendations
For each pending recommendation you'll see:
- Module (e.g. Anomaly Detection)
- Current value — what's in MusterProjectTuning today
- Proposed value — what the agent suggests
- Reasoning — short paragraph from the sub-agent
- Impact estimate — what the change is likely to do (more/fewer flags, tighter/looser detection)
Three actions:
- Apply — writes to MusterProjectTuning. Effective on the next worker tick (typically minutes).
- Dismiss — keeps the row in the DB for audit but won't apply.
- Bulk apply — apply every PENDING recommendation. Use after a review pass when you trust the agent's batch.
Every apply action is audit-logged with user ID and timestamp.
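The three actions and the audit trail could be sketched as below. The table names mirror the docs, but the functions and in-memory stores are illustrative stand-ins, not the actual persistence layer.

```typescript
// Hypothetical sketch of apply / dismiss / bulk apply with audit logging.
// projectTuning stands in for MusterProjectTuning; auditLog for the audit table.
type RecStatus = "PENDING" | "APPLIED" | "DISMISSED";

interface Recommendation {
  id: string;
  key: string;
  proposed: number;
  status: RecStatus;
}

interface AuditEntry {
  recId: string;
  userId: string;
  action: "apply" | "dismiss";
  at: Date;
}

const projectTuning = new Map<string, number>();
const auditLog: AuditEntry[] = [];

function apply(rec: Recommendation, userId: string): void {
  projectTuning.set(rec.key, rec.proposed); // effective on the next worker tick
  rec.status = "APPLIED";
  auditLog.push({ recId: rec.id, userId, action: "apply", at: new Date() });
}

function dismiss(rec: Recommendation, userId: string): void {
  rec.status = "DISMISSED"; // row is kept for audit, never applied
  auditLog.push({ recId: rec.id, userId, action: "dismiss", at: new Date() });
}

function bulkApply(recs: Recommendation[], userId: string): number {
  const pending = recs.filter((r) => r.status === "PENDING");
  pending.forEach((r) => apply(r, userId));
  return pending.length; // how many were applied
}
```

Note that dismiss only flips status: the row survives, which is what makes the audit trail complete.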
Weekly auto-schedule
An optional cron job at 02:00 UTC every Sunday triggers a fresh analyze run for every project that opted in. The run lands recommendations in the same PENDING bucket; it does not auto-apply. Apply remains a human gate.
To enable: project owner toggles "Weekly auto-tune" in the Auto-Tune admin panel.
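In standard five-field cron syntax, 02:00 UTC every Sunday is "0 2 * * 0". A minimal sketch of the opt-in gate, assuming a hypothetical weeklyAutoTune flag on the project record (the real field name may differ):

```typescript
// Illustrative weekly-schedule gate: only opted-in projects get a run
// enqueued. Cron "0 2 * * 0" = minute 0, hour 2, any day, Sunday.
const WEEKLY_AUTO_TUNE_CRON = "0 2 * * 0";

interface Project {
  id: string;
  weeklyAutoTune: boolean; // hypothetical opt-in flag
}

// Returns the project IDs the scheduler should enqueue analyze runs for.
function projectsToAnalyze(projects: Project[]): string[] {
  return projects.filter((p) => p.weeklyAutoTune).map((p) => p.id);
}
```

Projects that never toggled the setting are simply skipped; nothing is enqueued for them.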
Admin requirements
- Super-admin enables the feature globally and configures the LLM provider (OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Vertex AI) plus per-project rate limits.
- Per-project, an owner must opt in. Without opt-in, the orchestrator returns immediately.
- Token cost of each run is recorded on MusterAutoTuneRun so you can budget for it.
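Since each run records its cost, budgeting reduces to summing over a window. A sketch, assuming a hypothetical costUsd field on the run record:

```typescript
// Illustrative budgeting helper over recorded run costs. The run shape is
// a stand-in for MusterAutoTuneRun; field names are assumptions.
interface AutoTuneRun {
  completedAt: Date;
  costUsd: number;
}

// Total auto-tune spend for a given UTC month (month is 0-based, as in JS Dates).
function monthlySpend(runs: AutoTuneRun[], year: number, month: number): number {
  return runs
    .filter(
      (r) =>
        r.completedAt.getUTCFullYear() === year &&
        r.completedAt.getUTCMonth() === month,
    )
    .reduce((sum, r) => sum + r.costUsd, 0);
}
```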
What's not yet auto-tunable
Sub-agents exist for these but no target worker is wired yet:
- Alert thresholds on the anomalyAlerter (which currently sends every anomaly the detector flags).
- HITL (human-in-the-loop) routing policies.
- Department budget caps.
Benchmark quantiles are intentionally not tunable — they're computed from your own data and shifting them would break comparability.