Muster Docs

Auto-Instrumentation

Let Muster suggest threshold tuning and config improvements based on observed traces.

Auto-instrumentation (known internally as the "auto-tune" feature) analyzes your own traffic and proposes changes to detection thresholds, business rules, and other configuration that ships with sensible defaults but whose optimal values depend on your fleet. It's admin-controlled and gated by explicit consent; nothing is applied without your review.

What it can tune

Module                     Examples
Risk Scoring               Severity weights, anomaly history penalties, accuracy thresholds
Business Rules             Enable/disable individual rules, adjust severity
Anomaly Detection          Cost spike, error surge, accuracy drop thresholds
Hallucination Detection    Arithmetic drift bounds, similarity threshold
Duplicate Detection        Trace similarity threshold
Discovery Engine           Confidence scores, volume bonuses
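
A hypothetical illustration of how such overrides might be stored per project; the key names and example values are illustrative only, not the actual MusterProjectTuning schema:

  // Illustrative only: real tuning keys and defaults may be named differently.
  type ProjectTuningOverrides = Partial<{
    "riskScoring.severityWeight.high": number;        // e.g. 0.8
    "businessRules.enabled.rule_42": boolean;         // hypothetical rule id
    "anomaly.costSpikeThreshold": number;             // e.g. 2.5 (x baseline)
    "hallucination.arithmeticDriftBound": number;     // e.g. 0.02
    "duplicate.traceSimilarityThreshold": number;     // e.g. 0.92
    "discovery.confidenceFloor": number;              // e.g. 0.6
  }>;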

Workers read tuned values at runtime via TuningCache.get() with a hardcoded fallback, so a missing or invalid tuning never breaks behavior — it just means defaults are used.
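
A minimal sketch of that read path, assuming a get(key, fallback) signature; the key name and numbers are illustrative, not documented values:

  // Hedged sketch: the exact TuningCache.get() signature and key names are assumptions.
  const costSpikeThreshold = await TuningCache.get(
    "anomaly.costSpikeThreshold",
    2.5, // hardcoded fallback used when the tuning row is missing or invalid
  );

  if (observedCost > baselineCost * costSpikeThreshold) {
    flagAnomaly("cost_spike");
  }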

How a run works

admin clicks "Analyze" in the Auto-Tune panel
   └── pick scope: Balanced / Cost Focus / Quality Focus
   └── pick window: default last 7 days
        ↓ enqueues MusterAutoTuneRun (status PENDING)
worker (autoTune/processor)
  └── metricsCollector aggregates fleet stats (NOT raw content)
        └── LangGraph orchestrator fans out 6 parallel sub-agents
              (one per module above)
                └── each agent writes 0+ MusterAutoTuneRecommendation rows
                      (status PENDING, with reasoning + impact estimate)
        ↓ run status COMPLETED, with token usage + cost recorded
admin reviews recommendations, applies or dismisses each
   └── apply → MusterProjectTuning updated → next worker run uses new values
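
A compressed sketch of that worker flow; apart from the model names above, the helper and agent names are assumptions, and the parallel fan-out is shown with a plain Promise.all:

  // Hedged sketch of the autoTune processor; helper names are illustrative.
  async function processAutoTuneRun(runId: string): Promise<void> {
    const snapshot = await collectFleetMetrics(runId); // aggregates only, never raw traces

    // One sub-agent per module in the table above, run in parallel.
    const agents = [riskScoringAgent, businessRulesAgent, anomalyAgent,
                    hallucinationAgent, duplicateAgent, discoveryAgent];
    const proposals = (await Promise.all(agents.map((a) => a.analyze(snapshot)))).flat();

    // Each proposal becomes a PENDING MusterAutoTuneRecommendation row
    // carrying reasoning and an impact estimate.
    await saveRecommendations(runId, proposals);
    await markRunCompleted(runId, { tokenUsage: totalTokens(proposals) });
  }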

The collector pulls only aggregated metrics (totals, averages, counts by category) — never raw trace content — so the LLM sub-agents get a fleet snapshot, not your customer data.
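
In other words, the snapshot the sub-agents receive looks more like counts and averages than transcripts. A hypothetical shape (field names are illustrative):

  // Illustrative aggregate-only snapshot: no prompts, completions, or trace bodies.
  interface FleetSnapshot {
    windowDays: number;                               // e.g. 7
    totalTraces: number;
    avgCostPerTrace: number;
    errorRate: number;
    anomalyCountsByCategory: Record<string, number>;  // e.g. { cost_spike: 12 }
  }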

Reviewing recommendations

For each pending recommendation you'll see:

  • Module (e.g. Anomaly Detection)
  • Current value — what's in MusterProjectTuning today
  • Proposed value — what the agent suggests
  • Reasoning — short paragraph from the sub-agent
  • Impact estimate — what the change is likely to do (more/fewer flags, tighter/looser detection)

Three actions:

  1. Apply — writes to MusterProjectTuning. Effective on the next worker tick (typically minutes).
  2. Dismiss — keeps the row in DB for audit but won't apply.
  3. Bulk apply — apply every PENDING recommendation. Use after a review pass when you trust the agent's batch.

Every apply action is audit-logged with user ID and timestamp.
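
As a rough sketch, an apply handler might look like the following; the helper names (loadRecommendation, writeTuningValue, auditLog, and so on) are assumptions, not the actual Muster internals:

  // Hedged sketch of the apply path.
  async function applyRecommendation(recId: string, userId: string): Promise<void> {
    const rec = await loadRecommendation(recId);                        // PENDING row
    await writeTuningValue(rec.projectId, rec.key, rec.proposedValue);  // MusterProjectTuning
    await markRecommendation(recId, "APPLIED");

    // Every apply is audit-logged with who and when.
    await auditLog({
      action: "auto_tune.apply",
      recommendationId: recId,
      userId,
      at: new Date(),
    });
    // Workers pick up the new value on their next tick via TuningCache.get().
  }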

Weekly auto-schedule

An optional cron job at 02:00 UTC every Sunday triggers a fresh analyze run for every project that has opted in. The run lands recommendations in the same PENDING bucket; it does not auto-apply anything. Apply remains a human gate.

To enable: project owner toggles "Weekly auto-tune" in the Auto-Tune admin panel.
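
For reference, that schedule corresponds to the cron expression 0 2 * * 0. A minimal sketch using node-cron; the actual scheduler Muster uses isn't documented here, so treat the library choice and helper names as assumptions:

  import * as cron from "node-cron";

  // "0 2 * * 0" = 02:00 every Sunday, pinned to UTC.
  cron.schedule("0 2 * * 0", async () => {
    const projects = await listOptedInProjects();   // hypothetical helper
    for (const p of projects) {
      await enqueueAutoTuneRun(p.id);               // lands a PENDING MusterAutoTuneRun
    }
  }, { timezone: "Etc/UTC" });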

Admin requirements

  • Super-admin enables the feature globally and configures the LLM provider (OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Vertex AI) plus per-project rate limits.
  • Per-project, an owner must opt in. Without opt-in, the orchestrator returns immediately.
  • Token cost of each run is recorded on MusterAutoTuneRun so you can budget for it.

What's not yet auto-tunable

Sub-agents exist for these but no target worker is wired yet:

  • Alert thresholds on the anomalyAlerter (which currently sends an alert for every anomaly the detector flags).
  • HITL (human-in-the-loop) routing policies.
  • Department budget caps.

Benchmark quantiles are intentionally not tunable — they're computed from your own data and shifting them would break comparability.