Discovery Engine

Find shadow agents already running in your AWS account by analyzing Route 53 DNS query logs.

The Discovery Engine finds AI agents you didn't instrument by looking at the DNS queries leaving your AWS accounts. If something in your network is calling api.openai.com, api.anthropic.com, or any other LLM endpoint, the scanner sees it and lists it as a discovery candidate — even if it never sent a trace to Muster.

This page covers the AWS Route 53 path. Other discovery sources (cloud cost, git, OpenTelemetry) work similarly but use different inputs.

How it works

  1. Route 53 Resolver writes every DNS query in your VPC to a CloudWatch log group.
  2. Muster's worker uses CloudWatch Logs Insights to query that log group on a schedule.
  3. Hostnames that match known LLM endpoints become agent candidates in the inventory with status DISCOVERED.
  4. An admin reviews each candidate and promotes (moving it to PENDING_REVIEW, then APPROVED), dismisses, or ignores it.

You need three things on the AWS side: query logging enabled, an IAM principal Muster can assume or use, and the log group's name.
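
As a concrete illustration of step 2, here is the kind of Logs Insights query involved, run by hand from the CLI. The log group name and the provider regex are examples, not necessarily what Muster's worker issues:

# Query the last 24 hours of Resolver logs for LLM provider hostnames
aws logs start-query \
  --log-group-name /aws/route53/resolver \
  --start-time $(($(date +%s) - 86400)) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, query_name, srcaddr
    | filter query_name like /(openai|anthropic)\.com/
    | stats count(*) by query_name'

start-query returns a query ID; poll aws logs get-query-results --query-id <id> to see the matching hostnames.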

Step 1 — Enable Route 53 query logging

In the AWS console (a CLI equivalent follows the list):

  1. Open Route 53 → Resolver → Query logging.
  2. Click Configure query logging.
  3. Pick the VPCs you want covered.
  4. Send logs to a CloudWatch Logs log group. If you don't have one already, create one named whatever fits your conventions (e.g. /aws/route53/resolver or /aws/route53/muster-prod-logs).
  5. Save. Resolver starts writing queries within a few minutes.
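
The same setup from the CLI looks roughly like this; the log group name, config name, region, account ID, and VPC ID are all placeholders:

# Create the destination log group (skip if it already exists)
aws logs create-log-group --log-group-name /aws/route53/resolver

# Create a query logging config that writes to that log group
aws route53resolver create-resolver-query-log-config \
  --name muster-dns-discovery \
  --destination-arn arn:aws:logs:us-east-1:123456789012:log-group:/aws/route53/resolver

# Associate the config with each VPC you want covered
aws route53resolver associate-resolver-query-log-config \
  --resolver-query-log-config-id <id-from-previous-output> \
  --resource-id vpc-0abc1234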

Query logging incurs CloudWatch ingestion and storage cost — budget for it.
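
One way to keep that cost bounded is a retention policy on the log group; the 30-day window below is an arbitrary example:

aws logs put-retention-policy \
  --log-group-name /aws/route53/resolver \
  --retention-in-days 30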

Step 2 — Grant IAM permissions

The principal Muster will use needs to read from CloudWatch Logs Insights. Attach this policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}

Tighten the Resource to your specific log group ARN once you've confirmed the connection works. If you're using cross-account access (Muster in one AWS account, the logs in another), pair this with an sts:AssumeRole trust relationship on the role you're assuming.
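
A scoped Resource looks like arn:aws:logs:us-east-1:123456789012:log-group:/aws/route53/resolver:* (region and account ID are placeholders). For the cross-account case, the trust policy on the role in the log-owning account needs to name the account Muster runs in; a minimal sketch, with 111122223333 standing in for that account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}

Scoping the Principal to a specific role instead of the account root is tighter still.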

Step 3 — Decide your log group name

Muster needs the exact log group name you set up in Step 1. The default the scanner falls back to is /aws/route53/resolver, but if you named yours something different — common in multi-environment setups — write the actual name down. You'll enter it in the next step.
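
If you're unsure what it's called, list candidates from the CLI (the prefix is an example):

aws logs describe-log-groups --log-group-name-prefix /aws/route53

Copy the logGroupName value exactly, including the leading slash.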

Step 4 — Connect via the Cloud Connection UI

  1. Open Settings → Cloud Connections in your Muster project.
  2. Click Add Connection.
  3. Pick AWS, choose the region your log group lives in, and enter credentials (IAM user keys for a quick test, role ARN for production).
  4. Toggle Enable DNS scanning.
  5. Fill in the DNS Log Group Name field with the value from Step 3.
  6. Click Test Connection. A successful test echoes back the AWS account ID and the role/user ARN Muster sees (you can cross-check both from the CLI, as shown below). A generic 400 usually means the IAM credentials are bad; "log group not found" means the name doesn't match, so recheck Step 3.
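
The cross-check is one command; run it with the same credentials you gave Muster, and the Account and Arn fields it prints should match what the test echoes:

aws sts get-caller-identity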

Step 5 — Run the first scan

  1. From the saved connection, click Scan Now.
  2. The scanner queries the last 24 hours of logs by default.
  3. Results land in Inventory → Discovery as candidates with status DISCOVERED.
  4. For each candidate, decide:
    • Promote to a full agent (moves it to PENDING_REVIEW). Then fill in team / department / framework and approve.
    • Dismiss if it's a false positive (third-party SaaS, internal CI, etc.).

Subsequent scans run on a schedule and only surface new candidates — existing rows aren't re-created.

What the scanner is looking for

The DNS scanner matches a curated list of LLM provider hostnames: OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral, and the major OSS hosting providers. The match is by hostname suffix, so api.openai.com and any subdomain of it get caught.

If you're using a self-hosted model behind your own DNS name, the scanner won't find it from query logs alone — instrument it via SDK and register the agent manually.

Troubleshooting

Scan returns zero results.

  • Confirm Route 53 query logging is actually writing to the log group (open it in CloudWatch; there should be log streams from the last hour). See the CLI check after this list.
  • Confirm the log group name in the Cloud Connection matches exactly, including the leading slash.
  • Confirm the IAM principal has logs:StartQuery on this log group's ARN.
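
The first check has a quick CLI equivalent (the log group name is an example); for the exact-name check, re-run the describe-log-groups command from Step 3:

# Newest streams first; fresh lastEventTimestamp values mean Resolver is writing
aws logs describe-log-streams \
  --log-group-name /aws/route53/resolver \
  --order-by LastEventTime \
  --descending \
  --max-items 5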

Test Connection fails with "AccessDenied".

  • The IAM policy in Step 2 is missing or attached to the wrong principal.
  • For cross-account: the trust policy on the role doesn't allow the account Muster is running in to assume it; you can test this directly, as sketched below.
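
From the account Muster runs in, you can test the trust relationship with one call; the role ARN and session name are placeholders:

aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/muster-discovery \
  --role-session-name trust-check

An AccessDenied here means the trust policy, not the permissions policy, is what's blocking you.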

Test Connection succeeds, but Scan Now hangs.

  • CloudWatch Logs Insights queries against very large log groups can take a few minutes. Wait 5 minutes before retrying, or check the query's status from the CLI as shown below.
  • Check the worker logs for awsDnsQueryScanner errors — the worker is the component that actually runs the query.
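
To tell a slow query from a lost one, list the in-flight Logs Insights queries against the log group (the name is an example):

aws logs describe-queries \
  --log-group-name /aws/route53/resolver \
  --status Running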

What's not yet documented

  • Cost-Explorer-based discovery (finds agents by their AWS Bedrock / SageMaker spend rather than DNS).
  • Git-source discovery (finds agents by scanning your repos for Langfuse / OpenAI SDK imports).

Both sources use the same inventory and status lifecycle covered in Agent Inventory.