Discovery Engine
Find shadow agents already running in your AWS account by analyzing Route 53 DNS query logs.
The Discovery Engine finds AI agents you didn't instrument by looking at the
DNS queries leaving your AWS accounts. If something in your network is calling
api.openai.com, api.anthropic.com, or any other LLM endpoint, the scanner
sees it and lists it as a discovery candidate — even if it never sent a trace
to Muster.
This page covers the AWS Route 53 path. Other discovery sources (cloud cost, git, OpenTelemetry) work similarly but use different inputs.
How it works
- Route 53 Resolver writes every DNS query in your VPC to a CloudWatch log group.
- Muster's worker uses CloudWatch Logs Insights to query that log group on a schedule.
- Hostnames that match known LLM endpoints become agent candidates in the inventory with status DISCOVERED.
- An admin reviews each candidate and promotes (APPROVED), dismisses, or ignores it.
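The worker's scheduled job boils down to a Logs Insights query over that log group. A minimal sketch in Python of what such a query could look like (query_name and srcaddr are real fields in the Resolver query-log format; the domain pattern is an illustrative subset, not Muster's actual matcher list):

```python
import time

# Illustrative subset -- Muster's curated matcher list is larger.
LLM_DOMAIN_PATTERN = r"openai\.com|anthropic\.com|generativelanguage\.googleapis\.com"

def build_scan_window(hours=24):
    """Epoch-second (startTime, endTime) window for one scan."""
    end = int(time.time())
    return end - hours * 3600, end

QUERY = (
    "fields @timestamp, query_name, srcaddr\n"
    f"| filter query_name like /{LLM_DOMAIN_PATTERN}/\n"
    "| stats count(*) as hits by query_name, srcaddr\n"
    "| sort hits desc"
)

# A worker would hand these to boto3, roughly:
#   logs.start_query(logGroupName="/aws/route53/resolver",
#                    startTime=start, endTime=end, queryString=QUERY)
```

Grouping by query_name and srcaddr is what lets distinct callers of the same endpoint surface as separate candidates.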
You need three things on the AWS side: query logging enabled, an IAM principal Muster can assume or use, and the log group's name.
Step 1 — Enable Route 53 query logging
In the AWS console:
- Open Route 53 → Resolver → Query logging.
- Click Configure query logging.
- Pick the VPCs you want covered.
- Send logs to a CloudWatch Logs log group. If you don't have one already, create one named whatever fits your conventions (e.g. /aws/route53/resolver or /aws/route53/muster-prod-logs).
- Save. Resolver starts writing queries within a few minutes.
Query logging incurs CloudWatch ingestion and storage cost — budget for it.
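If you'd rather script this than click through the console, the same setup can be sketched with boto3 (the config name, VPC ID, and ARN here are placeholders; the API calls are the standard Route 53 Resolver ones):

```python
def enable_query_logging(vpc_ids, log_group_arn, name="muster-dns-discovery"):
    """Create a Resolver query-log config and associate it with each VPC.
    Mirrors the console steps above; the name is a placeholder."""
    import boto3  # imported here so the pure helper below works without AWS access
    r53 = boto3.client("route53resolver")
    cfg = r53.create_resolver_query_log_config(
        Name=name,
        DestinationArn=log_group_arn,
        CreatorRequestId=name,  # idempotency token
    )
    cfg_id = cfg["ResolverQueryLogConfig"]["Id"]
    for vpc_id in vpc_ids:
        r53.associate_resolver_query_log_config(
            ResolverQueryLogConfigId=cfg_id, ResourceId=vpc_id
        )
    return cfg_id

def log_group_arn(region, account_id, name):
    """Build the DestinationArn for a CloudWatch Logs log group."""
    return f"arn:aws:logs:{region}:{account_id}:log-group:{name}"
```

One config can be associated with multiple VPCs, so you only need to create it once per account and region.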
Step 2 — Grant IAM permissions
The principal Muster will use needs to read from CloudWatch Logs Insights. Attach this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}

Tighten the Resource to your specific log group ARN once you've confirmed
the connection works. If you're using cross-account access (Muster in one
AWS account, the logs in another), pair this with an sts:AssumeRole trust
relationship on the role you're assuming.
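For the cross-account case, the trust policy on the role you're assuming could look like the following sketch (the account ID and external ID are placeholders, not values Muster prescribes; 111111111111 stands for the account Muster runs in):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "muster-external-id" }
      }
    }
  ]
}
```

Scope the Principal down from :root to the specific role if you know it.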
Step 3 — Decide your log group name
Muster needs the exact log group name you set up in Step 1. The
default the scanner falls back to is /aws/route53/resolver, but if you
named yours something different — common in multi-environment setups —
write the actual name down. You'll enter it in the next step.
Step 4 — Connect via the Cloud Connection UI
- Open Settings → Cloud Connections in your Muster project.
- Click Add Connection.
- Pick AWS, choose the region your log group lives in, and enter credentials (IAM user keys for a quick test, role ARN for production).
- Toggle Enable DNS scanning.
- Fill in the DNS Log Group Name field with the value from Step 3.
- Click Test Connection — a successful test echoes back the AWS account ID and the role/user ARN Muster sees. If you get a generic 400, the IAM credentials are bad; if you get "log group not found", recheck the name in Step 3.
Step 5 — Run the first scan
- From the saved connection, click Scan Now.
- The scanner queries the last 24 hours of logs by default.
- Results land in Inventory → Discovery as candidates with status DISCOVERED.
- For each candidate, decide:
  - Promote to a full agent (moves it to PENDING_REVIEW). Then fill in team / department / framework and approve.
  - Dismiss if it's a false positive (third-party SaaS, internal CI, etc.).
Subsequent scans run on a schedule and only surface new candidates — existing rows aren't re-created.
What the scanner is looking for
The DNS scanner matches a curated list of LLM provider hostnames:
OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral,
and the major OSS hosting providers. The match is by hostname suffix against
a list that includes regional variants, so both api.openai.com and
api.openai.com.cn get caught.
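Suffix matching can be sketched like this (the domain list is an illustrative subset, not Muster's curated list):

```python
# Illustrative subset -- the real curated list covers more providers
# and regional variants.
LLM_SUFFIXES = (
    "openai.com",
    "openai.com.cn",
    "anthropic.com",
    "cohere.ai",
)

def is_llm_endpoint(hostname: str) -> bool:
    """True if hostname is a known LLM domain or a subdomain of one."""
    host = hostname.rstrip(".").lower()  # DNS logs often carry a trailing dot
    return any(
        host == suffix or host.endswith("." + suffix) for suffix in LLM_SUFFIXES
    )
```

Matching on "." + suffix rather than the bare suffix avoids false positives such as notopenai.com.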
If you're using a self-hosted model behind your own DNS name, the scanner won't find it from query logs alone — instrument it via SDK and register the agent manually.
Troubleshooting
Scan returns zero results.
- Confirm Route 53 query logging is actually writing to the log group (open it in CloudWatch — there should be log streams from the last hour).
- Confirm the log group name in the Cloud Connection matches exactly, including the leading slash.
- Confirm the IAM principal has logs:StartQuery on this log group's ARN.
Test Connection fails with "AccessDenied".
- The IAM policy in Step 2 is missing or attached to the wrong principal.
- For cross-account: the trust policy on the role doesn't allow the account Muster is running in to assume it.
Test Connection succeeds, but Scan Now hangs.
- CloudWatch Logs Insights queries against very large log groups can take a few minutes. Wait 5 minutes before retrying.
- Check the worker logs for awsDnsQueryScanner errors — the worker is the component that actually runs the query.
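If you're debugging this by hand, a simple poll against get_query_results shows whether the query is still in flight (the status values are CloudWatch's own; the client is duck-typed here so a boto3 "logs" client, or a fake in tests, both work):

```python
import time

def wait_for_query(logs_client, query_id, timeout_s=300, interval_s=5):
    """Poll Logs Insights until the query leaves the Scheduled/Running states.
    logs_client: anything exposing get_query_results(queryId=...)."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] not in ("Scheduled", "Running"):
            return resp  # Complete, Failed, Cancelled, or Timeout
        time.sleep(interval_s)
    raise TimeoutError(f"query {query_id} still running after {timeout_s}s")
```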
What's not yet documented
- Cost-Explorer-based discovery (finds agents by their AWS Bedrock / SageMaker spend rather than DNS).
- Git-source discovery (finds agents by scanning your repos for Langfuse / OpenAI SDK imports).
Both sources use the same inventory + status lifecycle covered in Agent Inventory.