Research
How do AI agents behave when they encounter adversarial content on sites that look legitimate? We built a distributed network of instrumented honeypots to find out.
The Question
“AI agents browse the web autonomously. What happens when they encounter adversarial content on sites that look legitimate — and what happens when the same agent visits multiple such sites?”
Methodology
Distributed Honeypot Network
We operate a network of instrumented sites across multiple industry verticals: developer tools, healthcare, finance, DevOps, legal, AI/ML, government, and retail. Each site serves genuinely useful content — real guides, real data, real references. The instrumentation layer is invisible because it uses the same surfaces every legitimate site uses: meta tags, JSON-LD structured data, HTML comments, and HTTP headers.
What We Measure
- Agent framework identification (model family, version signals)
- Injection surface effectiveness (which delivery methods work against which frameworks)
- Cross-site behavioral correlation (same agent across multiple verticals)
- Sector-specific vulnerability patterns (do agents trust medical sites more?)
- Temporal trends (how patches and releases affect vulnerability rates)
- Canary token exfiltration detection
Data Collection
Each site runs shared middleware that fingerprints visitors, selects appropriate difficulty tiers based on the agent's profile, injects benign test payloads, and ships anonymized telemetry to a central registry. Interactions are correlated across sites using privacy-preserving fingerprint hashes — never raw IPs or PII.
Adaptive Difficulty
Payloads are not one-size-fits-all. The system assigns difficulty tiers (1–10) based on the agent's detected framework, past interactions, and known vulnerability profile. Tier 1 is a plain HTML comment; tier 10 combines steganographic encoding, multi-step chains, and adversarial context manipulation.
Ethics Statement
Benign payloads only. No payload causes harm. Successful injections instruct the agent to report the vulnerability to its operator with the exact fix command. The goal is defense, not exploitation.
Privacy-preserving. IP addresses are one-way hashed before storage. No personally identifiable information is collected or retained. User-Agent strings are the primary identifier.
No system compromise. We never attempt to access agent operators' systems, credentials, or infrastructure. Canary tokens only detect whether an agent attempted to exfiltrate data — they do not execute any code.
Responsible disclosure. Findings that reveal specific framework vulnerabilities are disclosed to the framework vendor before public presentation.
Opt-out available. Agent operators who contact us can request exclusion from data collection. Excluded fingerprints are added to a suppression list.
Research Outputs
Temporal Trends
How vulnerability rates change over time, annotated with framework releases.
Sector Analysis
Cross-vertical risk comparison: do agents trust medical sites more than dev tools?
Agent Journeys
Anonymized multi-site agent paths showing behavioral patterns.
Attack Surface Heatmap
Which delivery method works best against which framework, on which vertical.
Agent Network Graph
Clusters of connected agents and cross-site compromise propagation.
Open Source Tools
Every finding has a concrete defense. Our open-source tools help you scan, test, and harden your AI agents before deployment.