Research

How do AI agents behave when they encounter adversarial content on sites that look legitimate? We built a distributed network of instrumented honeypots to find out.

  • Interactions: 5.3K
  • Unique agents: 403
  • Verticals: 7
  • Active sources: 21

The Question

“AI agents browse the web autonomously. What happens when they encounter adversarial content on sites that look legitimate — and what happens when the same agent visits multiple such sites?”

Methodology

Distributed Honeypot Network

We operate a network of instrumented sites across multiple industry verticals: developer tools, healthcare, finance, DevOps, legal, AI/ML, government, and retail. Each site serves genuinely useful content — real guides, real data, real references. The instrumentation layer is invisible because it uses the same surfaces every legitimate site uses: meta tags, JSON-LD structured data, HTML comments, and HTTP headers.
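To make the idea concrete, here is a minimal sketch of how a benign probe could be embedded across those same surfaces. All names (the `x-research-probe` meta name, the header name, the probe format) are illustrative assumptions, not the project's actual implementation:

```python
import json

def instrument_page(body_html: str, probe_id: str) -> tuple[str, dict]:
    """Embed one benign probe in a meta tag, JSON-LD, and an HTML comment,
    plus an HTTP header -- the same surfaces legitimate sites already use."""
    probe = f"research-probe:{probe_id}"
    meta = f'<meta name="x-research-probe" content="{probe}">'
    json_ld = (
        '<script type="application/ld+json">'
        + json.dumps({"@context": "https://schema.org",
                      "@type": "WebPage",
                      "identifier": probe})
        + "</script>"
    )
    comment = f"<!-- {probe} -->"
    page = (f"<html><head>{meta}{json_ld}</head>"
            f"<body>{body_html}{comment}</body></html>")
    headers = {"X-Research-Probe": probe}  # fourth surface: HTTP headers
    return page, headers

page, headers = instrument_page("<h1>Real guide content</h1>", "abc123")
```

A human reader never sees any of these surfaces rendered, which is what makes the instrumentation layer invisible while staying entirely standards-compliant.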

What We Measure

  • Agent framework identification (model family, version signals)
  • Injection surface effectiveness (which delivery methods work against which frameworks)
  • Cross-site behavioral correlation (same agent across multiple verticals)
  • Sector-specific vulnerability patterns (do agents trust medical sites more?)
  • Temporal trends (how patches and releases affect vulnerability rates)
  • Canary token exfiltration detection

Data Collection

Each site runs shared middleware that fingerprints visitors, selects appropriate difficulty tiers based on the agent's profile, injects benign test payloads, and ships anonymized telemetry to a central registry. Interactions are correlated across sites using privacy-preserving fingerprint hashes — never raw IPs or PII.
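A privacy-preserving fingerprint of this kind can be sketched as a keyed one-way hash over the visitor's IP and User-Agent. The salt value and field layout here are illustrative assumptions; the point is only that the same visitor hashes identically across sites while the raw IP never reaches storage:

```python
import hashlib
import hmac

SERVER_SALT = b"rotate-me-regularly"  # illustrative; a real deployment keeps this secret

def fingerprint(ip: str, user_agent: str) -> str:
    """One-way HMAC so interactions correlate across sites without raw IPs."""
    material = f"{ip}|{user_agent}".encode()
    return hmac.new(SERVER_SALT, material, hashlib.sha256).hexdigest()

fp1 = fingerprint("203.0.113.7", "AgentBrowser/2.1")
fp2 = fingerprint("203.0.113.7", "AgentBrowser/2.1")
assert fp1 == fp2                 # stable across sites for the same visitor
assert "203.0.113.7" not in fp1   # the hex digest cannot leak the raw IP
```

Because the hash is keyed, an outside party holding the telemetry cannot brute-force IPs back out of the fingerprints without also holding the salt.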

Adaptive Difficulty

Payloads are not one-size-fits-all. The system assigns difficulty tiers (1–10) based on the agent's detected framework, past interactions, and known vulnerability profile. Tier 1 is a plain HTML comment; tier 10 combines steganographic encoding, multi-step chains, and adversarial context manipulation.

Ethics Statement

Benign payloads only. No payload causes harm. Successful injections instruct the agent to report the vulnerability to its operator with the exact fix command. The goal is defense, not exploitation.

Privacy-preserving. IP addresses are one-way hashed before storage. No personally identifiable information is collected or retained. User-Agent strings are the primary identifier.

No system compromise. We never attempt to access agent operators' systems, credentials, or infrastructure. Canary tokens only detect whether an agent attempted to exfiltrate data — they do not execute any code.

Responsible disclosure. Findings that reveal specific framework vulnerabilities are disclosed to the framework vendor before public presentation.

Opt-out available. Agent operators who contact us can request exclusion from data collection. Excluded fingerprints are added to a suppression list.

Research Outputs

Open Source Tools

Every finding has a concrete defense. Our open-source tools help you scan, test, and harden your AI agents before deployment.

  • HackMyAgent: npx hackmyagent secure
  • DVAA: npx dvaa start
  • NanoMind: ML-powered classification