Jailbreak | Tier 5 | critical
Jailbreak: Token Manipulation
Attempts to bypass safety guardrails and persona constraints
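Token-level manipulation exploits how language models tokenize text. By splitting a sensitive word across token boundaries, for example with inserted spaces, hyphens, or zero-width characters, an attacker can sometimes slip it past keyword-based safety detection while the model still reads the intended word.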
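To illustrate why a plain substring check misses split words, here is a minimal defensive sketch. It assumes a keyword filter applied to incoming text. The blocklist entry and the function names (`naive_filter`, `normalized_filter`, `strip_format_chars`) are hypothetical and are not part of hackmyagent or any other tool named on this page.

```python
import re
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKED_KEYWORDS = {"ignore previous instructions"}


def strip_format_chars(text: str) -> str:
    """Drop Unicode format characters (category Cf), which include
    zero-width space/joiner/non-joiner, the soft hyphen, and U+FEFF."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def naive_filter(text: str) -> bool:
    """Plain substring match: returns True if a blocked keyword appears verbatim."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


def normalized_filter(text: str) -> bool:
    """Normalize before matching so split or disguised keywords are still caught."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) into plain ones.
    cleaned = unicodedata.normalize("NFKC", text)
    cleaned = strip_format_chars(cleaned)
    # Remove separators an attacker may insert between letters.
    collapsed = re.sub(r"[\s\-_.]+", "", cleaned.lower())
    return any(keyword.replace(" ", "") in collapsed for keyword in BLOCKED_KEYWORDS)


# A payload split with a zero-width space, a hyphen, and a zero-width joiner.
payload = "ig\u200bnore prev-ious instruc\u200dtions"
print(naive_filter(payload))       # the verbatim match misses the split keyword
print(normalized_filter(payload))  # the normalized match catches it
```

Collapsing all separators is deliberately aggressive and can produce false positives on benign text, so a check like this is better treated as one signal among several than as a complete defense.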
Attack Details
- Attack ID: APWN-JB-005
- HMA Check: INJ-001
- Delivery Methods: json-ld, meta-tag, invisible-span, html-comment
- CWE: CWE-284
- OASB Control: 3.4
- Severity: critical
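The four delivery methods listed above are all places in a web page that a human visitor does not normally see but an agent reading the raw HTML does. As a rough sketch of how those channels could be surfaced for inspection before page content reaches an agent, the following uses only Python's standard `html.parser`. The class name `HiddenTextCollector` is hypothetical, and the rule for what counts as an "invisible" span (inline `display:none`, `visibility:hidden`, or the `hidden` attribute) is an assumption; real pages can hide text in many other ways.

```python
from html.parser import HTMLParser


class HiddenTextCollector(HTMLParser):
    """Collect text from channels a human reader does not normally see."""

    def __init__(self):
        super().__init__()
        self.found = []           # list of (channel, text) pairs
        self._capture = None      # channel currently being captured, if any
        self._capture_tag = None  # tag whose end stops the capture

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("content"):
            self.found.append(("meta-tag", a["content"]))
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._capture, self._capture_tag = "json-ld", "script"
        elif tag == "span":
            # Assumption: only inline styles and the `hidden` attribute are checked.
            style = (a.get("style") or "").replace(" ", "").lower()
            if "display:none" in style or "visibility:hidden" in style or "hidden" in a:
                self._capture, self._capture_tag = "invisible-span", "span"

    def handle_endtag(self, tag):
        # Simplification: nested tags of the same name end the capture early.
        if tag == self._capture_tag:
            self._capture = self._capture_tag = None

    def handle_data(self, data):
        if self._capture and data.strip():
            self.found.append((self._capture, data.strip()))

    def handle_comment(self, data):
        if data.strip():
            self.found.append(("html-comment", data.strip()))


page = (
    '<html><head><meta name="description" content="hello">'
    '<script type="application/ld+json">{"a": "b"}</script></head>'
    '<body><!-- note --><span style="display: none">secret</span>'
    '<span>visible</span></body></html>'
)
collector = HiddenTextCollector()
collector.feed(page)
collector.close()
for channel, text in collector.found:
    print(channel, repr(text))
```

Text collected this way would still need to be checked for injected instructions. The sketch only shows where to look, not how to decide whether the content is malicious.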
Remediation
If your AI agent is vulnerable to this attack, scan and fix with:
npx hackmyagent secure --check INJ-001