Topics
Browse posts by category and tag: every topic we cover, with the latest pieces under each.
Tags
- #llm-security 6
- #jailbreak 3
- #red-team 2
- #adaptive-attacks 1
- #adversarial-attacks 1
- #arms-race 1
- #base64 1
- #classifier-evasion 1
- #content-filter 1
- #context-window 1
- #cvd 1
- #dan 1
- #detection-evasion 1
- #encoding-attacks 1
- #gcg 1
- #gradient-attacks 1
- #in-context-learning 1
- #jailbreak-history 1
- #many-shot 1
- #meta 1
- #obfuscation 1
- #persona 1
- #prompt-injection 1
- #research 1
- #research-ethics 1
- #responsible-disclosure 1
- #rlhf 1
- #roleplay-jailbreak 1
- #taxonomy 1
- #transferability 1
- #unicode 1
- #universal-suffix 1
- #vulnerability-disclosure 1
Categories
technique 4 posts
- Encoding and Obfuscation Jailbreaks: The Gap Between What Filters See and What Models Process
  Content filters typically operate on decoded, normalized text. LLMs process tokens, not text. The gap between these two layers is an attack surface that remains incompletely addressed.
- Roleplay and Persona Jailbreaks: Why They Work and Why They Don't Anymore (Mostly)
  DAN, AIM, STAN, and dozens of variants. Persona-based jailbreaks were the dominant technique from 2022 to 2023. Understanding why they worked, and why current defenses handle them better, is instructive for the next attack class.
- Universal Adversarial Suffixes: The GCG Attack and What's Transferred Since
  Greedy Coordinate Gradient produces adversarial suffixes that transfer across models. Two years after the original paper, where does this technique stand against current defenses?
- Many-Shot Jailbreaking: Why Long Context Windows Created a New Attack Surface
  The same architectural decision that makes LLMs better at long-context tasks, extended context windows, enabled a new class of jailbreak. This post covers the technique, how it works, and what defenses exist.
research 2 posts
- The Jailbreak Detection Evasion Arms Race: How Attackers Adapt to Defenses
  Safety classifiers get deployed; attackers find variants that evade them. This cycle is predictable. Understanding the mechanics of classifier evasion tells defenders what to invest in.
- LLM Jailbreak Taxonomy 2026: How the Techniques Cluster
  Six years of jailbreak research has produced a messy literature. This taxonomy organizes working techniques by the behavioral property they exploit, which makes it useful for both researchers and defenders.