All posts

DAN Prompt Jailbreak History: From Reddit Post to Research Case Study

The complete DAN prompt jailbreak history: how 'Do Anything Now' grew from a December 2022 r/ChatGPT experiment, through twelve-plus iterations, into the template for studying LLM safety failure modes.
June 12, 2026
The Crescendo Class: Multi-Turn Jailbreaks and Why They're Hard to Catch

Single-turn defenses miss the jailbreak class where no individual message is harmful. How crescendo and multi-turn escalation work as a category, why per-message classifiers can't see them, and what session-level defense requires.
May 22, 2026
How Jailbreak Benchmarks Measure Success: HarmBench, JailbreakBench, StrongREJECT

An attack-success-rate number is meaningless without knowing the behavior set, the attacker, and the judge that produced it. A reader's guide to the three benchmarks that define how jailbreak effectiveness is actually measured.
May 22, 2026
Many-Shot vs. Single-Shot Jailbreaks: Long-Context Risks

Single-shot jailbreaks compress the entire attack into one prompt; many-shot jailbreaks exploit the model's in-context learning. The cost, detectability, and defenses differ, and so does the threat your stack should worry about.
May 10, 2026
Responsible Disclosure Norms for LLM Jailbreaks

Software vulnerability disclosure has 30 years of evolved norms. LLM jailbreak disclosure is 4 years old and still contested. The current state of practice, and where the field is heading.
May 5, 2026
Encoding and Obfuscation Jailbreaks: The Filter-Model Gap

Content filters typically operate on decoded, normalized text. LLMs process tokens, not text. The gap between these two layers is an attack surface that remains incompletely addressed.
May 4, 2026
The Jailbreak Detection Evasion Arms Race: How Attackers Adapt

Safety classifiers get deployed; attackers find variants that evade them. This cycle is predictable. Understanding the mechanics of classifier evasion tells defenders what to invest in.
May 4, 2026
Roleplay and Persona Jailbreaks: Why They Mostly Don't Work Now

DAN, AIM, STAN, and dozens of variants. Persona-based jailbreaks were the dominant technique from 2022 to 2023. Understanding why they worked, and why current defenses handle them better, is instructive for the next attack class.
May 3, 2026
Universal Adversarial Suffixes: The GCG Attack and Transfer Since

Greedy Coordinate Gradient produces adversarial suffixes that transfer across models. Two years after the original paper, where does this technique stand against current defenses?
May 3, 2026
LLM Jailbreak Taxonomy 2026: How the Techniques Cluster

Six years of jailbreak research has produced a messy literature. This taxonomy organizes working techniques by the behavioral property they exploit — useful for both researchers and defenders.
May 2, 2026
Many-Shot Jailbreaking: How Long Context Created a New Attack

The same architectural decision that makes LLMs better at long-context tasks — extended context windows — enabled a new class of jailbreak. The technique, how it works, and what defenses exist.
May 2, 2026
What this site is for

JailbreakDB covers offensive AI security from a working practitioner's perspective. Here's what we publish.
May 2, 2026