What this site is for
JailbreakDB covers offensive AI security from a working practitioner's perspective. Here's what we publish.
JailbreakDB exists to cover offensive AI security with the same rigor a working AI red teamer would expect — and the same honesty about what does and doesn’t land in production.
What we publish:
- Technical writeups of working attacks. Prompt injection variants, jailbreak techniques and the model behaviors they exploit, indirect injection through retrieved content, multi-modal attack chains, agent and tool-use abuse. Where possible, reproducible PoCs against open models. Closed models get attack patterns and behavioral analysis.
- Adversarial ML, applied. Membership inference, model extraction, evasion attacks, training-data extraction, backdoors: focused on what’s exploitable in deployed systems, not theoretical bounds.
- Red team methodology. Scoping AI engagements, building attack libraries, communicating findings to a model team that doesn’t speak security and a security team that doesn’t speak ML.
- Tooling reviews. Honest takes on the offensive AI security tooling landscape (Garak, PyRIT, promptmap, the LLM-specific scanners) and what each is actually good for.
What we don’t publish:
- Press release rewrites
- Listicles
- Anything we can’t source to primary material
Bylines are pseudonymous. The work is the point. Send tips, attack reports, and corrections to the editor; see also the disclosure guidelines.
Real content starts shortly.