Tag
#llm-security
6 posts tagged #llm-security.
- policy
Responsible Disclosure Norms for LLM Jailbreaks: What's Emerged and What's Still Disputed
Software vulnerability disclosure has 30 years of evolved norms. LLM jailbreak disclosure is 4 years old and still contested. The current state of practice, and where the field is heading.
- research
The Jailbreak Detection Evasion Arms Race: How Attackers Adapt to Defenses
Safety classifiers get deployed; attackers find variants that evade them. This cycle is predictable. Understanding the mechanics of classifier evasion tells defenders what to invest in.
- technique
Roleplay and Persona Jailbreaks: Why They Work and Why They Don't Anymore (Mostly)
DAN, AIM, STAN, and dozens of variants. Persona-based jailbreaks were the dominant technique in 2022–2023. Understanding why they worked — and why current defenses handle them better — is instructive for the next attack class.
- technique
Universal Adversarial Suffixes: The GCG Attack and What's Transferred Since
Greedy Coordinate Gradient (GCG) produces adversarial suffixes that transfer across models. Two years after the original paper, where does this technique stand against current defenses?
- research
LLM Jailbreak Taxonomy 2026: How the Techniques Cluster
Six years of jailbreak research has produced a messy literature. This taxonomy organizes working techniques by the behavioral property they exploit — useful for both researchers and defenders.
- technique
Many-Shot Jailbreaking: Why Long Context Windows Created a New Attack Surface
The same architectural decision that makes LLMs better at long-context tasks — extended context windows — enabled a new class of jailbreak. The technique, how it works, and what defenses exist.