Poison pill

🧪 Poison pill: Anthropic’s researchers found you can break a 13 billion parameter AI model with just 250 poisoned files. That’s 0.00016% of its training data, the equivalent of something mean someone said to you in seventh grade ruining your life. One fake phrase, like <SUDO>, can make it spit gibberish every time. This kind of trick is called data poisoning, and it doesn’t take a supercomputer or a giant team. Well-placed files can turn a smart AI into a confused mess.

Tags: data, files, researchers, training, trick