| Article | Description | Source | Length | Rated |
|---|---|---|---|---|
| A small number of samples can poison LLMs of any size | Anthropic research on data-poisoning attacks in large language models | anthropic.com | 2,000 words | 2025-10-09 9:39pm - sethherr |
| Detecting and countering misuse of AI: August 2025 | Anthropic's threat intelligence report on AI cybercrime and other abuses | anthropic.com | 2,000 words | 2025-09-01 9:03pm - sethherr |
| Alignment faking in large language models | A paper from Anthropic's Alignment Science team on alignment faking in large language models | anthropic.com | 2,000 words | 2024-12-19 7:01pm - sethherr |