Ratings by sethherr

5 Matching Ratings

Article | Description | Source | Length | Rated
Where things stand with the Department of War | A statement from Dario Amodei | anthropic.com | 1,000 words | 2026-3-6 7:06am
A small number of samples can poison LLMs of any size | Anthropic research on data-poisoning attacks in large language models | anthropic.com | 2,000 words | 2025-10-9 9:39pm
Detecting and countering misuse of AI: August 2025 | Anthropic's threat intelligence report on AI cybercrime and other abuses | anthropic.com | 2,000 words | 2025-9-1 9:03pm
The Anthropic Economic Index | Announcement of the new Anthropic Economic Index and description of the new data on AI use in occupations | anthropic.com | 2,000 words | 2025-2-10 12:59pm
Alignment faking in large language models | A paper from Anthropic's Alignment Science team on alignment faking in large language models | anthropic.com | 2,000 words | 2024-12-19 7:01pm