Ratings

3 Matching Ratings


A small number of samples can poison LLMs of any size

Anthropic research on data-poisoning attacks in large language models

anthropic.com 2,000 words

Rated 2025-10-09 9:39pm - sethherr

Detecting and countering misuse of AI: August 2025

Anthropic's threat intelligence report on AI cybercrime and other abuses

anthropic.com 2,000 words

Rated 2025-09-01 9:03pm - sethherr

Alignment faking in large language models

A paper from Anthropic's Alignment Science team on alignment faking in large language models

anthropic.com 2,000 words

Rated 2024-12-19 7:01pm - sethherr