<?xml version='1.0' encoding='UTF-8'?>
<rss version="2.0">
  <channel>
    <title>Anthropic Frontier Red Team Blog</title>
    <link>https://anthropic.com/feed_anthropic_red.xml</link>
    <description>Evidence-based analysis about AI's implications for cybersecurity, biosecurity, and autonomous systems</description>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>python-feedgen</generator>
    <image>
      <url>https://www.anthropic.com/images/icons/apple-touch-icon.png</url>
      <title>Anthropic Frontier Red Team Blog</title>
      <link>https://anthropic.com/feed_anthropic_red.xml</link>
    </image>
    <language>en</language>
    <lastBuildDate>Wed, 19 Nov 2025 04:04:40 +0000</lastBuildDate>
    <item>
      <title>Cyber Toolkits for LLMs</title>
      <link>https://red.anthropic.com/2025/cyber-toolkits/index.html</link>
      <description>Large Language Models (LLMs) that are not fine-tuned for cybersecurity can succeed in multistage attacks on networks with dozens of hosts when equipped with a novel toolkit.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/cyber-toolkits/index.html</guid>
      <pubDate>Fri, 13 Jun 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Project Vend</title>
      <link>https://red.anthropic.com/2025/project-vend/index.html</link>
      <description>We let Claude manage an automated store in our office as a small business for about a month. We learned a lot about the plausible, strange, not-too-distant future in which AI models are autonomously running things in the real economy.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/project-vend/index.html</guid>
      <pubDate>Fri, 27 Jun 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Cyber Evaluations of Claude 4</title>
      <link>https://red.anthropic.com/2025/claude-4-cyber/</link>
      <description>We partnered with Pattern Labs on a range of cybersecurity evaluations of Claude Opus 4 and Claude Sonnet 4, with Opus demonstrating especially notable improvement over previous models.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/claude-4-cyber/</guid>
      <pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Claude Does Cyber Competitions</title>
      <link>https://red.anthropic.com/2025/cyber-competitions/</link>
      <description>Throughout 2025, we have been quietly entering Claude in cybersecurity competitions designed primarily for humans. In many of these competitions Claude did pretty well, often placing in the top 25% of competitors. However, it lagged behind the best human teams at the toughest challenges.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/cyber-competitions/</guid>
      <pubDate>Sat, 09 Aug 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Developing Nuclear Safeguards for AI</title>
      <link>https://red.anthropic.com/2025/nuclear-safeguards/</link>
      <description>Together with the NNSA and DOE national laboratories, we have co-developed a classifier (an AI system that automatically categorizes content) that distinguishes between concerning and benign nuclear-related conversations with high accuracy in preliminary testing.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/nuclear-safeguards/</guid>
      <pubDate>Thu, 21 Aug 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>LLMs and Biorisk</title>
      <link>https://red.anthropic.com/2025/biorisk/</link>
      <description>Our work at Anthropic is animated by the potential for AI to advance scientific discovery, especially in biology and medicine. At the same time, AI is fundamentally a dual-use technology. This article explains why we believe that evaluating biorisk and safeguarding against it is a critical element of responsible AI development.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/biorisk/</guid>
      <pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Building AI for Cyber Defenders</title>
      <link>https://red.anthropic.com/2025/ai-for-cyber-defenders/</link>
      <description>We invested in improving Claude's ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to match or eclipse Opus 4.1 in discovering code vulnerabilities and other cyber skills. Adopting and experimenting with AI will be key for defenders to keep pace.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/ai-for-cyber-defenders/</guid>
      <pubDate>Mon, 29 Sep 2025 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Project Fetch</title>
      <link>https://red.anthropic.com/2025/project-fetch/</link>
      <description>How could frontier AI models like Claude reach beyond computers and affect the physical world? One path is through robots. We ran an experiment to see how much Claude helped Anthropic staff perform complex tasks with a robot dog.</description>
      <guid isPermaLink="true">https://red.anthropic.com/2025/project-fetch/</guid>
      <pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
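<!--
  The generator element above names python-feedgen. Below is a minimal sketch of
  how a feed with this channel and its most recent item could be rebuilt with that
  library. The real build script is not part of the feed, so the exact calls are
  assumptions; the element values are taken from the feed itself.

  from feedgen.feed import FeedGenerator

  fg = FeedGenerator()
  # Channel metadata, copied from the feed above. feedgen fills in the docs,
  # generator, and lastBuildDate elements with defaults when they are not set.
  fg.title('Anthropic Frontier Red Team Blog')
  fg.link(href='https://anthropic.com/feed_anthropic_red.xml', rel='alternate')
  fg.description("Evidence-based analysis about AI's implications for "
                 'cybersecurity, biosecurity, and autonomous systems')
  fg.language('en')
  fg.image(url='https://www.anthropic.com/images/icons/apple-touch-icon.png',
           title='Anthropic Frontier Red Team Blog',
           link='https://anthropic.com/feed_anthropic_red.xml')

  # One item shown; the remaining entries follow the same pattern.
  fe = fg.add_entry()
  fe.title('Project Fetch')
  fe.link(href='https://red.anthropic.com/2025/project-fetch/')
  fe.description('How could frontier AI models like Claude reach beyond computers '
                 'and affect the physical world? One path is through robots. We '
                 'ran an experiment to see how much Claude helped Anthropic staff '
                 'perform complex tasks with a robot dog.')
  fe.guid('https://red.anthropic.com/2025/project-fetch/', permalink=True)
  fe.pubDate('Wed, 12 Nov 2025 00:00:00 +0000')

  # Serialize the RSS 2.0 document (rss_str returns bytes).
  print(fg.rss_str(pretty=True).decode('utf-8'))
-->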