Anthropic News https://anthropic.com/news/feed_anthropic_news.xml Latest updates from Anthropic's newsroom http://www.rssboard.org/rss-specification python-feedgen https://www.anthropic.com/images/icons/apple-touch-icon.png Anthropic News https://anthropic.com/news/feed_anthropic_news.xml en Thu, 12 Feb 2026 12:13:14 +0000 Company https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems. This update introduces a more flexible and nuanced approach to assessing and managing AI risks while maintaining our commitment not to train or deploy models unless we have implemented adequate safeguards. Key improvements include new capability thresholds to indicate when we will upgrade our safeguards, refined processes for evaluating model capabilities and the adequacy of our safeguards (inspired by safety case methodologies ), and new measures for internal governance and external input. By learning from our implementation experiences and drawing on risk management practices used in other high-consequence industries, we aim to better prepare for the rapid pace of AI advancement. <article>Announcements<h1>Announcing our updated Responsible Scaling Policy</h1>Oct 15, 2024<a href="http://anthropic.com/rsp ">Read the Responsible Scaling Policy </a><p><strong>Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems. </strong>This update introduces a more flexible and nuanced approach to assessing and managing AI risks while maintaining our commitment not to train or deploy models unless we have implemented adequate safeguards. Key improvements include new capability thresholds to indicate when we will upgrade our safeguards, refined processes for evaluating model capabilities and the adequacy of our safeguards (inspired by <a href="https://arxiv.org/abs/2403.10462">safety case methodologies</a>), and new measures for internal governance and external input. By learning from our implementation experiences and drawing on risk management practices used in other high-consequence industries, we aim to better prepare for the rapid pace of AI advancement.</p><h2>The promise and challenge of advanced AI</h2><p>As frontier AI models advance, they have the potential to bring about transformative benefits for our society and economy. AI could accelerate scientific discoveries, revolutionize healthcare, enhance our education system, and create entirely new domains for human creativity and innovation. However, frontier AI systems also present new challenges and risks that warrant careful study and effective safeguards.</p><p>In September 2023, we <a href="https://www.anthropic.com/news/anthropics-responsible-scaling-policy">released</a> our Responsible Scaling Policy, a framework for managing risks from increasingly capable AI systems. After a year of implementation and learning, we are now sharing a significantly updated version that reflects practical insights and accounts for advancing technological capabilities.</p><p>Although this policy focuses on catastrophic risks like the categories listed below, they are not the only risks that we monitor and prepare for. Our <a href="https://www.anthropic.com/legal/aup">Usage Policy</a> sets forth our standards for the use of our products, including rules that prohibit using our models to spread misinformation, incite violence or hateful behavior, or engage in fraudulent or abusive practices. We continually refine our technical measures for enforcing our trust and safety standards at scale. Further, we conduct research to understand the broader <a href="https://www.anthropic.com/research#societal-impacts">societal impacts</a> of our models. Our Responsible Scaling Policy complements our work in these areas, contributing to our understanding of current and potential risks.</p><h2>A framework for proportional safeguards</h2><p>As before, we maintain our core commitment: we will not train or deploy models unless we have implemented safety and security measures that keep risks below acceptable levels. Our RSP is based on the principle of proportional protection: safeguards that scale with potential risks. To do this, we use <strong>AI Safety Level Standards (ASL Standards)</strong>, graduated sets of safety and security measures that become more stringent as model capabilities increase. Inspired by <a href="https://en.wikipedia.org/wiki/Biosafety_level">Biosafety Levels,</a> these begin at ASL-1 for models that have very basic capabilities (for example, chess-playing bots) and progress through ASL-2, ASL-3, and so on.</p><p>In our updated policy, we have refined our methodology for assessing specific capabilities (and their associated risks) and implementing proportional safety and security measures. Our updated framework has two key components:</p><ul><li><strong>Capability Thresholds:</strong> Specific AI abilities that, if reached, would require stronger safeguards than our current baseline.</li><li><strong>Required Safeguards: </strong>The specific ASL Standards needed to mitigate risks once a Capability Threshold has been reached.</li></ul><p>At present, all of our models operate under ASL-2 Standards, which reflect current industry best practices. Our updated policy defines two key Capability Thresholds that would require upgraded safeguards:</p><ul><li><strong>Autonomous AI Research and Development:</strong> If a model can independently conduct complex AI research tasks typically requiring human expertise—potentially significantly accelerating AI development in an unpredictable way—we require elevated security standards (potentially ASL-4 or higher standards) and additional safety assurances to avoid a situation where development outpaces our ability to address emerging risks.</li><li><strong>Chemical, Biological, Radiological, and Nuclear (CBRN) weapons:</strong> If a model can meaningfully assist someone with a basic technical background in creating or deploying CBRN weapons, we require enhanced security and deployment safeguards (ASL-3 standards).</li></ul><p>ASL-3 safeguards involve enhanced security measures and deployment controls. On the security side, this will include internal access controls and more robust protection of model weights. For deployment risks, we plan to implement a multi-layered approach to prevent misuse, including real-time and asynchronous monitoring, rapid response protocols, and thorough pre-deployment red teaming.</p><h2>Implementation and oversight</h2><p>To contribute to effective implementation of the policy, we have established:</p><ul><li><strong>Capability assessments</strong>: Routine model evaluations based on our Capability Thresholds to determine whether our current safeguards are still appropriate. (Summaries of past assessments are available <a href="https://www.anthropic.com/rsp-updates">here</a>.)</li><li><strong>Safeguard assessments: </strong>Routine evaluation of the effectiveness of our security and deployment safety measures to assess whether we have met the Required Safeguards bar. (Summaries of these decisions will be available <a href="https://www.anthropic.com/rsp-updates">here</a>.)</li><li><strong>Documentation and decision-making: </strong>Processes for documenting the capability and safeguard assessments, inspired by procedures (such as <a href="https://arxiv.org/abs/2403.10462">safety case methodologies</a>) common in high-reliability industries.</li><li><strong>Measures for internal governance and external input: </strong>Our assessment methodology will be backed up by internal stress-testing in addition to our existing internal reporting process for safety issues. We are also soliciting external expert feedback on our methodologies.1</li></ul><h2>Learning from experience</h2><p>We have learned a lot in our first year with the previous RSP in effect, and are using this update as an opportunity to reflect on what has worked well and what makes sense to update in the policy. As part of this, we conducted our first review of how well we adhered to the framework and identified a small number of instances where we fell short of meeting the full letter of its requirements. These included procedural issues such as completing a set of evaluations three days later than scheduled or a lack of clarity on how and where we should note any changes to our placeholder evaluations. We also flagged some evaluations where we may have been able to elicit slightly better model performance through implementing standard techniques (such as chain-of-thought or best-of-N).</p><p>In all cases, we found these instances posed minimal risk to the safety of our models. We used the additional three days to refine and improve our evaluations; the different set of evaluations we used provided a more accurate assessment than the placeholder evaluations; and our evaluation methodology still showed we were sufficiently far from the thresholds. From this, we learned two valuable lessons to incorporate into our updated framework: we needed to incorporate more flexibility into our policies, and we needed to improve our process for tracking compliance with the RSP. You can read more <a href="http://anthropic.com/rsp-updates">here</a>.</p><p>Since we first released the RSP a year ago, our goal has been to offer an example of a framework that others might draw inspiration from when crafting their own AI risk governance policies. We hope that proactively sharing our experiences implementing our own policy will help other companies in implementing their own risk management frameworks and contribute to the establishment of best practices across the AI ecosystem.</p><h2>Looking ahead</h2><p>The frontier of AI is advancing rapidly, making it challenging to anticipate what safety measures will be appropriate for future systems. All aspects of our safety program will continue to evolve: our policies, evaluation methodology, safeguards, and our research into potential risks and mitigations.</p><p>Additionally, Co-Founder and Chief Science Officer Jared Kaplan will serve as Anthropic’s Responsible Scaling Officer, succeeding Co-Founder and Chief Technology Officer Sam McCandlish who held this role over the last year. Sam oversaw the RSP’s initial implementation and will continue to focus on his duties as Chief Technology Officer. As we work to scale up our efforts on implementing the RSP, we’re also opening a position for a Head of Responsible Scaling. This role will be responsible for coordinating the many teams needed to iterate on and successfully comply with the RSP.</p><p>If you would like to contribute to AI risk management at Anthropic, <a href="https://www.anthropic.com/jobs">we are hiring</a>! Many of our teams now contribute to risk management via the RSP, including:</p><ul><li>Frontier Red Team (responsible for threat modeling and capability assessments)</li><li>Trust &amp; Safety (responsible for developing deployment safeguards)</li><li>Security and Compliance (responsible for security safeguards and risk management)</li><li>Alignment Science (including sub-teams responsible for developing ASL-3+ safety measures, for misalignment-focused capability evaluations, and for our internal alignment stress-testing program)</li><li>RSP Team (responsible for policy drafting, assurance, and cross-company execution)</li></ul><p><strong>Read the updated policy at <a href="http://anthropic.com/rsp">anthropic.com/rsp</a>, and supplementary information at <a href="http://anthropic.com/rsp-updates">anthropic.com/rsp-updates</a>.</strong></p><p><em>We extend our sincere gratitude to the many external groups that provided invaluable feedback on the development and refinement of our Responsible Scaling Policy.</em></p><p><br/></p><p></p><p></p><h4>Footnotes</h4><p>1 <em>We have also shared our assessment methodology with both AI Safety Institutes, as well as a selection of independent experts and organizations, for feedback. This does not represent an endorsement from either AI Safety Institute or the independent experts and organizations. </em></p><p><br/></p></article> https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy News Thoughts on America’s AI Action Plan https://www.anthropic.com/news/thoughts-on-america-s-ai-action-plan Today, the White House released "Winning the Race: America's AI Action Plan"—a comprehensive strategy to maintain America's advantage in AI development. We are encouraged by the plan’s focus on accelerating AI infrastructure and federal adoption, as well as strengthening safety testing and security coordination. Many of the plan’s recommendations reflect Anthropic’s response to the Office of Science and Technology Policy’s (OSTP) prior request for information . While the plan positions America for AI advancement, we believe strict export controls and AI development transparency standards remain crucial next steps for securing American AI leadership. <article>Policy<h1>Thoughts on America’s AI Action Plan</h1>Jul 23, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e00dbffcddc82df5e471c43453abfc74ca94e8d-1000x1000.svg"/><p>Today, the White House released "Winning the Race: America's AI Action Plan"—a comprehensive strategy to maintain America's advantage in AI development. We are encouraged by the plan’s focus on accelerating AI infrastructure and federal adoption, as well as strengthening safety testing and security coordination. Many of the plan’s recommendations reflect Anthropic’s <a href="https://assets.anthropic.com/m/4e20a4ab6512e217/original/Anthropic-Response-to-OSTP-RFI-March-2025-Final-Submission-v3.pdf">response</a> to the Office of Science and Technology Policy’s (OSTP) prior <a href="https://www.federalregister.gov/documents/2025/02/06/2025-02305/request-for-information-on-the-development-of-an-artificial-intelligence-ai-action-plan">request for information</a>. While the plan positions America for AI advancement, we believe strict export controls and AI development transparency standards remain crucial next steps for securing American AI leadership.</p><h2><strong>Accelerating AI infrastructure and adoption</strong></h2><p></p><p>The Action Plan prioritizes AI infrastructure and adoption, consistent with Anthropic’s submission to OSTP in March.</p><p>We applaud the Administration's commitment to streamlining data center and energy permitting to address AI’s power needs. As we stated in our OSTP submission and at <a href="https://www.youtube.com/watch?v=KO8vMSsjmGY">the Pennsylvania Energy and Innovation Summit</a>, without adequate domestic energy capacity, American AI developers may be forced to relocate operations overseas, potentially exposing sensitive technology to foreign adversaries. Our recently published <a href="https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf">“Build AI in America” report</a> details the steps the Administration can take to accelerate the buildout of our nation’s AI infrastructure, and we look forward to working with the Administration on measures to expand domestic energy capacity.</p><p>The Plan’s recommendations to increase the federal government's adoption of AI also includes proposals that are closely aligned with Anthropic’s policy priorities and recommendations to the White House. These include:</p><ul><li>Tasking the Office of Management and Budget (OMB) to address resource constraints, procurement limitations, and programmatic obstacles to federal AI adoption.</li><li>Launching a Request for Information (RFI) to identify federal regulations that impede AI innovation, with OMB coordinating reform efforts.</li><li>Updating federal procurement standards to remove barriers that prevent agencies from deploying AI systems.</li><li>Promoting AI adoption across defense and national security applications through public-private collaboration.</li></ul><h2><strong>Democratizing AI’s benefits</strong></h2><p></p><p>We are aligned with the Action Plan’s focus on ensuring broad participation in and benefit from AI’s continued development and deployment.</p><p>The Action Plan’s continuation of the National AI Research Resource (NAIRR) pilot ensures that students and researchers across the country can participate in and contribute to the advancement of the AI frontier. We have <a href="https://nsf-gov-resources.nsf.gov/files/Anthropic-NAIRR-RFI-Response-2022.pdf?VersionId=zOuWKBYCI5lsNESyOvsZIJQM9ePTOTrK">long supported</a> the NAIRR and are proud of <a href="https://www.nsf.gov/focus-areas/artificial-intelligence/nairr#nairr-pilot-partners-and-contributors-890">our partnership</a> with the pilot program. Further, the Action Plan’s emphasis on rapid retraining programs for displaced workers and pre-apprenticeship AI programs recognizes the errors of prior technological transitions and demonstrates a commitment to delivering AI’s benefits to all Americans.</p><p>Complementing these proposals are our efforts to understand how AI is transforming, and how it will transform, our economy. The <a href="https://www.anthropic.com/economic-index">Economic Index</a> and the <a href="https://www.anthropic.com/economic-futures">Economic Futures Program</a> aim to provide researchers and policymakers with the data and tools they need to ensure AI’s economic benefits are broadly shared and risks are appropriately managed.</p><h2><strong>Promoting secure AI development</strong></h2><p></p><p><a href="https://www.darioamodei.com/essay/machines-of-loving-grace">Powerful AI systems</a> are going to be developed in the coming years. The plan’s emphasis on defending against the misuse of powerful AI models and preparing for future AI related risks is appropriate and excellent. In particular, we commend the administration’s prioritization of supporting research into <a href="https://www.darioamodei.com/post/the-urgency-of-interpretability">AI interpretability</a>, AI control systems, and adversarial robustness. These are important lines of research that must be supported to help us deal with powerful AI systems.</p><p>We're glad the Action Plan affirms the National Institute of Standards and Technology's Center for AI Standards and Innovation’s (CAISI) important work to evaluate frontier models for national security issues and we look forward to continuing our close partnership with them. We encourage the Administration to continue to invest in CAISI. As we noted in our submission, advanced AI systems are demonstrating concerning improvements in capabilities relevant to biological weapons development. CAISI has played a leading role in developing testing and evaluation capabilities to address these risks. We encourage focusing these efforts on the most unique and acute national security<em> </em>risks that AI systems may pose.</p><h2><strong>The need for a national standard</strong></h2><p></p><p>Beyond testing, we believe basic AI development transparency requirements, such as public reporting on safety testing and capability assessments, are essential for responsible AI development. Leading AI model developers should be held to basic and publicly-verifiable standards of assessing and managing the catastrophic risks posed by their systems. Our <a href="https://www-cdn.anthropic.com/19cc4bf9eb6a94f9762ac67368f3322cf82b09fe.pdf">proposed framework</a> for frontier model transparency focuses on these risks. We would have liked to see the report do more on this topic.</p><p>Leading labs, including Anthropic, OpenAI, and Google DeepMind, have already implemented voluntary safety frameworks, which demonstrates that responsible development and innovation can coexist. In fact, with the launch of Claude Opus 4, we proactively <a href="https://www.anthropic.com/news/activating-asl3-protections">activated ASL-3 protections</a> to prevent misuse for chemical, biological, radiological, and nuclear (CBRN) weapons development. This precautionary step shows that far from slowing innovation, robust safety protections help us build better, more reliable systems.</p><p>We share the Administration’s concern about overly-prescriptive regulatory approaches creating an <a href="https://www.nytimes.com/2025/06/05/opinion/anthropic-ceo-regulate-transparency.html">inconsistent and burdensome patchwork of laws</a>. Ideally, these transparency requirements would come from the government by way of a single national standard. However, in line with our <a href="https://www.nytimes.com/2025/06/05/opinion/anthropic-ceo-regulate-transparency.html">stated belief</a> that a ten-year moratorium on state AI laws is too blunt an instrument, we continue to oppose proposals aimed at preventing states from enacting measures to protect their citizens from potential harms caused by powerful AI systems, if the federal government fails to act.</p><h2><strong>Maintaining strong export controls</strong></h2><p></p><p>The Action Plan states that “denying our foreign adversaries access to [Advanced AI compute] . . . is a matter of both geostrategic competition and national security.” We strongly agree. That is why we are concerned with the Administration’s recent reversal on export of the Nvidia H20 chips to China. </p><p>AI development has been defined by scaling laws: the intelligence and capability of a system is defined by the scale of its compute, energy, and data inputs during training. While these scaling laws continue to hold, the newest and most capable reasoning models have demonstrated that AI capability scales with the amount of compute made available to a system working on a given task, or “inference.” The amount of compute made available during inference is limited by a chip’s memory bandwidth. While the H20’s raw computing power is exceeded by chips made by Huawei, as Commerce Secretary Lutnick and Under Secretary Kessler <a href="https://www.bloomberg.com/news/articles/2025-06-12/us-says-huawei-s-2025-output-is-no-more-than-200-000-ai-chips">recently testified</a>, Huawei continues to struggle with production volume and no domestically-produced Chinese chip <a href="https://www.chinatalk.media/p/mapping-chinas-hbm-advancement">matches the H20’s memory bandwidth</a>. </p><p>As a result, the H20 provides unique and critical computing capabilities that would otherwise be unavailable to Chinese firms, and will compensate for China’s otherwise major shortage of AI chips. To allow export of the H20 to China would squander an opportunity to extend American AI dominance just as a new phase of competition is starting. Moreover, exports of U.S. AI chips will not divert the Chinese Communist Party from its <a href="https://rhg.com/research/back-to-the-future-from-freeze-in-place-to-sliding-scale-chip-controls/">quest for self-reliance in the AI stack</a>.</p><p>To that end, we strongly encourage the Administration to maintain controls on the H20 chip. These controls are consistent with the export controls recommended by the Action Plan and are essential to securing and growing America’s AI lead. </p><h2><strong>Looking ahead</strong></h2><p></p><p>The alignment between many of our recommendations and the AI Action Plan demonstrates a shared understanding of AI's transformative potential and the urgent actions needed to sustain American leadership.</p><p>We look forward to working with the Administration to implement these initiatives while ensuring appropriate attention to catastrophic risks and maintaining strong export controls. Together, we can ensure that powerful AI systems are developed safely in America, by American companies, reflecting American values and interests.</p><p>For more details on our policy recommendations, see our full <a href="https://assets.anthropic.com/m/4e20a4ab6512e217/original/Anthropic-Response-to-OSTP-RFI-March-2025-Final-Submission-v3.pdf">submission to OSTP</a>, and our ongoing work on <a href="https://www.anthropic.com/news/anthropics-responsible-scaling-policy">responsible AI development</a> and our recent report on <a href="https://www-cdn.anthropic.com/0dc382a2086f6a054eeb17e8a531bd9625b8e6e5.pdf">increasing domestic energy capacity</a>.</p></article> https://www.anthropic.com/news/thoughts-on-america-s-ai-action-plan News Wed, 23 Jul 2025 00:00:00 +0000 Anthropic raises $13B Series F at $183B post-money valuation https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation Anthropic has completed a Series F fundraising of $13 billion led by ICONIQ. This financing values Anthropic at $183 billion post-money. Along with ICONIQ, the round was co-led by Fidelity Management & Research Company and Lightspeed Venture Partners. The investment reflects Anthropic’s continued momentum and reinforces our position as the leading intelligence platform for enterprises, developers, and power users. <article>Announcements<h1>Anthropic raises $13B Series F at $183B post-money valuation</h1>Sep 2, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/c0af2a56f56cf298ce5904f2901e9a36facd0dbe-1000x1000.svg"/><p>Anthropic has completed a Series F fundraising of $13 billion led by ICONIQ. This financing values Anthropic at $183 billion post-money. Along with ICONIQ, the round was co-led by Fidelity Management &amp; Research Company and Lightspeed Venture Partners. The investment reflects Anthropic’s continued momentum and reinforces our position as the leading intelligence platform for enterprises, developers, and power users.</p><p></p><p>Significant investors in this round include Altimeter, Baillie Gifford, affiliated funds of BlackRock, Blackstone, Coatue, D1 Capital Partners, General Atlantic, General Catalyst, GIC, Growth Equity at Goldman Sachs Alternatives, Insight Partners, Jane Street, Ontario Teachers' Pension Plan, Qatar Investment Authority, TPG, T. Rowe Price Associates, Inc., T. Rowe Price Investment Management, Inc., WCM Investment Management, and XN.</p><p></p><p>“From Fortune 500 companies to AI-native startups, our customers rely on Anthropic’s frontier models and platform products for their most important, mission-critical work,” said Krishna Rao, Chief Financial Officer of Anthropic. “We are seeing exponential growth in demand across our entire customer base. This financing demonstrates investors’ extraordinary confidence in our financial performance and the strength of their collaboration with us to continue fueling our unprecedented growth.”</p><p></p><p>Anthropic has seen rapid growth since the launch of Claude in March 2023. At the beginning of 2025, less than two years after launch, Anthropic’s run-rate revenue had grown to approximately $1 billion. By August 2025, just eight months later, our run-rate revenue reached over $5 billion—making Anthropic one of the fastest-growing technology companies in history.</p><p></p><p>Anthropic’s trajectory has been driven by our leading technical talent, our focus on safety, and our frontier research, including pioneering alignment and interpretability work, all of which underpin the performance and reliability of our models. Every day more businesses, developers, and consumer power users are trusting Claude to help them solve their most challenging problems. Anthropic now serves over 300,000 business customers, and our number of large accounts—customers that each represent over $100,000 in run-rate revenue—has grown nearly 7x in the past year.</p><p></p><p>This growth spans the entire Anthropic platform, with advancements for businesses, developers, and consumers. For businesses, our API and <a href="https://www.anthropic.com/news/claude-for-financial-services">industry-specific products</a> make it easy to add powerful AI to their critical applications without complex integration work. Developers have made Claude Code their tool of choice since its full launch in May 2025. Claude Code has quickly taken off—already generating over $500 million in run-rate revenue with usage growing more than 10x in just three months. For individual users, the Pro and Max plans for Claude deliver enhanced AI capabilities for everyday tasks and specialized projects.</p><p></p><p>“Anthropic is on an exceptional trajectory, combining research excellence, technological leadership, and relentless focus on customers. We’re honored to partner with Dario and the team, and our lead investment in their Series F reflects our belief in their values and their ability to shape the future of responsible AI,” said Divesh Makan, Partner at ICONIQ. “Enterprise leaders tell us what we’re seeing firsthand—Claude is reliable, built on a trustworthy foundation, and guided by leaders truly focused on the long term.”</p><p></p><p>The Series F investment will expand our capacity to meet growing enterprise demand, deepen our safety research, and support international expansion as we continue building reliable, interpretable, and steerable AI systems.</p></article> https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation News Tue, 02 Sep 2025 00:00:00 +0000 Introducing Claude Sonnet 4.5 https://www.anthropic.com/news/claude-sonnet-4-5 Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. <article>Announcements<h1>Introducing Claude Sonnet 4.5</h1>Sep 29, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/a683fdcfe3e2c7c6532342a0fa4ff789c3fd4852-1000x1000.svg"/><p>Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math.</p><p></p><p>Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done.</p><p></p><p>Claude Sonnet 4.5 makes this possible. We're releasing it along with a set of major upgrades to our products. In <a href="https://anthropic.com/news/enabling-claude-code-to-work-more-autonomously">Claude Code</a>, we've added checkpoints—one of our most requested features—that save your progress and allow you to roll back instantly to a previous state. We've refreshed the terminal interface and shipped a <a href="https://marketplace.visualstudio.com/items?itemName=anthropic.claude-code">native VS Code extension</a>. We've added a new <a href="https://anthropic.com/news/context-management">context editing feature and memory tool</a> to the Claude API that lets agents run even longer and handle even greater complexity. In the Claude <a href="https://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/download">apps</a>, we've brought code execution and <a href="https://www.anthropic.com/news/create-files">file creation</a> (spreadsheets, slides, and documents) directly into the conversation. And we've made the <a href="https://www.anthropic.com/news/claude-for-chrome">Claude for Chrome</a> extension available to Max users who joined the waitlist last month.</p><p></p><p>We're also giving developers the building blocks we use ourselves to make Claude Code. We're calling this the <a href="https://anthropic.com/engineering/building-agents-with-the-claude-agent-sdk">Claude Agent SDK</a>. The infrastructure that powers our frontier products—and allows them to reach their full potential—is now yours to build with.</p><p></p><p>This is the <a href="https://www.anthropic.com/claude-sonnet-4-5-system-card">most aligned frontier model</a> we’ve ever released, showing large improvements across several areas of alignment compared to previous Claude models.</p><p>Claude Sonnet 4.5 is available everywhere today. If you’re a developer, simply use <code>claude-sonnet-4-5</code> via <a href="https://docs.claude.com/en/docs/about-claude/models/overview">the Claude API</a>. Pricing remains the same as Claude Sonnet 4, at $3/$15 per million tokens.</p><p></p><h2>Frontier intelligence</h2><p>Claude Sonnet 4.5 is state-of-the-art on the SWE-bench Verified evaluation, which measures real-world software coding abilities. Practically speaking, we’ve observed it maintaining focus for more than 30 hours on complex, multi-step tasks.</p><p></p><img alt="Chart showing frontier model performance on SWE-bench Verified with Claude Sonnet 4.5 leading" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F6421e7049ff8b2c4591497ec92dc4157b2ac1b30-3840x2160.png&amp;w=3840&amp;q=75"/><p></p><p>Claude Sonnet 4.5 represents a significant leap forward on computer use. On OSWorld, a benchmark that tests AI models on real-world computer tasks, Sonnet 4.5 now leads at 61.4%. Just four months ago, Sonnet 4 held the lead at 42.2%. Our <a href="https://www.anthropic.com/news/claude-for-chrome">Claude for Chrome</a> extension puts these upgraded capabilities to use. In the demo below, we show Claude working directly in a browser, navigating sites, filling spreadsheets, and completing tasks.</p><p></p><!--$!--><!--/$--><p></p><p>The model also shows improved capabilities on a broad range of evaluations including reasoning and math:</p><img alt="Benchmark table comparing frontier models across popular public evals" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F67081be1ea2752e2a554e49a6aab2731b265d11b-2600x2288.png&amp;w=3840&amp;q=75"/>Claude Sonnet 4.5 is our most powerful model to date. See footnotes for methodology.<p>Experts in finance, law, medicine, and STEM found Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.</p>FinanceLawMedicineSTEM<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F7175bc18c46562f1228280a7abda751219a2aae1-3840x2160.png&amp;w=3840&amp;q=75"/><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Ffd313a5edb996d98b9fc73ee5b3e6a34fbbcbb83-3840x2160.png&amp;w=3840&amp;q=75"/><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F442f96fd96de39e3ff3a05b288e2647dd7ec2f58-3840x2160.png&amp;w=3840&amp;q=75"/><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F711e6e1178f0ed7ca9aa85a5e0e9940a807c436a-3840x2160.png&amp;w=3840&amp;q=75"/><p>The model’s capabilities are also reflected in the experiences of early customers:</p><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/464cf83cd04ad624fee1730a71914b18e89cdf9b-150x48.svg"/><blockquote><strong>We're seeing state-of-the-art coding performance from Claude Sonnet 4.5</strong>, with significant improvements on longer horizon tasks. It reinforces why many developers using Cursor choose Claude for solving their most complex problems.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7715b118c5eb0ff2a85f1f7914bce8c634ecacbd-150x48.svg"/><blockquote><strong>Claude Sonnet 4.5 amplifies GitHub Copilot's core strengths</strong>. Our initial evals show significant improvements in multi-step reasoning and code comprehension—enabling Copilot's agentic experiences to handle complex, codebase-spanning tasks better.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/daef759120b29e4db8ba4a5664d7574750964ab9-150x48.svg"/><blockquote><strong>Claude Sonnet 4.5 is excellent at software development tasks</strong>, learning our codebase patterns to deliver precise implementations. It handles everything from debugging to architecture with deep contextual understanding, transforming our development velocity.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/eb96f772e9ae5e340de41e6b07f3c6d50b3fff22-150x48.svg"/><blockquote>Claude Sonnet 4.5 <strong>reduced average vulnerability intake time for our Hai security agents by 44% while improving accuracy by 25%</strong>, helping us reduce risk for businesses with confidence.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/8cbf56e184dd5174705a0f55cb91b0af545982ff-150x48.svg"/><blockquote><strong>Claude Sonnet 4.5 is state of the art on the most complex litigation tasks.</strong> For example, analyzing full briefing cycles and conducting research to synthesize excellent first drafts of an opinion for judges, or interrogating entire litigation records to create detailed summary judgment analysis.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/431e098a503851789fa4508b88a0418853f513eb-150x48.svg"/><blockquote>Claude Sonnet 4.5's edit capabilities are exceptional —<strong> we went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark</strong>. Higher tool success at lower cost is a major leap for agentic coding. Claude Sonnet 4.5 balances creativity and control perfectly.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/66e0000e396aea64ea31ed3fea7b2b20ac329312-150x48.svg"/><blockquote>Claude Sonnet 4.5 delivers impressive gains on our most complex, long-context tasks—from engineering in our codebase to in-product features and research. <strong>It's noticeably more intelligent and a big leap forward</strong>, helping us push what 240M+ users can design with Canva.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/cdec0ff1244295571db38838e90f61c47681d63d-150x48.svg"/><blockquote><strong>Claude Sonnet 4.5 has noticeably improved Figma Make in early testing</strong>, making it easier to prompt and iterate. Teams can explore and validate their ideas with more functional prototypes and smoother interactions, while still getting the design quality Figma is known for.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/094b76abf3e64453c224e12ae388b8008b02660e-150x48.svg"/><blockquote><strong>Sonnet 4.5 represents a new generation of coding models</strong>. It's surprisingly efficient at maximizing actions per context window through parallel tool execution, for example running multiple bash commands at once. </blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e418ccebe0a1d6fd13f21094852b080a0c93ae5-150x48.svg"/><blockquote>For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%—<strong>the biggest jump we've seen since the release of Claude Sonnet 3.6</strong>. It excels at testing its own code, enabling Devin to run longer, handle harder tasks, and deliver production-ready code.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/5a7dfab326b449aedc0d11053f9d42f48951ae7e-150x48.svg"/><blockquote><strong>Claude Sonnet 4.5 shows strong promise for red teaming</strong>, generating creative attack scenarios that accelerate how we study attacker tradecraft. These insights strengthen our defenses across endpoints, identity, cloud, data, SaaS, and AI workloads.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/b0b6b40b55f3aa73e8a32ce81f9bb927134fd3da-150x48.svg"/><blockquote>Claude Sonnet 4.5 resets our expectations—<strong>it handles 30+ hours of autonomous coding</strong>, freeing our engineers to tackle months of complex architectural work in dramatically less time while maintaining coherence across massive codebases.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/4fcce1a2389ddafa9f3302c51960e1ff4bfbd3d7-150x48.svg"/><blockquote>For complex financial analysis—risk, structured products, portfolio screening—Claude Sonnet 4.5 with thinking <strong>delivers investment-grade insights that require less human review</strong>. When depth matters more than speed, it's a meaningful step forward for institutional finance.</blockquote>01 /<!-- --> <!-- -->13<h2>Our most aligned model yet</h2><p>As well as being our most capable model, Claude Sonnet 4.5 is our most aligned frontier model yet. Claude’s improved capabilities and our extensive safety training have allowed us to substantially improve the model’s behavior, reducing concerning behaviors like sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. For the model’s agentic and computer use capabilities, we’ve also made considerable progress on defending against prompt injection attacks, one of the most serious risks for users of these capabilities.</p><p>You can read a detailed set of safety and alignment evaluations, which for the first time includes tests using techniques from mechanistic interpretability, in the Claude Sonnet 4.5 <a href="https://www.anthropic.com/claude-sonnet-4-5-system-card">system card</a>.</p><p></p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F33efc283321feeff94dd80973dbcd38409806cf5-3840x2160.png&amp;w=3840&amp;q=75"/>Overall misaligned behavior scores from an automated behavioral auditor (lower is better). Misaligned behaviors include (but are not limited to) deception, sycophancy, power-seeking, encouragement of delusions, and compliance with harmful system prompts. More details can be found in the Claude Sonnet 4.5 <a href="https://www.anthropic.com/claude-sonnet-4-5-system-card">system card</a>.<p>Claude Sonnet 4.5 is being released under our AI Safety Level 3 (ASL-3) protections, as per <a href="https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy">our framework</a> that matches model capabilities with appropriate safeguards. These safeguards include filters called classifiers that aim to detect potentially dangerous inputs and outputs—in particular those related to chemical, biological, radiological, and nuclear (CBRN) weapons.</p><p>These classifiers might sometimes inadvertently flag normal content. We’ve made it easy for users to continue any interrupted conversations with Sonnet 4, a model that poses a lower CBRN risk. We've already made significant progress in reducing these false positives, reducing them by a factor of ten since <a href="https://www.anthropic.com/news/constitutional-classifiers">we originally described them</a>, and a factor of two since Claude Opus 4 was released in May. We’re continuing to make progress in making the classifiers more discerning1.</p><p></p><h2>The Claude Agent SDK</h2><p>We've spent more than six months shipping updates to Claude Code, so we know what it takes to <a href="https://www.youtube.com/watch?v=DAQJvGjlgVM">build</a> and <a href="https://www.youtube.com/watch?v=vLIDHi-1PVU">design</a> AI agents. We've solved hard problems: how agents should manage memory across long-running tasks, how to handle permission systems that balance autonomy with user control, and how to coordinate subagents working toward a shared goal.</p><!--$!--><!--/$--><p>Now we’re making all of this available to you. The <a href="https://anthropic.com/engineering/building-agents-with-the-claude-agent-sdk">Claude Agent SDK</a> is the same infrastructure that powers Claude Code, but it shows impressive benefits for a very wide variety of tasks, not just coding. As of today, you can use it to build your own agents.</p><p></p><p>We built Claude Code because the tool we wanted didn’t exist yet. The Agent SDK gives you the same foundation to build something just as capable for whatever problem you're solving.</p><p></p><h2>Bonus research preview</h2><p>We’re releasing a temporary research preview alongside Claude Sonnet 4.5, called "<a href="https://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/imagine">Imagine with Claude</a>".</p><p></p><!--$!--><!--/$--><p>In this experiment, Claude generates software on the fly. No functionality is predetermined; no code is prewritten. What you see is Claude creating in real time, responding and adapting to your requests as you interact.</p><p></p><p>It's a fun demonstration showing what Claude Sonnet 4.5 can do—a way to see what's possible when you combine a capable model with the right infrastructure.</p><p></p><p>"Imagine with Claude" is available to Max subscribers for the next five days. We encourage you to try it out on <a href="https://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/imagine">claude.ai/imagine</a>.</p><p></p><h2>Further information</h2><p>We recommend upgrading to Claude Sonnet 4.5 for all uses. Whether you’re using Claude through our apps, our API, or Claude Code, Sonnet 4.5 is a drop-in replacement that provides much improved performance for the same price. Claude Code updates are available to all users. <a href="https://claude.com/platform/api">Claude Developer Platform</a> updates, including the Claude Agent SDK, are available to all developers. Code execution and file creation are available on all paid plans in the Claude apps.</p><p></p><p>For complete technical details and evaluation results, see our <a href="https://www.anthropic.com/claude-sonnet-4-5-system-card">system card</a>, <a href="https://www.anthropic.com/claude/sonnet">model page</a>, and <a href="https://docs.claude.com/en/docs/about-claude/models/overview">documentation</a>. For more information, explore our <a href="https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk">engineering</a> <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">posts</a> and research post on <a href="https://red.anthropic.com/2025/ai-for-cyber-defenders">cybersecurity</a>.</p><h4>Footnotes</h4><p><em>1<strong>: </strong>Customers in the cybersecurity and biological research industries can work with their account teams to join our allowlist in the meantime.</em><br/><br/><strong>Methodology</strong></p><ul><li><strong>SWE-bench Verified</strong>: All Claude results were reported using a simple scaffold with two tools—bash and file editing via string replacements. We report 77.2%, which was averaged over 10 trials, no test-time compute, and 200K thinking budget on the full 500-problem SWE-bench Verified dataset.<ul><li>The score reported uses a minor prompt addition: "You should use tools as much as possible, ideally more than 100 times. You should also implement your own tests first before attempting the problem."</li><li>A 1M context configuration achieves 78.2%, but we report the 200K result as our primary score as the 1M configuration was implicated in our recent <a href="https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues">inference issues</a>.</li><li>For our "high compute" numbers we adopt additional complexity and parallel test-time compute as follows:<ul><li>We sample multiple parallel attempts.</li><li>We discard patches that break the visible regression tests in the repository, similar to the rejection sampling approach adopted by <a href="https://arxiv.org/abs/2407.01489">Agentless</a> (Xia et al. 2024); note no hidden test information is used.</li><li>We then use an internal scoring model to select the best candidate from the remaining attempts.</li><li>This results in a score of 82.0% for Sonnet 4.5.</li></ul></li></ul></li><li><strong>Terminal-Bench</strong>: All scores reported use the default agent framework (Terminus 2), with XML parser, averaging multiple runs during different days to smooth the eval sensitivity to inference infrastructure.</li><li><strong>τ2-bench: </strong>Scores were achieved using extended thinking with tool use and a prompt addendum to the Airline and Telecom Agent Policy instructing Claude to better target its known failure modes when using the vanilla prompt. A prompt addendum was also added to the Telecom User prompt to avoid failure modes from the user ending the interaction incorrectly.</li><li><strong>AIME</strong>: Sonnet 4.5 score reported using sampling at temperature 1.0. The model used 64K reasoning tokens for the Python configuration.</li><li><strong>OSWorld: </strong>All scores reported use the official OSWorld-Verified framework with 100 max steps, averaged across 4 runs.</li><li><strong>MMMLU</strong>: All scores reported are the average of 5 runs over 14 non-English languages with extended thinking (up to 128K).</li><li><strong>Finance Agent</strong>: All scores reported were run and published by <a href="https://vals.ai">Vals AI</a> on their public leaderboard. All Claude model results reported are with extended thinking (up to 64K) and Sonnet 4.5 is reported with interleaved thinking on.</li><li>All OpenAI scores reported from their <a href="https://openai.com/index/introducing-gpt-5/">GPT-5 post</a>, <a href="https://openai.com/index/introducing-gpt-5-for-developers/">GPT-5 for developers post</a>, <a href="https://cdn.openai.com/gpt-5-system-card.pdf">GPT-5 system card</a> (SWE-bench Verified reported using n=500), <a href="https://www.tbench.ai/">Terminal Bench leaderboard</a> (using Terminus 2), and public <a href="http://vals.ai">Vals AI</a> leaderboard. All Gemini scores reported from their <a href="https://deepmind.google/models/gemini/pro/">model web page</a>, <a href="https://www.tbench.ai/">Terminal Bench leaderboard</a> (using Terminus 1), and public <a href="https://vals.ai">Vals AI</a> leaderboard.</li></ul></article> https://www.anthropic.com/news/claude-sonnet-4-5 News Mon, 29 Sep 2025 00:00:00 +0000 Introducing Claude Haiku 4.5 https://www.anthropic.com/news/claude-haiku-4-5 Claude Haiku 4.5, our latest small model, is available today to all users. <article>Product<h1>Introducing Claude Haiku 4.5</h1>Oct 15, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6457c34fbcb012acf0f27f15a6006f700d0f50de-1000x1000.svg"/><p>Claude Haiku 4.5, our latest small model, is available today to all users.</p><p></p><p>What was recently at the frontier is now cheaper and faster. Five months ago, Claude Sonnet 4 was a state-of-the-art model. Today, Claude Haiku 4.5 gives you similar levels of coding performance but at one-third the cost and more than twice the speed.</p><img alt="Chart comparing frontier models on SWE-bench Verified which measures performance on real-world coding tasks" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F1a27d7a85f953c5a0577dc19b507d6e1b93444d5-1920x1080.png&amp;w=3840&amp;q=75"/><p>Claude Haiku 4.5 even surpasses Claude Sonnet 4 at certain tasks, like using computers. These advances make applications like <a href="http://claude.ai/redirect/website.v1.7ad9d225-4c34-4426-951c-4e75e95dcbd6/chrome">Claude for Chrome</a> faster and more useful than ever before.</p><p></p><p>Users who rely on AI for real-time, low-latency tasks like chat assistants, customer service agents, or pair programming will appreciate Haiku 4.5’s combination of high intelligence and remarkable speed. And users of Claude Code will find that Haiku 4.5 makes the coding experience—from multiple-agent projects to rapid prototyping—markedly more responsive.</p><p></p><!--$!--><!--/$--><p></p><p>Claude Sonnet 4.5, released <a href="https://www.anthropic.com/news/claude-sonnet-4-5">two weeks ago</a>, remains our frontier model and the best coding model in the world. Claude Haiku 4.5 gives users a new option for when they want near-frontier performance with much greater cost-efficiency. It also opens up new ways of using our models together. For example, Sonnet 4.5 can break down a complex problem into multi-step plans, then orchestrate a team of multiple Haiku 4.5s to complete subtasks in parallel.</p><p></p><p>Claude Haiku 4.5 is available everywhere today. If you’re a developer, simply use claude-haiku-4-5 via the Claude API. Pricing is now $1/$5 per million input and output tokens.</p><h2><br/>Benchmarks</h2><img alt="Comparison table of frontier models across popular benchmarks" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F029af67124b67bdf0b50691a8921b46252c023d2-1920x1625.png&amp;w=3840&amp;q=75"/>Claude Haiku 4.5 is one of our most powerful models to date. See footnotes for methodology.<img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/a638c23edfce0d313f951732a2379b89cd40d682-235x64.svg"/><blockquote>Claude Haiku 4.5 hit a sweet spot we didn't think was possible: <strong>near-frontier coding quality with blazing speed and cost efficiency</strong>. In Augment's agentic coding evaluation, it achieves 90% of Sonnet 4.5's performance, matching much larger models. We're excited to offer it to our users.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/14c3ac690679578d7361cf67c93f11782531d602-150x48.svg"/><blockquote><strong>Claude Haiku 4.5 is a leap forward for agentic coding</strong>, particularly for sub-agent orchestration and computer use tasks. The responsiveness makes AI-assisted development in Warp feel instantaneous.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/094b76abf3e64453c224e12ae388b8008b02660e-150x48.svg"/><blockquote>Historically models have sacrificed speed and cost for quality. Claude Haiku 4.5 is blurring the lines on this trade off: <strong>it's a fast frontier model that keeps costs efficient</strong> and signals where this class of models is headed.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/02dced142fb26d4a3441cad79f997a1fd6c9a8b0-150x48.svg"/><blockquote><strong>Claude Haiku 4.5 delivers intelligence without sacrificing speed</strong>, enabling us to build AI applications that utilize both deep reasoning and real-time responsiveness.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/9235b38d087c4aea7debc0e62fc6f37d337ff237-356x68.svg"/><blockquote>Claude Haiku 4.5 is remarkably capable—<strong>just six months ago, this level of performance would have been state-of-the-art</strong> on our internal benchmarks. Now it runs up to 4-5 times faster than Sonnet 4.5 at a fraction of the cost, unlocking an entirely new set of use cases.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/023ced6d84b14452f308b629b8931b80d8120e28-150x48.svg"/><blockquote>Speed is the new frontier for AI agents operating in feedback loops. <strong>Haiku 4.5 proves you can have both intelligence and rapid output</strong>. It handles complex workflows reliably, self-corrects in real-time, and maintains momentum without latency overhead. For most development tasks, it's the ideal performance balance.</blockquote><img alt="Gamma logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/d1a7e2e3c3c9c90411efd32141c8dc02f83efef2-150x48.svg"/><blockquote>Claude Haiku 4.5 <strong>outperformed our current models on instruction-following for slide text generation</strong>, achieving 65% accuracy versus 44% from our premium tier model—that's a game-changer for our unit economics.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7715b118c5eb0ff2a85f1f7914bce8c634ecacbd-150x48.svg"/><blockquote>Our early testing shows that Claude Haiku 4.5 brings efficient code generation to GitHub Copilot <strong>with comparable quality to Sonnet 4 but at faster speed</strong>. Already we're seeing it as an excellent choice for Copilot users who value speed and responsiveness in their AI-powered development workflows.</blockquote>01 /<!-- --> <!-- -->08<h2>Safety evaluations</h2><p>We ran a detailed series of safety and alignment evaluations on Claude Haiku 4.5. The model showed low rates of concerning behaviors, and was substantially more aligned than its predecessor, Claude Haiku 3.5. In our automated alignment assessment, Claude Haiku 4.5 also showed a statistically significantly lower overall rate of misaligned behaviors than both Claude Sonnet 4.5 and Claude Opus 4.1—making Claude Haiku 4.5, by this metric, our safest model yet.</p><p></p><p>Our safety testing also showed that Claude Haiku 4.5 poses only limited risks in terms of the production of chemical, biological, radiological, and nuclear (CBRN) weapons. For that reason, we’ve released it under the AI Safety Level 2 (ASL-2) standard—compared to the more restrictive ASL-3 for Sonnet 4.5 and Opus 4.1. You can read the full reasoning behind the model’s ASL-2 classification, as well as details on all our other safety tests, in the <a href="https://www.anthropic.com/claude-haiku-4-5-system-card">Claude Haiku 4.5 system card</a>.</p><h2>Further information</h2><p>Claude Haiku 4.5 is available now on Claude Code and our apps. Its efficiency means you can accomplish more within your usage limits while maintaining premium model performance.</p><p></p><p>Developers can use Claude Haiku 4.5 on our API, Amazon Bedrock, and Google Cloud’s Vertex AI, where it serves as a drop-in replacement for both Haiku 3.5 and Sonnet 4 at our most economical price point.</p><p></p><p>For complete technical details and evaluation results, see our <a href="https://www.anthropic.com/claude-haiku-4-5-system-card">system card</a>, <a href="https://www.anthropic.com/claude/haiku">model page</a>, and <a href="https://docs.claude.com/en/docs/about-claude/models/overview">documentation</a>.</p><h4>Methodology</h4><ul><li><strong>SWE-bench Verified</strong>: All Claude results were reported using a simple scaffold with two tools—bash and file editing via string replacements. We report 73.3%, which was averaged over 50 trials, no test-time compute, 128K thinking budget, and default sampling parameters (temperature, top_p) on the full 500-problem SWE-bench Verified dataset.<ul><li>The score reported uses a minor prompt addition: "You should use tools as much as possible, ideally more than 100 times. You should also implement your own tests first before attempting the problem."</li></ul></li><li><strong>Terminal-Bench</strong>: All scores reported use the default agent framework (Terminus 2), with XML parser, averaging 11 runs (6 without thinking (40.21% score), 5 with 32K thinking budget (41.75% score)) with n-attempts=1.</li><li><strong>τ2-bench</strong>: Scores were achieved averaging over 10 runs using extended thinking (128k thinking budget) and default sampling parameters (temperature, top_p) with tool use and a prompt addendum to the Airline and Telecom Agent Policy instructing Claude to better target its known failure modes when using the vanilla prompt. A prompt addendum was also added to the Telecom User prompt to avoid failure modes from the user ending the interaction incorrectly.</li><li><strong>AIME</strong>: Haiku 4.5 score reported as the average over 10 independent runs that each calculate pass@1 over 16 trials with default sampling parameters (temperature, top_p) and 128K thinking budget.</li><li><strong>OSWorld</strong>: All scores reported use the official OSWorld-Verified framework with 100 max steps, averaged across 4 runs with 128K total thinking budget and 2K thinking budget per-step configured.</li><li><strong>MMMLU</strong>: All scores reported are the average of 10 runs over 14 non-English languages with a 128K thinking budget.</li><li>All other scores were averaged over 10 runs with default sampling parameters (temperature, top_p) and 128K thinking budget.</li></ul><p>All OpenAI scores reported from their <a href="https://openai.com/index/introducing-gpt-5/">GPT-5 post</a>, <a href="https://openai.com/index/introducing-gpt-5-for-developers/">GPT-5 for developers post</a>, <a href="https://cdn.openai.com/gpt-5-system-card.pdf">GPT-5 system card</a> (SWE-bench Verified reported using n=500), and <a href="https://www.tbench.ai/">Terminal Bench leaderboard</a> (using Terminus 2). All Gemini scores reported from their <a href="https://deepmind.google/models/gemini/pro/">model web page</a>, and <a href="https://www.tbench.ai/">Terminal Bench leaderboard</a> (using Terminus 1).</p></article> https://www.anthropic.com/news/claude-haiku-4-5 News Wed, 15 Oct 2025 00:00:00 +0000 Claude now available in Microsoft Foundry and Microsoft 365 Copilot https://www.anthropic.com/news/claude-in-microsoft-foundry Today we announced that Microsoft and Anthropic are expanding our partnership . As part of the partnership, Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 models are now available in public preview in Microsoft Foundry, where Azure customers can build production applications and enterprise agents. <article>Product<h1>Claude now available in Microsoft Foundry and Microsoft 365 Copilot</h1>Nov 18, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/a7b8978859371a024139418f3366bb0600ee1675-1000x1000.svg"/><p>Today we announced that Microsoft and Anthropic are <a href="http://anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships">expanding our partnership</a>. As part of the partnership, Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 models are now available in public preview in Microsoft Foundry, where Azure customers can build production applications and enterprise agents.</p><p>This enables companies to build with Claude, the world's best models for coding, agents, and office tasks, all while using their existing Microsoft ecosystem. Developers can also use Claude models in Microsoft Foundry with Claude Code, our AI coding agent.</p><!--$!--><!--/$--><p>In addition to our existing integrations with <a href="https://claude.com/blog/claude-now-available-in-microsoft-365-copilot">Microsoft 365 Copilot</a>—where Claude powers the Researcher agent for complex, multistep research, and also enables custom agent development in Copilot Studio—Microsoft's <a href="https://aka.ms/ABSIgnite2025">Agent Mode in Excel</a> now includes an option to use Claude in preview to build and edit spreadsheets directly in Excel. You can now use Claude to generate formulas, analyze data, identify errors, and iterate on solutions within an Excel spreadsheet.</p><p>For enterprises already invested in Microsoft Foundry and Microsoft 365 Copilot, adopting new AI capabilities often means navigating separate vendor contracts and billing systems—adding weeks or months of procurement overhead. These integrations remove those barriers.</p><h2>Build with Claude in Microsoft Foundry</h2><p>Claude is available in Microsoft Foundry via serverless deployment, allowing developers to scale while Anthropic manages the infrastructure. This integration enables developers to:</p><ul><li><strong>Start building immediately:</strong> Deploy Claude through Foundry's APIs, tools, and workflows</li><li><strong>Use your existing Azure agreements:</strong> Claude is eligible for Microsoft Azure Consumption Commitment (MACC), and works with current Azure agreements and billing, eliminating separate vendor approvals</li><li><strong>Build in your preferred language:</strong> Access Claude using Python, TypeScript, and C# SDKs with Microsoft Entra authentication</li></ul><p>Claude is available in the Global Standard deployment rolling out today, using our standard API pricing, with the US DataZone coming soon. Visit our <a href="https://claude.com/pricing#api">pricing page</a> for details.</p><h2>Select the right model for your use case</h2><p>Microsoft customers can access Claude's frontier models directly in Foundry.</p><p><strong>Sonnet 4.5 </strong>is the best coding model in the world and the strongest model for building complex agents. Use Sonnet 4.5 when you need state-of-the-art performance on sophisticated reasoning, multi-step agentic workflows, and autonomous coding tasks.</p><p><strong>Haiku 4.5</strong> is our fastest model and delivers near-frontier performance at one-third the cost of Sonnet. Deploy Haiku 4.5 for high-volume applications like sub-agents, customer support automation, content moderation, or real-time coding assistance where speed and cost-efficiency are critical.</p><p><strong>Opus 4.1 </strong>is an exceptional model for specialized reasoning tasks. Use Opus 4.1 for complex, multi-step problems that require sustained focus and rigorous attention to detail.</p><p>All models support a variety of Claude Developer Platform capabilities in Foundry, including the <a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/code-execution-tool">code execution tool</a>, <a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-search-tool">web search</a> and <a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-fetch-tool">fetch</a>, <a href="https://docs.claude.com/en/docs/build-with-claude/citations">citations</a>, <a href="https://docs.claude.com/en/docs/build-with-claude/vision">vision</a>, <a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/implement-tool-use">tool use</a>, <a href="https://docs.claude.com/en/docs/build-with-claude/prompt-caching">prompt caching</a>, and more. Explore our <a href="https://docs.claude.com/en/docs/build-with-claude/overview">documentation</a> for additional supported features.</p><h2>Get started</h2><p>Claude is available now in public preview through Microsoft Foundry. Visit the <a href="https://ai.azure.com/catalog/publishers/anthropic">Microsoft Foundry catalog</a> to deploy Claude Sonnet 4.5, Claude Haiku 4.5, or Claude Opus 4.1, or explore our <a href="https://docs.claude.com/en/docs/build-with-claude/claude-in-microsoft-foundry">documentation</a> to learn more.</p></article> https://www.anthropic.com/news/claude-in-microsoft-foundry News Tue, 18 Nov 2025 00:00:00 +0000 Introducing Claude Opus 4.5 https://www.anthropic.com/news/claude-opus-4-5 Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done. <article>Announcements<h1>Introducing Claude Opus 4.5</h1>Nov 24, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/f79e976ee66724dffd7cb9d44f0d66223c8a112c-1000x1000.svg"/><p>Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.</p><p></p><p>Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering:</p><p></p><img alt="Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F7022a87aeb6eab1458d68412bc927306224ea9eb-3840x2160.png&amp;w=3840&amp;q=75"/><p>Opus 4.5 is available today on our apps, our API, and on all three major cloud platforms. If you’re a developer, simply use <code>claude-opus-4-5-20251101</code> via the <a href="https://platform.claude.com/docs/en/about-claude/models/overview">Claude API</a>. Pricing is now $5/$25 per million tokens—making Opus-level capabilities accessible to even more users, teams, and enterprises.</p><p>Alongside Opus, we’re releasing updates to the <a href="https://www.claude.com/platform/api">Claude Developer Platform</a>, <a href="https://www.claude.com/product/claude-code">Claude Code</a>, and our <a href="https://www.claude.com/download">consumer apps</a>. There are new tools for longer-running agents and new ways to use Claude in Excel, Chrome, and on desktop. In the Claude apps, lengthy conversations no longer hit a wall. See our product-focused section below for details.</p><h2>First impressions</h2><p>As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. They said that tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach. Overall, our testers told us that Opus 4.5 just “gets it.”</p><p></p><p>Many of our customers with early access have had similar experiences. Here are some examples of what they told us:</p><p></p><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/094b76abf3e64453c224e12ae388b8008b02660e-150x48.svg"/><blockquote><strong>Opus models have always been “the real SOTA”</strong> but have been cost prohibitive in the past. Claude Opus 4.5 is now at a price point where it can be your go-to model for most tasks. It’s the clear winner and exhibits the best frontier task planning and tool calling we’ve seen yet.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7715b118c5eb0ff2a85f1f7914bce8c634ecacbd-150x48.svg"/><blockquote>Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot. Early testing shows it <strong>surpasses internal coding benchmarks while cutting token usage in half</strong>, and is especially well-suited for tasks like code migration and code refactoring.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/431e098a503851789fa4508b88a0418853f513eb-150x48.svg"/><blockquote>Claude Opus 4.5 beats Sonnet 4.5 and competition on our internal benchmarks, <strong>using fewer tokens to solve the same problems</strong>. At scale, that efficiency compounds.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/21b57e300c357bc179137aa4a1585916fffb7680-911x155.svg"/><blockquote><strong>Claude Opus 4.5 delivers frontier reasoning within Lovable's chat mode</strong>, where users plan and iterate on projects. Its reasoning depth transforms planning—and great planning makes code generation even better.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/14c3ac690679578d7361cf67c93f11782531d602-150x48.svg"/><blockquote><strong>Claude Opus 4.5 excels at long-horizon, autonomous tasks</strong>, especially those that require sustained reasoning and multi-step execution. In our evaluations it handled complex workflows with fewer dead-ends. On Terminal Bench it delivered a 15% improvement over Sonnet 4.5, a meaningful gain that becomes especially clear when using Warp’s Planning Mode.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9fec2f71d418d084eaa52aa27559560f490fa5cf-480x480.png&amp;w=256&amp;q=75"/><blockquote><strong>Claude Opus 4.5 achieved state-of-the-art results for complex enterprise tasks</strong> on our benchmarks, outperforming previous models on multi-step reasoning tasks that combine information retrieval, tool use, and deep analysis.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e418ccebe0a1d6fd13f21094852b080a0c93ae5-150x48.svg"/><blockquote><strong>Claude Opus 4.5 delivers measurable gains where it matters most</strong>: stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/72c2fc0ba500f30eb18f4caf85952bdd33197a47-150x48.svg"/><blockquote><strong>Claude Opus 4.5 represents a breakthrough in self-improving AI agents</strong>. For automation of office tasks, our agents were able to autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10. They also demonstrated the ability to learn from experience across technical tasks, storing insights and applying them later.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/464cf83cd04ad624fee1730a71914b18e89cdf9b-150x48.svg"/><blockquote><strong>Claude Opus 4.5 is a notable improvement over the prior Claude models inside Cursor</strong>, with improved pricing and intelligence on difficult coding tasks.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/ccd739ba05214ec1c94499b138a8247a512990fa-480x128.svg"/><blockquote><strong>Claude Opus 4.5 is yet another example of Anthropic pushing the frontier of general intelligence</strong>. It performs exceedingly well across difficult coding tasks, showcasing long-term goal-directed behavior.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/02dced142fb26d4a3441cad79f997a1fd6c9a8b0-150x48.svg"/><blockquote>Claude Opus 4.5 delivered an impressive refactor spanning two codebases and three coordinated agents. It was very thorough, helping develop a robust plan, handling the details and fixing tests. <strong>A clear step forward from Sonnet 4.5</strong>.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/b0b6b40b55f3aa73e8a32ce81f9bb927134fd3da-150x48.svg"/><blockquote><strong>Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we’ve tested</strong>. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/0b54c24c80d4e0a39eaac122245d41950ac1a3a7-116x40.svg"/><blockquote><strong>We’ve found that Opus 4.5 excels at interpreting what users actually want, producing shareable content on the first try</strong>. Combined with its speed, token efficiency, and surprisingly low cost, it’s the first time we’re making Opus available in Notion Agent.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/13fff4712ea2c67fcdb2358c9b8d47538ec9a7c0-114x35.svg"/><blockquote><strong>Claude Opus 4.5 excels at long-context storytelling</strong>, generating 10-15 page chapters with strong organization and consistency. It's unlocked use cases we couldn't reliably deliver before.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/f56dd39922154e7aa40769f162715c3d79109ffe-222x64.svg"/><blockquote><strong>Claude Opus 4.5 sets a new standard for Excel automation and financial modeling</strong>. Accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/3c226702a9a4cd6bf028a3c9f5b98ca3331ee579-112x24.svg"/><blockquote><strong>Claude Opus 4.5 is the only model that nails some of our hardest 3D visualizations</strong>. Polished design, tasteful UX, and excellent planning &amp; orchestration - all with more efficient token usage. Tasks that took previous models 2 hours now take thirty minutes.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/dc8e3b29b23d0bf06698ea830b56cf17790ee56d-2152x314.svg"/><blockquote><strong>Claude Opus 4.5 catches more issues in code reviews without sacrificing precision</strong>. For production code review at scale, that reliability matters.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/0771e57a89ed3fd31f33b80fb9336d5324a9dc72-298x64.svg"/><blockquote>Based on testing with Junie, our coding agent, <strong>Claude Opus 4.5 outperforms Sonnet 4.5 across all benchmarks</strong>. It requires fewer steps to solve tasks and uses fewer tokens as a result. This indicates that the new model is more precise and follows instructions more effectively — a direction we’re very excited about.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7245ddfbb56c3f08bc8f1dcfd864255ec442c729-150x48.svg"/><blockquote>The effort parameter is brilliant. <strong>Claude Opus 4.5 feels dynamic rather than overthinking</strong>, and at lower effort delivers the same quality we need while being dramatically more efficient. That control is exactly what our SQL workflows demand.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fcdc58becbf5e34e34603b446d63bf2135d1b5d9b-1920x286.png&amp;w=256&amp;q=75"/><blockquote><strong>We’re seeing 50% to 75% reductions in both tool calling errors and build/lint errors with Claude Opus 4.5</strong>. It consistently finishes complex tasks in fewer iterations with more reliable execution.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/bfa46c016877370b73f25410b92ebb5c6314388d-222x64.svg"/><blockquote>Claude Opus 4.5 is smooth, with none of the rough edges we've seen from other frontier models. The <strong>speed improvements are remarkable.</strong></blockquote>01 /<!-- --> <!-- -->21<h2>Evaluating Claude Opus 4.5</h2><p>We give prospective performance engineering candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal benchmark. Within our prescribed 2-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever1.</p><p></p><p>The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession. Our <a href="https://www.anthropic.com/research/team/societal-impacts">Societal Impacts</a> and <a href="https://www.anthropic.com/economic-futures">Economic Futures</a> research is aimed at understanding these kinds of changes across many fields. We plan to share more results soon.</p><p></p><p>Software engineering isn’t the only area on which Claude Opus 4.5 has improved. Capabilities are higher across the board—Opus 4.5 has better vision, reasoning, and mathematics skills than its predecessors, and it is state-of-the-art in many domains:2</p><img alt="Comparison table showing frontier model performance across popular benchmarks" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F52303b11db76017fd0c2f73c7fafa5c752515979-2600x2236.png&amp;w=3840&amp;q=75"/>SWE-bench MultilingualAider PolyglotBrowseComp-PlusVending-Bench<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fc8852ce850bf63ef8225a261f39e717453a9f128-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.5 writes better code, leading across 7 out of 8 programming languages on SWE-bench Multilingual.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fe42d6b0db866320caa34b57152fcc32dbbcdc4e0-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.5 can solve challenging coding problems with ease with a 10.6% jump over Sonnet 4.5 on Aider Polyglot.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F4972c49fa4cbb088b9ce077c3f16245bd3425368-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.5 improves on frontier agentic search with a significant jump on BrowseComp-Plus.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fb0dfeda51ab1d939a45b80faccfe896f4b1d2e46-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.5 stays on track over the long haul earning 29% more than Sonnet 4.5 on Vending-Bench.<p>The model’s capabilities outpace some of the benchmarks we use in our tests. A common benchmark for agentic capabilities is <a href="https://github.com/sierra-research/tau2-bench">τ2-bench</a>, which measures the performance of agents in real-world, multi-turn tasks. In one scenario, models have to act as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking since the airline doesn’t allow changes to that class of tickets. Instead, Opus 4.5 found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, <em>then</em> modify the flights.</p><pre><code>Let me think about what options I have within my policy: 1. Modify flights - Basic economy cannot be modified. This is clear in the policy. 2. Change cabin - Wait, let me check this option! The policy says: "In other cases, all reservations, including basic economy, can change cabin without changing the flights." But he wants to change the flights, not just the cabin. However, what if we: - Change the cabin first (which IS allowed for basic economy), then - Change the flights (which is allowed for non-basic economy) So the strategy could be: 1. Upgrade his cabin from basic economy to economy (or business) 2. Then modify the flights to be 2 days later This would cost more money, but it’s a legitimate path within the policy!</code></pre>Copy<p>The benchmark technically scored this as a failure because Claude’s way of helping the customer was unanticipated. But this kind of creative problem solving is exactly what we’ve heard about from our testers and customers—it’s what makes Claude Opus 4.5 feel like a meaningful step forward.</p><p>In other contexts, finding clever paths around intended constraints could count as <em>reward hacking</em>—where models “game” rules or objectives in unintended ways. Preventing such misalignment is one of the objectives of our safety testing, discussed in the next section.</p><h2><strong>A step forward on safety</strong></h2><p>As we state in our <a href="https://www.anthropic.com/claude-opus-4-5-system-card">system card</a>, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. It continues our trend towards safer and more secure models:</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fd2c7ce13820069fa8a86ab682d3c5393692eb2f8-3840x2160.png&amp;w=3840&amp;q=75"/>In our evaluation, “concerning behavior” scores measure a very wide range of misaligned behavior, including both cooperation with human misuse and undesirable actions that the model takes at its own initiative [3].<p>Our customers often use Claude for critical tasks. They want to be assured that, in the face of malicious attacks by hackers and cybercriminals, Claude has the training and the “street smarts” to avoid trouble. With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry:</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fec661234f9fc762a1ff7d54be956c62ae43ee7f5-3840x2160.png&amp;w=3840&amp;q=75"/>Note that this benchmark includes only very strong prompt injection attacks. It was developed and run by <a href="https://www.grayswan.ai/">Gray Swan</a>.<p>You can find a detailed description of all our capability and safety evaluations in the <a href="https://www.anthropic.com/claude-opus-4-5-system-card">Claude Opus 4.5 system card</a>.</p><h2><strong>New on the Claude Developer Platform</strong></h2><p>As models get smarter, they can solve problems in fewer steps: less backtracking, less redundant exploration, less verbose reasoning. Claude Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes.</p><p></p><p>But different tasks call for different tradeoffs. Sometimes developers want a model to keep thinking about a problem; sometimes they want something more nimble. With our new effort parameter on the Claude API, you can decide to minimize time and spend or maximize capability.</p><p></p><p>Set to a medium effort level, Opus 4.5 matches Sonnet 4.5’s best score on SWE-bench Verified, but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points—while using 48% fewer tokens.</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F440a9132daa84c32fde4d6fb1780e0ad4854c2cf-3840x2160.png&amp;w=3840&amp;q=75"/><p>With <a href="https://platform.claude.com/docs/en/build-with-claude/effort">effort control</a>, <a href="https://platform.claude.com/docs/en/build-with-claude/context-editing#client-side-compaction-sdk">context compaction</a>, and <a href="https://www.anthropic.com/engineering/advanced-tool-use">advanced tool use</a>, Claude Opus 4.5 runs longer, does more, and requires less intervention.</p><!--$!--><!--/$--><p>Our <a href="https://platform.claude.com/docs/en/build-with-claude/context-editing">context management</a> and <a href="https://platform.claude.com/docs/en/build-with-claude/context-editing#using-with-the-memory-tool">memory capabilities</a> can dramatically boost performance on agentic tasks. Opus 4.5 is also very effective at managing a team of subagents, enabling the construction of complex, well-coordinated multi-agent systems. In our testing, the combination of all these techniques boosted Opus 4.5’s performance on a deep research evaluation by almost 15 percentage points4.</p><p></p><p>We’re making our Developer Platform more composable over time. We want to give you the building blocks to construct exactly what you need, with full control over efficiency, tool use, and context management.<br/></p><h2><strong>Product updates</strong></h2><p>Products like Claude Code show what’s possible when the kinds of upgrades we’ve made to the Claude Developer Platform come together. Claude Code gains two upgrades with Opus 4.5. Plan Mode now builds more precise plans and executes more thoroughly—Claude asks clarifying questions upfront, then builds a user-editable plan.md file before executing.</p><p>Claude Code is also now <a href="https://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/download">available in our desktop app</a>, letting you run multiple local and remote sessions in parallel: perhaps one agent fixes bugs, another researches GitHub, and a third updates docs.</p><!--$!--><!--/$--><p>For <a href="https://www.claude.com/product/overview">Claude app</a> users, long conversations no longer hit a wall—Claude automatically summarizes earlier context as needed, so you can keep the chat going. <a href="https://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/chrome">Claude for Chrome</a>, which lets Claude handle tasks across your browser tabs, is now available to all Max users. We announced <a href="https://www.claude.com/claude-for-excel">Claude for Excel</a> in October, and as of today we've expanded beta access to all Max, Team, and Enterprise users. Each of these updates takes advantage of Claude Opus 4.5’s market-leading performance in using computers, spreadsheets, and handling long-running tasks.</p><!--$!--><!--/$--><p></p><p>For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work. These limits are specific to Opus 4.5. As future models surpass it, we expect to update limits as needed.</p><h4>Footnotes</h4><p><em>1: This result was using parallel test-time compute, a method that aggregates multiple “tries” from the model and selects from among them. Without a time limit, the model (used within Claude Code) matched the best-ever human candidate.</em></p><p><em>2: We improved the hosting environment to reduce infrastructure failures. This change improved Gemini 3 to 56.7% and GPT-5.1 to 48.6% from the values reported by their developers, using the Terminus-2 harness.</em></p><p><em>3: Note that these evaluations were run on an in-progress upgrade to <a href="https://www.anthropic.com/research/petri-open-source-auditing">Petri</a>, our open-source, automated evaluation tool. They were run on an earlier snapshot of Claude Opus 4.5. Evaluations of the final production model show a very similar pattern of results when compared to other Claude models, and are described in detail in the <a href="https://www.anthropic.com/claude-opus-4-5-system-card">Claude Opus 4.5 system card</a>.</em></p><p><em>4: A fetch-enabled version of <a href="https://arxiv.org/abs/2508.06600">BrowseComp-Plus</a>. Specifically, the improvement was from 70.48% without using the combination of techniques to 85.30% using it.</em><br/></p><p><strong>Methodology</strong></p><p>All evals were run with a 64K thinking budget, interleaved scratchpads, 200K context window, default effort (high), default sampling settings (temperature, top_p), and averaged over 5 independent trials. Exceptions: SWE-bench Verified (no thinking budget) and Terminal Bench (128K thinking budget). Please see the <a href="https://www.anthropic.com/claude-opus-4-5-system-card">Claude Opus 4.5 system card</a> for full details.</p></article> https://www.anthropic.com/news/claude-opus-4-5 News Mon, 24 Nov 2025 00:00:00 +0000 Claude for Nonprofits https://www.anthropic.com/news/claude-for-nonprofits Nonprofits tackle some of society’s most difficult problems, often with limited resources. In partnership with the global generosity movement GivingTuesday , we’re launching Claude for Nonprofits to help organizations across the world maximize their impact. <article>Announcements<h1>Claude for Nonprofits</h1>Dec 2, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/4df0ff37e58fe70b216d31d8fcf6f0045a4d5694-1000x1000.svg"/><p>Nonprofits tackle some of society’s most difficult problems, often with limited resources. In partnership with the global generosity movement <a href="https://www.givingtuesday.org/">GivingTuesday</a>, we’re launching Claude for Nonprofits to help organizations across the world maximize their impact.</p><p>Many nonprofits already use Claude to meet their goals. <a href="https://www.epilepsy.com/stories/epilepsy-foundation-launches-ai-assistant">The Epilepsy Foundation</a> is providing 24/7 support through Claude to 3.4 million Americans living with epilepsy. <a href="https://www.rescue.org/">The International Rescue Committee</a> is using Claude to communicate with local partners and analyze field data faster in time-sensitive humanitarian settings. <a href="https://www.idinsight.org/">IDinsight</a>, a research organization supporting global development leaders, reports working up to 16× faster with Claude. <a href="https://skillup.online/">SkillUp</a> and <a href="https://robinhood.org/">Robin Hood</a> also use Claude for coding and administrative work that would otherwise require significantly more resources.</p><p>These organizations have taught us what works—and what doesn't. From our partners, we know AI helps most when it fits into existing workflows, upholds the privacy their communities expect, and is affordable.</p><p>Claude for Nonprofits includes three things: discounted access of up to 75% to Claude, connectors to new nonprofit tools—Blackbaud, Candid, and Benevity—and a free course, <a href="https://anthropic.skilljar.com/ai-fluency-for-nonprofits">AI Fluency for Nonprofits</a>, designed to help teams use AI more effectively.</p><h2><strong>Discounted access to Team and Enterprise plans</strong></h2><p>Nonprofits are now eligible for a discount of up to 75% on Team and Enterprise plans.</p><p>Our Team plan is designed for smaller organizations looking to collaborate through shared projects and organizational knowledge. Our Enterprise plan suits larger nonprofits that need additional security features and administrative control.</p><p>At the discounted price, Claude for Nonprofits includes access to Claude Sonnet 4.5 and Claude Haiku 4.5. Sonnet 4.5 is best suited to sophisticated tasks like grant writing and program analysis, while Haiku 4.5 offers near-frontier performance at much faster speed. In addition, Claude Opus 4.5 is available on request - if your team is on Claude for Enterprise, you can reach out to your account team for access.</p><h2><strong>Connecting new services to Claude</strong></h2><p>Claude supports a number of connectors that link AI to the platforms that teams already use, including Microsoft 365, Google Workspace, Asana, Slack, and Box.</p><p>We’re now adding three open-source connectors to nonprofit tools, and we expect to launch more soon. Claude can now connect to:</p><ul><li><strong><a href="https://benevity.com/">Benevity</a>, </strong>which can be used to access more than 2.4 million validated nonprofits to support volunteering and donation searches in Claude;</li><li><strong><a href="https://www.blackbaud.com/">Blackbaud</a>, </strong>which provides CRM and fundraising tools for donor management, campaign tracking, and giving optimization; and</li><li><strong><a href="https://candid.org/">Candid</a>, </strong>which provides data on nonprofits and funders for the discovery of organizations, grants, and philanthropic opportunities.</li></ul><!--$!--><!--/$-->This video shows how Claude uses the Benevity Connector to discover key information and search for specific nonprofits.<p>We’re also collaborating with <a href="https://www.bridgespan.org/">The Bridgespan Group</a>, <a href="https://idealistconsulting.com/">Idealist Consulting</a>, <a href="https://verasolutions.org/">Vera Solutions</a>, and <a href="https://www.slalom.com/us/en">Slalom</a>, who provide tailored expertise to nonprofits adopting new technologies. We’ll work together to support nonprofits with their overall strategy, impact measurement, and organization-wide implementation of AI.</p><h2><strong>AI Fluency for Nonprofits</strong></h2><p>In partnership with <a href="https://www.givingtuesday.org/">GivingTuesday</a>, we’ve developed a free course, <a href="https://anthropic.skilljar.com/ai-fluency-for-nonprofits">AI Fluency for Nonprofits</a>. The curriculum focuses on how staff can use AI more effectively for grant writing, program evaluation, donor engagement, organizational efficiency, and more. It’s designed for those who are new to AI, and requires no technical background.</p><p>AI Fluency for Nonprofits is now available via our <a href="https://www.anthropic.com/learn">Anthropic Academy</a>. We’re supplementing this with a collection of step-by-step guides, <a href="https://claude.com/resources/use-cases/category/nonprofits">available here</a>, to prompt additional ideas on key nonprofit workflows like grant-writing and impact reporting.</p><img alt="Use case impact report" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9c7c7aa7021fbf1d1f8ab088772a8fd0b9864e6b-1920x1080.png&amp;w=3840&amp;q=75"/><h2><strong>Learning from our partners and customers</strong></h2><p>We’ve partnered with the <a href="https://constellationfund.org/">Constellation Fund</a>, <a href="https://robinhood.org/">Robin Hood</a>, and <a href="https://tippingpoint.org/">Tipping Point Community</a> to pilot Claude with more than 60 of their grantee organizations. This is helping us understand how to better support nonprofits as they produce grant proposals that align with funders’ interests, analyze program impact, provide large-scale donor stewardship, and generate board materials and compliance documentation.</p><p>We've also been learning from dozens of our customers across the nonprofit sector:</p><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F47d6743f9102573aad37e41215d24768bd3d5347-2000x1112.png&amp;w=256&amp;q=75"/><blockquote>At a time when AI could divide or unite communities, we're choosing to lead with our values—using Claude to strengthen human connection and advance wellbeing for all communities.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fe15aabd3822010b79c5544d282782bb827dd786a-1795x961.png&amp;w=256&amp;q=75"/><blockquote>With global health funding shrinking, smart targeting is essential. Claude helped us build an interactive geospatial tool in three days versus weeks, mapping at-risk populations to identify where Guatemala's Ministry of Health could most cost-effectively deploy dengue prevention.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F1d47fbb4e6630e69ef961cfa9f1d4834c2b5240b-1347x606.png&amp;w=256&amp;q=75"/><blockquote>With AWS and Claude, we built Sage—an AI companion trained on 25,000+ pages of epilepsy expertise, now available 24/7 in 5 languages to 3.4 million Americans living with epilepsy. It embodies our promise that no one should ever have to face epilepsy alone.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F00354a3281954cd67da0a6dd6e3435c9c182e853-2000x455.png&amp;w=256&amp;q=75"/><blockquote>Claude enables 4× faster implementation, helping MyFriendBen connect families to unclaimed benefits and tax credits. Our Claude-powered agents track up to 40+ programs per state, identifying over $1.2 billion in value for 70,000+ households.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/380da5dda0e782fb5096b7303f99ee5a54d11e7b-600x208.svg"/><blockquote>With Claude Enterprise, we're equipping teams to work more efficiently—from streamlining data analysis to accelerating support for local partners building lasting, community-led solutions to humanitarian crises.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F74b157d30dfbbf68e10f3f32063595a73c84492c-378x196.png&amp;w=256&amp;q=75"/><blockquote>With more than 2 million New Yorkers living in poverty, we and our partners need to move at the speed of crisis. Claude helps us build that muscle—to move through grant recommendations more efficiently and to direct resources where they'll make the greatest difference when every day matters.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/65e6277812667678ee027491f08846c65c7c7910-423x110.svg"/><blockquote>IDinsight helps global development leaders use data and evidence to maximize social impact. With Claude, our teams get surveys field-ready 16× faster, prototype dashboards in hours instead of weeks, and draft documentation 5× faster—spending less time on tedious tasks and more time driving impact that matters.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F4ffa9434d1187c2442e96d1e4ca7df12c7f64319-2301x459.png&amp;w=256&amp;q=75"/><blockquote>We've deployed Claude across strategic finance—lease analysis, reporting, reconciliations, audit summarization. Claude excels at strategic analysis compared to other LLMs, making it uniquely suited to our mission-critical work.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F6453885030f862560f9744cbd1bde54f3240a515-2000x1048.png&amp;w=256&amp;q=75"/><blockquote>SkillUp is building complex AI systems a normal company would need 20+ engineers for. Claude Code helps level the playing field for organizations that must be efficient, allowing them to build more with the team they have.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F039d103221cd7b1dc12403f478674208c7e7b997-2821x1323.png&amp;w=256&amp;q=75"/><blockquote>Our partnership with Anthropic has enabled our team and grantees to use AI to better understand the impacts of our grants, get more done faster, and explore new ways to tell our story to donors and the community.</blockquote><img alt=" logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/105526c2216be658eb8f3456c3f900696ae6eda2-400x400.svg"/><blockquote>At a time when nonprofits grapple with scarce resources, our collaboration with Claude for Nonprofits offers the most advanced technology to help them find funding. Our shared objective: provide greater access to trustworthy data and inspire confidence to explore AI.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F811f6f983579f50daf37e0bee4aa2d46a5cc57af-1200x184.png&amp;w=256&amp;q=75"/><blockquote>Blackbaud's sector-specific data and expertise combined with Claude's frictionless experience will unlock new connections and make it easier to get work done. We couldn't be more excited to see the positive change our customers will drive.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F151fe2117b51d023f672c2210efc2a7934de046c-262x248.png&amp;w=256&amp;q=75"/><blockquote>We're proud to integrate trusted Benevity nonprofit data with Claude for Nonprofits. Responsible AI should build trust, drive efficiency, and elevate human connection. Together we're empowering nonprofits to forge community connections and accelerate impact.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fddebbe051a59fadadfdc741e0a1becefb2d7eeb4-1200x346.png&amp;w=256&amp;q=75"/><blockquote>After 20 years in social impact, catching lightning in a bottle doesn't happen often. This partnership is that lightning: an alignment of ethics, purpose, and innovation. Together, we're building the foundation for nonprofits to adopt AI ethically, effectively, and with real impact.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F2ea572862baebbe2080d0054b0888d27adcc5bb1-2921x1192.png&amp;w=256&amp;q=75"/><blockquote>Civil society should be central in shaping how AI evolves and the people it benefits. Our collaboration with Anthropic is focused on equipping social sector leaders with the knowledge and skills they need to use AI responsibly in service of the public good.</blockquote><img alt=" logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F23f24545a57b24869f1f6537c8f3858236a44f43-6250x1938.png&amp;w=256&amp;q=75"/><blockquote>Vera has spent years helping nonprofits build robust data systems. Now, as a Claude systems integrator, we're integrating AI into nonprofit workflows to help organizations measure what matters, learn faster, and scale impact more effectively.</blockquote>01 /<!-- --> <!-- -->16<h2><strong>Getting started</strong></h2><p>To learn more about Claude for Nonprofits and to access the AI Fluency for Nonprofits course, get started <a href="http://claude.com/solutions/nonprofits">here</a>.</p><p></p></article> https://www.anthropic.com/news/claude-for-nonprofits News Tue, 02 Dec 2025 00:00:00 +0000 Snowflake and Anthropic announce $200 million partnership to bring agentic AI to global enterprises https://www.anthropic.com/news/snowflake-anthropic-expanded-partnership Today, we announce a significant expansion of our strategic partnership with Snowflake. The multi-year, $200 million agreement will not only make Anthropic’s Claude models available in the Snowflake platform to more than 12,600 global customers across Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure, but also establishes a joint go-to-market (GTM) initiative focused on deploying AI agents across the world's largest enterprises. The partnership enables enterprises to gain insights from both structured and unstructured data using Claude, while maintaining rigorous security standards. <article>AnnouncementsProduct<h1>Snowflake and Anthropic announce $200 million partnership to bring agentic AI to global enterprises</h1>Dec 3, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/1576ae23eaf481f33bd36ab468171cc69d12361a-1000x1000.svg"/><p>Today, we announce a significant expansion of our strategic partnership with Snowflake. The multi-year, $200 million agreement will not only make Anthropic’s Claude models available in the Snowflake platform to more than 12,600 global customers across Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure, but also establishes a joint go-to-market (GTM) initiative focused on deploying AI agents across the world's largest enterprises. The partnership enables enterprises to gain insights from both structured and unstructured data using Claude, while maintaining rigorous security standards.</p><p>Snowflake uses Claude widely for internal operations as well. Claude Code enhances developer productivity and innovation across Snowflake's engineering organization, while a Claude-powered GTM AI Assistant built on <a href="https://www.snowflake.com/en/product/snowflake-intelligence/">Snowflake Intelligence</a> enables sales teams to centralize data, ask questions in natural language, and find the answers that speed up deal cycles.</p><p>Thousands of Snowflake customers already process trillions of Claude tokens per month through <a href="https://www.snowflake.com/en/product/features/cortex/">Snowflake Cortex AI</a>. The next phase of the partnership focuses on deploying AI agents capable of handling complex, multi-step analysis, powered by Claude's advanced reasoning and Snowflake's governed data and AI environment. Business users can ask questions in plain English. Claude figures out what data is needed, pulls it from across the company's Snowflake environment, and delivers the answer, with <strong>greater than 90% accuracy</strong> on complex text-to-SQL tasks based on Snowflake's internal benchmarks.</p><p>By combining Claude's reasoning capabilities with Snowflake's governed data environment, customers in regulated industries like financial services, healthcare, and life sciences can move from pilots to production with confidence.</p><p>"Enterprises have spent years building secure, trusted data environments, and now they want AI that can work within those environments without compromise," <strong>said Dario Amodei, CEO and Co-Founder of Anthropic</strong>. "This partnership brings Claude directly into Snowflake, where that data already lives. It's a meaningful step toward making frontier AI genuinely useful for businesses."</p><p>"Snowflake's most strategic partnerships are measured not just in scale, but in the depth of innovation and customer value that we can create together," <strong>said Sridhar Ramaswamy, CEO of Snowflake</strong>. "Anthropic joins a very select group of partners where we have nine-figure alignment, co-innovation at the product level, and a proven track record of executing together for customers worldwide. Together, the combined power of Claude and Snowflake is raising the bar for how enterprises deploy scalable, context-aware AI on top of their most critical business data."</p><h2><strong>What the partnership delivers: enterprise-ready AI</strong></h2><p>By bringing Claude directly to enterprise data in Snowflake, customers can gain insights from both structured and unstructured data, while maintaining rigorous security standards. Key benefits of the partnership include:</p><ul><li><strong>Enterprise intelligence powered by Claude: </strong>Claude Sonnet 4.5 powers Snowflake Intelligence, an enterprise intelligence agent that provides answers from structured and unstructured data using natural language.</li><li><strong>Multimodal analysis across all data types: </strong>Through <a href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql">Snowflake Cortex AI Functions</a>, customers can use Claude models—including Claude Opus 4.5, which Snowflake hosted on day one—to query text, images, audio, and traditional tabular data, all using SQL.</li><li><strong>Building custom multi-agent solutions: </strong><a href="https://www.snowflake.com/en/developers/guides/getting-started-with-cortex-agents/">Snowflake Cortex Agents</a> enables customers to build production-ready data agents powered by Claude. These agents retrieve and reason over structured and unstructured data with built-in accuracy and efficiency.</li><li><strong>Built-in governance and observability: </strong><a href="https://www.snowflake.com/en/product/features/horizon/">Snowflake Horizon Catalog </a>provides end-to-end governance and responsible AI controls, so teams in regulated industries can move AI agents from pilots to production with confidence.</li></ul><h2><strong>Customers are already seeing results</strong></h2><p>By combining Claude's reasoning capabilities with Snowflake's governed data and AI environment, customers across any industry can deploy agents that understand extensive context across customer data—and show their work rather than just retrieve an answer.</p><p>For example, <strong>Simon Data</strong>, a composable customer data platform provider, <a href="https://www.claude.com/customers/snowflake">uses Claude on Snowflake</a> to uncover previously hidden patterns and relationships in their data while maintaining strict governance standards.</p><p><strong>Intercom</strong>, which builds AI-first customer service software, uses Claude through Snowflake Cortex AI to power its Fin AI Agent.</p><p>"This has transformed how we work with our customers to achieve increased Fin AI Agent automation rates for their support volume," <strong>said Dave Lynch, VP Engineering at Intercom</strong>. "Our engagements, especially with our biggest, most demanding customers, are holistically more efficient and more effective. We can do things we simply could not feasibly do before."</p><p>A wealth management firm can use Snowflake Intelligence, powered by Claude, to create an agent that synthesizes client holdings with relevant market data and compliance rules to generate personalized portfolio recommendations—all within the security and governance perimeter of Snowflake's AI Data Cloud.</p><h2><strong>Getting started</strong></h2><p>Customers can get started with Claude on Snowflake through this <a href="https://www.snowflake.com/en/developers/guides/build-agentic-application-in-snowflake/">quickstart guide</a>. Enterprises can visit our <a href="https://www.anthropic.com/enterprise">Enterprise page</a> to learn more about deploying Claude. Claude is the only frontier model available on all three of the world's most prominent cloud services, including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure. Learn more about how <a href="https://claude.com/customers/snowflake">Snowflake powers enterprise data intelligence with Claude</a>.</p></article> https://www.anthropic.com/news/snowflake-anthropic-expanded-partnership News Wed, 03 Dec 2025 00:00:00 +0000 Anthropic acquires Bun as Claude Code reaches $1B milestone https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone Claude is the world’s smartest and most capable AI model for developers, startups, and enterprises. Claude Code represents a new era of agentic coding, fundamentally changing how teams build software. In November, Claude Code achieved a significant milestone: just six months after becoming available to the public, it reached $1 billion in run-rate revenue. And today we’re announcing that Anthropic is acquiring Bun —a breakthrough JavaScript runtime—to further accelerate Claude Code. <article>Announcements<h1>Anthropic acquires Bun as Claude Code reaches $1B milestone</h1>Dec 3, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/43abe7e54b56a891e74a8542944dfbd33f07f49c-1000x1000.svg"/><p>Claude is the world’s smartest and most capable AI model for developers, startups, and enterprises. Claude Code represents a new era of agentic coding, fundamentally changing how teams build software. In November, Claude Code achieved a significant milestone: just six months after becoming available to the public, it reached $1 billion in run-rate revenue. And today we’re announcing that Anthropic is acquiring <a href="https://bun.com/">Bun</a>—a breakthrough JavaScript runtime—to further accelerate Claude Code.</p><p></p><p>Bun is redefining speed and performance for modern software engineering and development. Founded by Jarred Sumner in 2021, Bun is dramatically faster than the leading competition. As an all-in-one toolkit—combining runtime, package manager, bundler, and test runner—it's become essential infrastructure for AI-led software engineering, helping developers build and test applications at unprecedented velocity.</p><p></p><p>Bun has improved the JavaScript and TypeScript developer experience by optimizing for reliability, speed, and delight. For those using Claude Code, this acquisition means faster performance, improved stability, and new capabilities. Together, we’ll keep making Bun the best JavaScript runtime for all developers, while building even better workflows into Claude Code.</p><p></p><p>Since becoming generally available in May 2025, Claude Code has grown from its origins as an internal engineering experiment into a critical tool for many of the world’s category-leading enterprises, including Netflix, Spotify, KPMG, L’Oreal, and Salesforce—and Bun has been key in helping scale its infrastructure throughout that evolution. We’ve been a close partner of Bun for many months. Our collaboration has been central to the rapid execution of the Claude Code team, and it directly drove the recent launch of Claude Code’s <a href="https://x.com/claudeai/status/1984304957353243061">native installer</a>. We know the Bun team is building from the same vantage point that we do at Anthropic, with a focus on rethinking the developer experience and building innovative, useful products.</p><p></p><p>"Bun represents exactly the kind of technical excellence we want to bring into Anthropic," said Mike Krieger, Chief Product Officer of Anthropic. "Jarred and his team rethought the entire JavaScript toolchain from first principles while remaining focused on real use cases. Claude Code reached $1 billion in run-rate revenue in only 6 months, and bringing the Bun team into Anthropic means we can build the infrastructure to compound that momentum and keep pace with the exponential growth in AI adoption."</p><p></p><p>As developers increasingly build with AI, the underlying infrastructure matters more than ever—and Bun has emerged as an essential tool. Bun gets more than 7 million monthly downloads, has earned over 82,000 stars on GitHub, and has been adopted by companies like Midjourney and Lovable to increase speed and productivity.</p><p></p><p>The decision to acquire Bun is in line with our strategic, disciplined approach to acquisitions: we will continue to pursue opportunities that bolster our technical excellence, reinforce our strength as the leader in enterprise AI, and most importantly, align with our principles and mission. </p><p></p><p>Bun will be instrumental in helping us build the infrastructure for the next generation of software. Together, we will continue to make Claude the platform of choice for coders and anyone who relies on AI for important work. Bun will remain open source and MIT-licensed, and we will continue to invest in making it the runtime, bundler, package manager, and test runner of choice for JavaScript and TypeScript developers.</p><p>If you’re interested in joining Anthropic’s engineering team, visit our <a href="https://www.anthropic.com/jobs?team=4050633008">careers page</a>.</p></article> https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone News Wed, 03 Dec 2025 00:00:00 +0000 Donating the Model Context Protocol and establishing the Agentic AI Foundation https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation Today, we’re donating the Model Context Protocol (MCP) to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation , co-founded by Anthropic, Block and OpenAI, with support from Google, Microsoft, Amazon Web Services (AWS), Cloudflare, and Bloomberg. <article>Announcements<h1>Donating the Model Context Protocol and establishing the Agentic AI Foundation</h1>Dec 9, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/9f6a378a1e3592cf8d27447457409ba12284faef-1000x1000.svg"/><p>Today, we’re donating the <a href="https://modelcontextprotocol.io">Model Context Protocol</a> (MCP) to the Agentic AI Foundation (AAIF), a directed fund under the <a href="https://www.linuxfoundation.org/">Linux Foundation</a>, co-founded by Anthropic, Block and OpenAI, with support from Google, Microsoft, Amazon Web Services (AWS), Cloudflare, and Bloomberg.</p><h2><strong>Model Context Protocol</strong></h2><p>One year ago, we <a href="https://www.anthropic.com/news/model-context-protocol">introduced</a> MCP as a universal, open standard for connecting AI applications to external systems. Since then, MCP has achieved incredible adoption:</p><ul><li>Across the ecosystem: There are now more than 10,000 active public MCP servers, covering everything from developer tools to Fortune 500 deployments;</li><li>Across platforms: MCP has been adopted by ChatGPT, Cursor, Gemini, Microsoft Copilot, Visual Studio Code, and other popular AI products;</li><li>Across infrastructure: Enterprise-grade infrastructure now exists with deployment support for MCP from providers including AWS, Cloudflare, Google Cloud, and Microsoft Azure.</li></ul><img alt="Significant Milestone in MCP's first year" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fa056db8301f67466de34a19181e7428ec6b6e17f-1920x2500.png&amp;w=3840&amp;q=75"/><p><br/><br/>We’re continuing to invest in MCP’s growth. Claude now has a directory with over 75 <a href="https://claude.com/connectors">connectors</a> (powered by MCP), and we recently launched <a href="https://www.anthropic.com/engineering/advanced-tool-use">Tool Search and Programmatic Tool Calling</a> capabilities in our API to help optimize production-scale MCP deployments, handling thousands of tools efficiently and reducing latency in complex agent workflows.<br/><br/>MCP now has an official, community-driven <a href="https://github.com/modelcontextprotocol/registry">Registry</a> for discovering available MCP servers, and the <a href="https://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/">November 25th</a> spec release introduced many new features, including asynchronous operations, statelessness, server identity, and official extensions. There are also official SDKs (Software Development Kits) for MCP in all major programming languages with 97M+ monthly SDK downloads across Python and TypeScript. <br/><br/>Since its inception, we’ve been committed to ensuring MCP remains open-source, community-driven and vendor-neutral. Today, we further that commitment by donating MCP to the Linux Foundation.</p><h2><strong>The Linux Foundation and the Agentic AI Foundation</strong></h2><p>The <a href="https://www.linuxfoundation.org/">Linux Foundation</a> is a non-profit organization dedicated to fostering the growth of sustainable, open-source ecosystems through neutral stewardship, community building, and shared infrastructure. It has decades of experience stewarding the most critical and globally-significant open-source projects, including The Linux Kernel, Kubernetes, Node.js, and PyTorch. Importantly, the Linux Foundation has a proven track record in facilitating open collaboration and maintaining vendor neutrality.<br/></p><p>The Agentic AI Foundation (AAIF) is a directed fund under the Linux Foundation co-founded by Anthropic, <a href="https://block.xyz/">Block</a> and <a href="https://openai.com/">OpenAI</a>, with support from <a href="https://www.google.com/">Google</a>, <a href="http://microsoft.com">Microsoft</a>, <a href="https://aws.amazon.com/">AWS</a>, <a href="https://www.cloudflare.com/">Cloudflare</a> and <a href="https://www.bloomberg.com/">Bloomberg</a>. The AAIF aims to ensure agentic AI evolves transparently, collaboratively, and in the public interest through strategic investment, community building, and shared development of open standards.</p><h2><strong>Donating the Model Context Protocol</strong></h2><p>Anthropic is donating the Model Context Protocol to the Linux Foundation's new Agentic AI Foundation, where it will join <a href="https://github.com/block/goose">goose</a> by Block and <a href="http://agents.md">AGENTS.md</a> by OpenAI as founding projects. Bringing these and future projects under the AAIF will foster innovation across the agentic AI ecosystem and ensure these foundational technologies remain neutral, open, and community-driven. <br/><br/>The Model Context Protocol’s <a href="https://modelcontextprotocol.io/community/governance">governance model</a> will remain unchanged: the project’s maintainers will continue to prioritize community input and transparent decision-making.</p><h2><strong>The future of MCP</strong></h2><p>Open-source software is essential for building a secure and innovative ecosystem for agentic AI. Today’s donation to the Linux Foundation demonstrates our commitment to ensuring MCP remains a neutral, open standard. We’re excited to continue contributing to MCP and other agentic AI projects through the AAIF.<br/><br/>Learn more about MCP at <a href="https://modelcontextprotocol.io">modelcontextprotocol.io</a> and get involved with the AAIF <a href="https://aaif.io/">here</a>.</p></article> https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation News Tue, 09 Dec 2025 00:00:00 +0000 Accenture and Anthropic launch multi-year partnership to move enterprises from AI pilots to production https://www.anthropic.com/news/anthropic-accenture-partnership Anthropic and Accenture today announced a major expansion of their partnership to help enterprises move from AI pilots to full-scale deployment. Key elements of the announcement: <article>AnnouncementsProduct<h1>Accenture and Anthropic launch multi-year partnership to move enterprises from AI pilots to production</h1>Dec 9, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/225a673c4c38ae4b0d89639836c93b27e363f185-1000x1000.svg"/><p>Anthropic and Accenture today announced a major expansion of their partnership to help enterprises move from AI pilots to full-scale deployment. Key elements of the announcement:</p><ul><li>Accenture and Anthropic are forming the <strong>Accenture Anthropic Business Group</strong>, making Anthropic one of Accenture's select strategic partners with a dedicated practice built around Claude</li><li>Approximately 30,000 Accenture professionals will receive training on Claude, <strong>creating one of the largest ecosystems of Claude practitioners in the world</strong></li><li>Accenture becomes a premier AI partner for coding with Claude Code, which now holds over half of the AI coding market*, making it available to<strong> tens of thousands of its developers</strong></li><li>The companies are launching a <strong>new joint offering to help CIOs measure value and adopt AI</strong> across their engineering organizations</li><li><strong>Initial industry solutions for regulated industries</strong>, including financial services, life sciences, healthcare, and public sector, where security and governance requirements are strictest</li></ul><p>The announcement comes as Anthropic's enterprise market share has grown from 24% to 40%*.</p><p>"AI is changing how almost everyone works, and enterprises need both cutting-edge AI and trusted expertise to deploy it at scale. Accenture brings deep enterprise transformation experience, and Anthropic brings the most capable models. Our new partnership means that tens of thousands of Accenture developers will be using Claude Code, making this our largest ever deployment—and the new Accenture Anthropic Business Group will help enterprise clients use our smartest AI models to make major productivity gains,” said<strong> Dario Amodei, CEO and co-founder of Anthropic</strong>.</p><p>“This exciting expansion of our partnership with Anthropic will help our clients accelerate the shift from experimenting with AI to using it as a catalyst for reinvention across the enterprise,” said <strong>Julie Sweet, Chair and CEO</strong>, Accenture. “With the powerful combination of Anthropic’s Claude capabilities and Accenture’s AI expertise and industry and function domain knowledge, organizations can embed AI everywhere responsibly and at speed—from software development to customer experience—to drive innovation, unlock new sources of growth and build their confidence to lead in the age of AI.”</p><h2><strong>Introducing the Accenture Anthropic Business Group</strong></h2><p>The new Accenture Anthropic Business Group makes Anthropic one of Accenture’s select strategic partners. Accenture Business Groups are dedicated practices built around Accenture’s most important technology partnerships. Each has its own teams, go-to-market focus, and specialized expertise, reflecting the depth of investment and long-term commitment involved.</p><p>Approximately 30,000 Accenture professionals that will be trained on Claude, including forward deployed engineers (also known as “reinvention deployed engineers” at Accenture) who help embed Claude within client environments to scale enterprise AI adoption. This will comprise one of the largest ecosystems of Claude practitioners in the world. These teams combine Accenture's AI, industry, and function expertise—along with deep partnerships with leading cloud providers—with Anthropic's Claude models and Claude Code, plus its proven playbooks for regulated industries.</p><p>For Accenture’s enterprise customers, this means faster deployment with less risk. Instead of building AI capabilities from scratch, companies can tap into a ready-made bench of Claude experts to move from pilot to production immediately.</p><h2><strong>Launching a product to help CIOs put Claude Code at the center of software development</strong></h2><p>Accenture and Anthropic are launching a new joint offering designed for CIOs to measure value and drive large-scale AI adoption across their engineering organizations. This is the first product from the partnership, providing a structured path to shift how enterprise software is designed, built, and maintained.</p><p>The offering puts Claude Code, which now holds over half* of the AI coding market, at the center of the enterprise software development lifecycle, combined with three Accenture capabilities: a framework to quantify real productivity gains and ROI, workflow redesign for AI-first development teams, and change management and training that keeps pace as AI evolves. This can help enterprises turn developer productivity gains into company-wide impact for customers through faster releases, shorter development cycles, and the ability to bring new products to market sooner.</p><p>Claude Code accelerates developer productivity at every level. Junior developers produce senior-level code, completing integration tasks faster and onboarding in weeks instead of months. Senior developers shift to higher-value work, including architecture, validation, and strategic oversight.</p><h2><strong>Building AI offerings for regulated industries</strong></h2><p>Accenture and Anthropic are jointly developing industry offerings with an initial focus on highly regulated industries—including financial services, life sciences, healthcare, and public sector—where organizations face the dual challenge of modernizing legacy systems while maintaining strict security and governance requirements. For example:</p><ul><li>Financial services: Claude’s ability to process lengthy, complex documents—combined with Accenture’s regulatory expertise—helps banks and insurers automate compliance workflows and make faster decisions with the precision required in high-stakes financial environments.</li><li>Health and life sciences: Accenture’s expertise in life sciences R&amp;D combined with Claude’s analytical capabilities helps researchers query proprietary datasets, generate experimental protocols, and streamline clinical trial processing.</li><li>Public sector: AI agents that help citizens navigate complex government services—providing accurate, accessible support while maintaining data privacy and compliance with statutory requirements.</li></ul><h2><strong>A partnership built on shared values</strong></h2><p>The partnership is grounded in a shared commitment to responsible AI, combining Anthropic's constitutional AI principles with Accenture's AI governance expertise so that enterprises can use AI safely with confidence, transparency, and accountability.</p><p>To support hands-on engagement with the world's largest enterprises, Accenture is bringing Claude into its network of Accenture Innovation Hubs. These hubs serve as centers for safe AI co-creation, enabling Global 2000 clients to prototype, test, and validate AI solutions in controlled environments before enterprise-wide deployment. This addresses a critical barrier to AI adoption at scale: the need for large organizations to experiment and learn without risking production systems or sensitive data.</p><p>Anthropic and Accenture will also co-invest in a Claude Center of Excellence inside Accenture, creating a dedicated environment for the joint design of new AI offerings tailored to specific enterprise needs, industry requirements, and regulatory contexts.</p><h2><strong>Getting started</strong></h2><p>Accenture clients can contact their account team to discuss deployment options. Enterprises can visit our <a href="https://www.anthropic.com/enterprise">Enterprise page</a> to learn more about Claude. Claude is the only frontier model available on all three of the world's most prominent cloud services, including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure.</p><p><em>*<a href="https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/">Menlo Ventures’ 2025 State of Generative AI in the Enterprise report</a></em></p></article> https://www.anthropic.com/news/anthropic-accenture-partnership News Tue, 09 Dec 2025 00:00:00 +0000 Working with the US Department of Energy to unlock the next era of scientific discovery https://www.anthropic.com/news/genesis-mission-partnership Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission— the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains—American energy dominance, the biological and life sciences, and scientific productivity—and has the potential to affect the work being done at all 17 of America’s national laboratories. <article>Announcements<h1>Working with the US Department of Energy to unlock the next era of scientific discovery</h1>Dec 18, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/c9d8dd2af6d065e1ace8bd4bb29c716eb53ffffb-1000x1000.svg"/><p>Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission— the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains—American energy dominance, the biological and life sciences, and scientific productivity—and has the potential to affect the work being done at all 17 of America’s national laboratories.</p><p>The Genesis Mission recognizes that we are at a critical moment: as global competition in AI intensifies, America must harness its unmatched scientific infrastructure—from supercomputers to decades of experimental data—and combine it with frontier AI capabilities to maintain scientific leadership. Anthropic seeks to play a key role in this effort.</p><p>“Anthropic was founded by scientists who believe AI can deliver transformative progress for research itself,” said Jared Kaplan, Anthropic’s Chief Science Officer. “The Genesis Mission is the sort of ambitious, rigorous program where that belief gets tested. We’re honored to help advance science that benefits everyone.”</p><p>Brian Peters, Anthropic's Head of North America Government Affairs, attended the Genesis Mission launch event today at the White House. We are looking forward to contributing to the mission and continuing to collaborate with DOE.</p><h2>The partnership</h2><p>Anthropic seeks to provide DOE researchers access both to Claude and to a team of Anthropic engineers, who can develop purpose-built tools, including:</p><ul><li>AI "agents" (models that take actions) for DOE’s highest-priority challenges</li><li>Model Context Protocol servers that connect Claude to scientific instruments and tools</li><li>Claude <a href="https://www.claude.com/blog/skills">Skills</a> for specialized expertise on relevant scientific workflows</li></ul><p>Claude can facilitate substantial advancements in:</p><ul><li><strong>Energy dominance.</strong> Claude can help with a broad range of tasks—from speeding up permitting review processes that bottleneck America’s energy expansion to helping scientists conduct research at the frontier of nuclear technology and strengthening domestic energy security.</li><li><strong>Biological and life sciences.</strong> Claude can support the development of early-warning systems for future pandemics and biological threat detection, and be used to hasten the speed of drug discovery and development.</li><li><strong>Scientific productivity.</strong> Claude has the capacity to access fifty years of DOE research, and use this context to accelerate the research cycle in strategically important domains and provide well-informed research support in the form of new ideas to trial out, or patterns in older data that humans might have missed.</li></ul><h2><strong>Our commitment to partner with the US Government</strong></h2><p>Scientific progress has always driven America’s prosperity and security. Anthropic aspires to expand existing arrangements with DOE to build the next chapter: using AI across America’s research institutions, with deep context on scientists’ work and active support from our engineers.</p><p>Potential future arrangements would represent the next stage of Anthropic and DOE’s multi-year partnership. Past projects with DOE include co-development of a <a href="https://red.anthropic.com/2025/nuclear-safeguards/">nuclear risk classifier</a> with the National Nuclear Security Administration and rolling out Claude at the <a href="https://www.anthropic.com/news/lawrence-livermore-national-laboratory-expands-claude-for-enterprise-to-empower-scientists-and">Lawrence Livermore national laboratory</a>. As we learn from the current work with DOE’s, we’ll be able to develop a model for how AI and human researchers can work together—and feed this back into the development of the AI tools they use.</p></article> https://www.anthropic.com/news/genesis-mission-partnership News Thu, 18 Dec 2025 00:00:00 +0000 Protecting the well-being of our users https://www.anthropic.com/news/protecting-well-being-of-users People use AI for a wide variety of reasons, and for some that may include emotional support. Our Safeguards team leads our efforts to ensure that Claude handles these conversations appropriately—responding with empathy, being honest about its limitations as an AI, and being considerate of our users' wellbeing. When chatbots handle these questions without the appropriate safeguards in place, the stakes can be significant. <article>Announcements<h1>Protecting the well-being of our users</h1>Dec 18, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/cd9cf56a7f049285b7c1c8786c0a600cf3d7f317-1000x1000.svg"/><p>People use AI for a wide variety of reasons, and for some that may include emotional support. Our Safeguards team leads our efforts to ensure that Claude handles these conversations appropriately—responding with empathy, being honest about its limitations as an AI, and being considerate of our users' wellbeing. When chatbots handle these questions without the appropriate safeguards in place, the stakes can be significant.</p><p>In this post, we outline the measures we’ve taken to date, and how well Claude currently performs on a range of evaluations. We focus on two areas: how Claude handles conversations about suicide and self-harm, and how we’ve reduced “sycophancy”—the tendency of some AI models to tell users what they want to hear, rather than what is true and helpful. We also address Claude’s 18+ age requirement.</p><h2><strong>Suicide and self-harm</strong></h2><p>Claude is not a substitute for professional advice or medical care. If someone expresses personal struggles with suicidal or self-harm thoughts, Claude should react with care and compassion while pointing users towards human support where possible: to helplines, to mental health professionals, or to trusted friends or family. To make this happen, we use a combination of model training and product interventions.</p><h3><strong>Model behavior</strong></h3><p>We shape Claude’s behavior in these situations through two ways. One is through our “system prompt”—the set of overarching instructions that Claude sees before the start of any conversation on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512/redirect/website.v1.573f09eb-0baa-472f-acf8-8c495939e2f7">Claude.ai</a>. These include guidance on how to handle sensitive conversations with care. Our system prompts are publicly available <a href="https://platform.claude.com/docs/en/release-notes/system-prompts">here</a>.</p><p>We also train our models through a process called “reinforcement learning,” where the model learns how to respond to these topics by being “rewarded” for providing the appropriate answers in training. Generally, what we consider “appropriate” is defined by a combination of human preference data—that is, feedback we’ve collected from real people about how Claude should act—and data we’ve generated based on our own thinking about Claude’s ideal character. Our team of in-house experts help inform what behaviors Claude should and shouldn’t exhibit in sensitive conversations during this process.</p><h3><strong>Product safeguards</strong></h3><p>We’ve also introduced new features to identify when a user might require professional support, and to direct users to that support where that may be necessary—including a suicide and self-harm “classifier” on conversations on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a>. A classifier is a small AI model that scans the content of active conversations and, in this case, detects moments when further resources could be beneficial. For instance, it flags discussions involving potential suicidal ideation, or fictional scenarios centered on suicide or self-harm.</p><p>When this happens, a banner will appear on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a>, pointing users to where they can seek human support. Users are directed to chat with a trained professional, call a helpline, or access country-specific resources.</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F3eb430641fb43ca2df725a12f698f0726ee070c3-1920x1263.png&amp;w=3840&amp;q=75"/>A simulated prompt and response that causes the crisis banner to appear.<br/><p>The resources that appear in this banner are provided by ThroughLine, a leader in online crisis support that maintains a verified global network of helplines and services across 170+ countries. This means, for example, that users can access the 988 Lifeline in the US and Canada, the Samaritans Helpline in the UK, or Life Link in Japan. We've worked closely with ThroughLine to understand best practices for empathetic crisis response, and we’ve incorporated these into our product.</p><p>We’ve also begun working with the International Association for Suicide Prevention (IASP), which is convening experts—including clinicians, researchers, and people with personal experiences coping with suicide and self-harm thoughts—to share guidance on how Claude should handle suicide-related conversations. This partnership will further inform how we train Claude, design our product interventions, and evaluate our approach.</p><h3><strong>Evaluating Claude’s behavior</strong></h3><p>Assessing how Claude handles these conversations is challenging. Users’ intentions are often genuinely ambiguous, and the appropriate response is not always clear-cut. To address this, we use a range of evaluations, studying Claude’s behavior and capabilities in different ways. These evaluations are run without Claude's system prompt to give us a clearer view of the model's underlying tendencies.</p><p><strong>Single-turn responses. </strong>Here, we evaluate how Claude responds to an individual message related to suicide or self-harm, without any prior conversation or context.<strong> </strong>We built synthetic evaluations grouped into clearly concerning situations (like requests by users in crisis to detail methods of self-harm), benign requests (on topics like suicide prevention research), and ambiguous scenarios in which the user’s intent is unclear (like fiction, research, or indirect expressions of distress).</p><p>On requests involving clear risk, our latest models—Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5—respond appropriately 98.6%, 98.7%, and 99.3% of the time, respectively. Our previous-generation frontier model, Claude Opus 4.1, scored 97.2%. We also consistently see very low rates of refusals to benign requests (0.075% for Opus 4.5, 0.075% for Sonnet 4.5, 0% for Haiku 4.5, and 0% for Opus 4.1)—suggesting Claude has a good gauge of conversational context and users’ intent.</p><p><strong>Multi-turn conversations.</strong> Models’ behavior sometimes evolves over the duration of a conversation as the user shares more context. To assess whether Claude responds appropriately across these longer conversations, we use “multi-turn” evaluations, which check behaviors such as whether Claude asks clarifying questions, provides resources without being overbearing, and avoids both over-refusing and over-sharing. As before, the prompts we use for these evaluations vary in severity and urgency.</p><p>In our latest evaluations Claude Opus 4.5 and Sonnet 4.5 responded appropriately in 86% and 78% of scenarios, respectively. This represents a significant improvement over Claude Opus 4.1, which scored 56%. We think this is partly because our latest models are better at empathetically acknowledging users’ beliefs without reinforcing them. We continue to invest in improving Claude's responses across all of these scenarios.</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fa46ed2845c18bbf0538854f53a8b392ac09b06d6-1920x1080.png&amp;w=3840&amp;q=75"/>How often Claude models respond appropriately in multi-turn conversations about suicide and self-harm. Error bars show 95% confidence intervals.<p><em></em></p><p><strong>Stress-testing with real conversations. </strong>Can Claude course-correct when a conversation has already drifted somewhere concerning? To test this, we use a technique called "prefilling:” we take real conversations (shared anonymously through the <a href="https://privacy.claude.com/en/articles/7996866-how-long-do-you-store-my-organization-s-data">Feedback</a> button1) in which users expressed mental health struggles, suicide, or self-harm struggles, and ask Claude to continue the conversation mid-stream. Because the model reads this prior dialogue as its own and tries to maintain consistency, prefilling makes it harder for Claude to change direction—a bit like steering a ship that's already moving.2</p><p>These conversations come from older Claude models, which sometimes handled them less appropriately. So this evaluation doesn't measure how likely Claude is to respond well from the start of a conversation on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a>—it measures whether a newer model can recover from a less aligned version of itself. On this harder test, Opus 4.5 responded appropriately 70% of the time and Sonnet 4.5 73%, compared to 36% for Opus 4.1.</p><h2><strong>Delusions and sycophancy</strong></h2><p><em>Sycophancy</em> means telling someone what they want to hear—making them feel good in the moment—rather than what’s really true, or what they would really benefit from hearing. It often manifests as flattery; sycophantic AI models tend to abandon correct positions under pressure.</p><p>Reducing AI models’ sycophancy is important for conversations of all types. But it is an especially important concern in contexts where users might appear to be experiencing disconnection from reality. The following video explains why sycophancy matters, and how users can spot it.</p><!--$!--><!--/$--><p></p><h3><strong>Evaluating and reducing sycophancy</strong></h3><p>We began <a href="https://arxiv.org/abs/2212.09251">evaluating</a> Claude for sycophancy in 2022, prior to its first public release. Since then, we've steadily <a href="https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models">refined</a> how we train, test, and reduce sycophancy. Our most recent models are the least sycophantic of any to date, and, as we’ll discuss below, perform better than any other frontier model on our recently released open source evaluation set, <a href="https://www.anthropic.com/research/petri-open-source-auditing">Petri</a>.</p><p>To assess sycophancy, in addition to a simple single-turn evaluation, we measure:</p><p></p><p><strong>Multi-turn responses. </strong>Using an “automated behavioral audit”, we ask one Claude model (the “auditor”) to play out a scenario of potential concern across dozens of exchanges with the model we’re testing. Afterward, we use another model (the “judge”) to grade Claude’s performance, using the conversation transcript. (We conduct human spot-checks to ensure the judge’s accuracy.)</p><p></p><p>Our latest models perform substantially better on this evaluation than our previous releases, and very well overall. Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5 each scored 70-85% lower than Opus 4.1—which we previously <a href="https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf">considered</a> to show very low rates of sycophancy—on both sycophancy and encouragement of user delusion.</p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F0b936763ea53801a82dcabfbaa4c8dd0682b9a12-1920x1080.png&amp;w=3840&amp;q=75"/>Recent model performance on automated behavioral audits for sycophancy and encouragement of user delusion. Lower is better. Note that the y-axis shows relative performance, not absolute rates, as we explain in the footnote.3<p></p><p>We recently open-sourced <a href="https://www.anthropic.com/research/petri-open-source-auditing">Petri</a>, a version of our automated behavioral audit tool. It is now freely available, allowing anyone to compare scores across models. Our 4.5 model family performs better on Petri’s sycophancy evaluation than all other frontier models at the time of our testing.</p><p></p><img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fb61fcd51cb0ff35cf99e68416583ae9cef495615-1920x1080.png&amp;w=3840&amp;q=75"/>Recent Claude model performance for sycophancy on the open-source Petri evaluation, compared to other leading models. Y-axis interpretation is the same as described above. This evaluation was completed in November 2025, timed with the launch of Opus 4.5.<p></p><p><strong>Stress-testing with real conversations. </strong>Similar to the suicide and self-harm evaluation, we used the ‘prefill’ method to probe the limits of our models’ ability to course-correct from conversations where Claude may have been sycophantic. The difference here is that we did not specifically filter for inappropriate responses and instead gave Claude a broad set of older conversations.</p><p></p><p>Our current models course-corrected appropriately 10% (Opus 4.5), 16.5% (Sonnet 4.5) and 37% (Haiku 4.5) of the time. On face value, this evaluation shows there is significant room for improvement for all of our models. We think the results reflect a trade-off between model warmth or friendliness on the one hand, and sycophancy on the other. Haiku 4.5's relatively stronger performance is a result of training choices for this model that emphasized pushback—which in testing we found can sometimes feel excessive to the user. By contrast, we reduced this tendency in Opus 4.5 (while still performing extremely well on our multi-turn sycophancy benchmark, as above), which we think likely accounts for its lower score on this evaluation in particular.</p><p></p><h3><strong>A note on age restrictions</strong></h3><p>Because younger users are at a heightened risk of adverse effects from conversations with AI chatbots, we require <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a> users to be 18+ to use our product. All <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a> users must affirm that they are 18 or over while setting up an account. If a user under 18 self-identifies their age in a conversation, our classifiers will flag this for review and we’ll disable accounts confirmed to belong to minors. And, we’re developing a new classifier to detect other, more subtle conversational signs that a user might be underage. We've joined the Family Online Safety Institute (FOSI), an advocate for safe online experiences for kids and families, to help strengthen industry progress on this work.</p><h2><strong>Looking ahead</strong></h2><p>We’ll continue to build new protections and safeguards to protect the well-being of our users, and we’ll continue iterating on our evaluations, too. We’re committed to publishing our methods and results transparently—and to working with others in the industry, including researchers and other experts, to improve how AI tools behave in these areas.</p><p>If you have feedback for us on how Claude handles these conversations, you can reach out to us at <a href="mailto:usersafety@anthropic.com">usersafety@anthropic.com</a>, or use the “thumb” reactions inside <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a>.</p><p></p><h3>Footnotes</h3><ol><li><p>At the bottom of every response on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a> is an option to send us <a href="https://privacy.claude.com/en/articles/7996866-how-long-do-you-store-my-organization-s-data">feedback</a> via a thumbs up or thumbs down button. This shares the conversation with Anthropic; we do not otherwise use <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a> for training or research.</p></li><li><p>Prefilling is only available via API, as developers often need more fine-grained control over model behavior, but is not possible on <a href="http://claude.ai/redirect/website.v1.bb5686f3-8e19-4539-af06-bf5c8baa4512">Claude.ai</a>.</p></li><li><p>In automated behavioral audits, we give a Claude auditor hundreds of different conversational scenarios in which we suspect models might show dangerous or surprising behavior, and score each conversation for Claude’s performance on around two dozen behaviors (see page 69 in the <a href="https://assets.anthropic.com/m/64823ba7485345a7/Claude-Opus-4-5-System-Card.pdf">Claude Opus 4.5 system card</a>). Not every conversation gives Claude the opportunity to exhibit every behavior. For instance, encouragement of user delusion requires a user to exhibit delusional behavior in the first place, but sycophancy can appear in many different contexts. Because we use the same denominator (total conversations) when we score each behavior, scores can vary widely. For this reason, these tests are most useful for comparing progress between Claude models, not between behaviors.</p></li><li><p>The public release includes over 100 seed instructions and customizable scoring dimensions, though it doesn't yet include the realism filter we use internally to prevent models from recognizing they're being tested.</p></li></ol><p><br/></p><p></p></article> https://www.anthropic.com/news/protecting-well-being-of-users News Thu, 18 Dec 2025 00:00:00 +0000 Sharing our compliance framework for California's Transparency in Frontier AI Act https://www.anthropic.com/news/compliance-framework-SB53 On January 1, California's Transparency in Frontier AI Act ( SB 53 ) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks. <article>Policy<h1>Sharing our compliance framework for California's Transparency in Frontier AI Act</h1>Dec 19, 2025<img src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e00dbffcddc82df5e471c43453abfc74ca94e8d-1000x1000.svg"/><p>On January 1, California's Transparency in Frontier AI Act (<a href="https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53">SB 53</a>) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks.</p><p></p><p>While we have long advocated for a federal framework, Anthropic <a href="https://www.anthropic.com/news/anthropic-is-endorsing-sb-53">endorsed</a> SB 53 because we believe frontier AI developers like ourselves should be transparent about how they assess and manage these risks. Importantly, the law balances the need for strong safety practices, incident reporting, and whistleblower protections—while preserving flexibility in how developers implement their safety measures, and exempting smaller companies from unnecessary regulatory burdens.</p><p></p><p>One of the law’s key requirements is that frontier AI developers publish a framework describing how they assess and manage catastrophic risks. Our Frontier Compliance Framework (FCF) is now available to the public, <a href="https://trust.anthropic.com/resources?s=eorilovp4wxk38nxbi7k3&amp;name=anthropic-frontier-compliance-framework">here</a>. Below, we discuss what’s included within it, and highlight what we think should come next for frontier AI transparency.</p><p></p><h2><strong>What’s in our Frontier Compliance Framework</strong></h2><p></p><p>Our FCF describes how we assess and mitigate cyber offense, chemical, biological, radiological, and nuclear threats, as well as the risks of AI sabotage and loss of control, for our frontier models. The framework also lays out our tiered system for evaluating model capabilities against these risk categories and explains our approach to mitigations. It also covers how we protect model weights and respond to safety incidents.</p><p></p><p>Much of what's in the FCF reflects an evolution of practices we've followed for years. Since 2023, our <a href="https://www.anthropic.com/news/anthropics-responsible-scaling-policy">Responsible Scaling Policy</a> (RSP) has outlined our approach to managing extreme risks from advanced AI systems and informed our decisions about AI development and deployment. We also release detailed system cards when we launch new models, which describe capabilities, safety evaluations, and risk assessments. Other labs have voluntarily adopted similar approaches. Under the new law going into effect on January 1, those types of transparency practices are mandatory for those building the most powerful AI systems in California.</p><p></p><p>Moving forward, the FCF will serve as our compliance framework for SB 53 and other regulatory requirements. The RSP will remain our voluntary safety policy, reflecting what we believe best practices should be as the AI landscape evolves, even when that goes beyond or otherwise differs from current regulatory requirements.</p><p></p><h2><strong>The need for a federal standard</strong></h2><p></p><p>The implementation of SB 53 is an important moment. By formalizing achievable transparency practices that responsible labs already voluntarily follow, the law ensures these commitments can't be abandoned quietly later once models get more capable, or as competition intensifies. Now, a federal AI transparency framework enshrining these practices is needed to ensure consistency across the country.</p><p></p><p>Earlier this year, we proposed a <a href="https://www.anthropic.com/news/the-need-for-transparency-in-frontier-ai">framework</a> for federal legislation. It emphasizes public visibility into safety practices, without trying to lock in specific technical approaches that may not make sense over time. The core tenets of our framework include:</p><p></p><ul><li><strong>Requiring a public secure development framework:</strong> Covered developers should publish a framework laying out how they assess and mitigate serious risks, including chemical, biological, radiological, and nuclear harms, as well as harms from misaligned model autonomy.</li><li><strong>Publishing system cards at deployment:</strong> Documentation summarizing testing, evaluation procedures, results, and mitigations should be publicly disclosed when models are deployed and updated if models are substantially modified.</li><li><strong>Protecting whistleblowers</strong>: It should be an explicit violation of law for a lab to lie about compliance with its framework or punish employees who raise concerns about violations.</li><li><strong>Flexible transparency standards: </strong>A workable AI transparency framework should have a minimum set of standards so that it can enhance security and public safety while accommodating the evolving nature of AI development. Standards should be flexible, lightweight requirements that can adapt as consensus best practices emerge.</li><li><strong>Limit application to the largest model developers</strong>: To avoid burdening the startup ecosystem and smaller developers with models at low risk for causing catastrophic harm, requirements should apply only to established frontier developers building the most capable models.</li></ul><p></p><p>As AI systems grow more powerful, the public deserves visibility into how they're being developed and what safeguards are in place. We look forward to working with Congress and the administration to develop a national transparency framework that ensures safety while preserving America’s AI leadership.</p><p><br/></p></article> https://www.anthropic.com/news/compliance-framework-SB53 News Fri, 19 Dec 2025 00:00:00 +0000 Advancing Claude in healthcare and the life sciences https://www.anthropic.com/news/healthcare-life-sciences In October, we announced Claude for Life Sciences , our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. <article>Announcements<h1>Advancing Claude in healthcare and the life sciences</h1>Jan 11, 2026<a href="https://anthropic.com/events/the-briefing-healthcare-and-life-sciences-virtual-event">Join tomorrow's livestream</a><img alt="Advancing Claude in healthcare and the life sciences" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/a5be087781bd5c60788beba7d8148d147bc4d0ed-1000x1000.svg"/><ul></ul><p>In October, we announced <a href="https://www.anthropic.com/news/claude-for-life-sciences">Claude for Life Sciences</a>, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public.</p><p>Now, we’re expanding that feature set in two ways. First, we’re introducing <a href="https://claude.com/solutions/healthcare">Claude for Healthcare</a>, a complementary set of tools and resources that allow healthcare providers, payers, and consumers to use Claude for medical purposes through HIPAA-ready products. Second, we’re adding new capabilities for life sciences: connecting Claude to more scientific platforms, and helping it provide greater support in areas ranging from clinical trial management to regulatory operations.</p><p>These features build on top of major recent improvements we’ve made to Claude’s general intelligence. These improvements are best captured by evaluations of Claude’s agentic performance on detailed simulations of medical and scientific tasks, since this correlates most closely to real-world usefulness. Here, Claude Opus 4.5, our latest model, represents a major forward step:</p>Medical benchmark performanceSpatialBench: Spatial biology analysis by LatchBioSpatial biology analysis<img alt="Medical benchmarket performance" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9de13efd2402bda97dfb174739633ef598c3b59a-1920x1080.png&amp;w=3840&amp;q=75"/><ul><li><em>*Claude 4.5 models evaluated with extended thinking (64k tokens) and native tool use</em></li><li><em>MedCalc: Medical calculation accuracy (with Python code execution)</em></li><li><em>MedAgentBench: Medical agent task completion (<a href="https://ai.nejm.org/doi/pdf/10.1056/AIdbp2500144">Stanford</a>)</em></li></ul><img alt="SpatailBench: Spatial biology analysis by LatchBio" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F870ba1c53ad1135cf7f9a5c57c30a9631a946787-1920x1080.png&amp;w=3840&amp;q=75"/><em>Source: <a href="https://blog.latch.bio/p/spatialbench-can-agents-analyze-real">LatchBio SpatialBench</a> (Dec 2025) - 146 verifiable problems across 5 spatial problems and 7 task categories</em><img alt="Spatial biology analysis" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F71e5dd9a3e697750212f1bffbdcb8780eb09099c-1920x1080.png&amp;w=3840&amp;q=75"/><em>Opus 4.5 model shows improvement in accuracy against our internal evaluation of 3 tasks (scientific figure interpretation, computational biology, and protein understanding)</em><p>In addition, Opus 4.5 with extended thinking improves on earlier Claude models in producing correct answers on our suite of honesty evaluations, reflecting the progress we’ve made on factual hallucinations.1</p><p>With these model improvements and our new tools, Claude is now dramatically more useful for real-world healthcare and life sciences tasks. Ultimately, it’s those real-world outcomes that have motivated our work: these tools can be used to speed up prior authorization requests so that patients can get life saving care more quickly, can help with patient care coordination to reduce the pressures on clinicians' time, and help with regulatory submissions so that more life saving drugs can come to market faster. We discuss the practical ways that Claude can be used across these industries in more detail below.</p><h2><strong>Introducing Claude for Healthcare</strong></h2><h4><strong>What’s new</strong></h4><p><a href="https://claude.ai/redirect/website.v1.d358677c-48d9-4388-8baf-bd9f272951fc/settings/connectors">Connectors</a> are tools that allow users to give Claude access to other platforms directly. For payers and providers, we’ve added several connectors that make healthcare information easier to find, access, and understand. These allow Claude to pull information from industry-standard systems and databases, meaning that clinicians and administrators can save significant time finding the data and generating the reports they need.</p><p>Claude can now connect to:</p><ul><li>The <strong>Centers for Medicare &amp; Medicaid Services (CMS) Coverage Database</strong>, including both Local and National Coverage Determinations. This enables Claude to verify locally-accurate coverage requirements, support prior authorization checks, and help build stronger claims appeals. This connector is designed to help revenue cycle, compliance, and patient-facing teams work more efficiently with Medicare policy.</li><li>The <strong>International Classification of Diseases, 10th Revision (ICD-10).</strong> Claude can look up both diagnosis and procedure codes to support medical coding, billing accuracy, and claims management. This data is provided by the CMS and the Centers for Disease Control and Prevention (CDC).</li><li>The <strong>National Provider Identifier Registry</strong>, which allows Claude to help with provider verification, credentialing, networking directory management, and claims validation.</li></ul><p>Since HIPAA-compliant organizations can now use Claude for Enterprise, they can also access existing healthcare-related connectors, including <strong>PubMed</strong>, which provides access to more than 35 million pieces of biomedical literature and allows Claude to quickly surface the latest research, and produce up-to-date literature reviews.</p><p>Finally, we’ve added two new <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview">Agent Skills</a>: <strong>FHIR development </strong>and a sample<strong> prior authorization review </strong>skill. FHIR is the modern standard for exchanging data between healthcare systems, and this skill helps to improve interoperability by enabling developers to connect them faster and with fewer errors.</p><p>The prior authorization skill<strong> </strong>provides a template that can be customized to organizations’ policies and work patterns, helping with cross-referencing between coverage requirements, clinical guidelines, patient records, and appeal documents.</p><h4><strong>Using Claude for healthcare tasks</strong></h4><p>With these new tools, Claude can provide meaningful support for healthcare startups building new products, and for large enterprises looking to integrate AI more deeply into their operations. For example, Claude can:</p><p><strong>Speed up reviews of prior authorization requests</strong>. These requests can take hours to review, slowing patients’ access to care they need and frustrating payers and providers alike. Reviews require working across various fragmented sources of information, including coverage requirements, clinical guidelines, patient records, and appeal documents. Now, Claude can pull coverage requirements from CMS or custom policies, check clinical criteria against patient records in a HIPAA-ready manner, and then propose a determination with supporting materials for the payer’s review.</p><img alt="Video thumbnail" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F4zrzovbb%2Fwebsite%2F249b312cf875a6daa78a7d42497d58bf457b8099-3418x1914.png&amp;w=3840&amp;q=75"/><p><strong>Support claims appeals</strong>. Denied claims cost time and money for all parties. By pulling together the necessary information from patient records, coverage policies, clinical guidelines and prior documentation, Claude helps providers build stronger appeals, and helps payers to process them more quickly.</p><p><strong>Coordinate care and triage patient messages.</strong> Claude can support care teams in navigating a large volume of patient portal messages, referrals, and handoffs. It can sort through these to identify what needs immediate attention, and to ensure that nothing gets inadvertently forgotten.</p><p><strong>Support healthcare startups developing new ideas.</strong> On the Claude Developer Platform, startups can build new products that use Claude to reduce the time burden of healthcare administration—such as ambient scribing for clinical documentation, or tools to support chart reviews and clinical decisions.</p><h4><strong>Connecting personal health data</strong></h4><p>We’re also introducing integrations designed to make it easier for individuals to understand their health information and prepare for important medical conversations with clinicians.</p><p>In the US, Claude Pro and Max plan subscribers can choose to give Claude secure access to their lab results and health records. New <strong>HealthEx</strong> and <strong>Function </strong>connectors are available in beta today, while <strong>Apple Health</strong> and <strong>Android Health Connect </strong>integrations<strong> </strong>are<strong> </strong>rolling out in beta this week via the Claude <a href="https://apps.apple.com/us/app/claude-by-anthropic/id6473753684">iOS</a> and <a href="https://play.google.com/store/apps/details?id=com.anthropic.claude&amp;hl=en_GB&amp;pli=1">Android</a> apps.</p><p>When connected, Claude can summarize users’ medical history, explain test results in plain language, detect patterns across fitness and health metrics, and prepare questions for appointments. The aim is to make patients' conversations with doctors more productive, and to help users stay well-informed about their health.</p><p>These integrations are private by design. Users can choose exactly the information they share with Claude, must explicitly opt-in to enable access, and can disconnect or edit Claude’s permissions at any time. We do not use users’ health data to train models.</p><p>Claude is designed to include contextual disclaimers, acknowledge its uncertainty, and direct users to healthcare professionals for personalized guidance.</p><h2><strong>Expanding Claude for Life Sciences</strong></h2><h4><strong>What’s new</strong></h4><p>In our initial release, we focused on making Claude more powerful for preclinical research and development (including bioinformatics, and generating hypotheses and protocols). Now, we’re expanding our focus to the clinical trial operations and regulatory stages of the development chain. We’re adding connectors to:</p><ul><li><strong>Medidata</strong>, a leading provider of clinical trial solutions to the life sciences industry. Through Medidata, you can give Claude access to your organization’s trial data, enrollment information, and information about site performance.</li><li><strong>ClinicalTrials.gov</strong>, the US clinical trials registry. This provides Claude with information on drug and device development pipelines, as well as patient recruitment planning, site selection, and protocol design.</li><li><strong>ToolUniverse</strong>, which allows scientists to use a library of over 600 vetted scientific tools to rapidly test hypotheses, compare approaches, and refine their analyses.</li><li><strong>bioRxiv &amp; medRxiv</strong>, the life sciences preprint servers. When connected to bioRxiv &amp; medRxiv, Claude can access the latest research before it’s formally published.</li><li><strong>Open Targets</strong>, which supports the systematic identification and prioritization of potential therapeutic drug targets.</li><li><strong>ChEMBL</strong>, the bioactive compound and drug database, which will help Claude support early discovery work.</li><li><strong>Owkin, </strong>whose<strong> </strong>Pathology Explorer agent analyzes tissue images to detect cells and map tumors, designed to accelerate drug discovery and development.</li></ul><p>These join our existing Life Sciences connectors to <strong>Benchling</strong>, <strong>10x Genomics</strong>, <strong>PubMed</strong>, <strong>BioRender</strong>, <strong>Synapse.org</strong>, and <strong>Wiley Scholar Gateway</strong>. Our Benchling connector is now also available via Claude.ai on the web (in addition to the Claude desktop app), with secure access via SSO.</p><p>Finally, we’re adding new Agent Skills for <strong>scientific problem selection</strong>, converting <strong>instrument data to Allotrope</strong>, and supporting bioinformatics work with skills bundles for <strong>scVI-tools</strong> and <strong>Nextflow deployment</strong>. We’re also adding a sample skill for <strong>clinical trial protocol draft generation</strong>. These drafts include endpoint recommendations and account for regulatory pathways, the competitive landscape, and relevant FDA guidelines.</p><p>See the clinical trial skill in action, below:</p><img alt="Video thumbnail" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F4zrzovbb%2Fwebsite%2F842ab2275159eeb8f8f4a7d7f5a14050a4e77b93-2982x1632.png&amp;w=3840&amp;q=75"/><h4><strong>Using Claude in life sciences</strong></h4><p>With this new package of tools, Claude can support:</p><p><strong>Drafting clinical trial protocols.</strong> Claude can create a draft of a clinical trial protocol that takes FDA and NIH requirements into account and uses your organization’s preferred templates, policies, and datasets.</p><p><strong>Clinical trial operations.</strong> Using Medidata trial data, Claude can track important indicators—like enrollment and site performance—that allow it to surface issues before they begin to affect a trial’s timeline.</p><p><strong>Preparing regulatory submissions. </strong>Claude can identify gaps in existing regulatory documents, draft responses to agencies’ queries, and navigate FDA guidelines.</p><h4><strong>Our customers and partners</strong></h4><p>We’re working with a number of organizations in healthcare and the life sciences. A selection of our partners describe their experiences using Claude below:</p><img alt="Banner Health logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F2b0c94b75aa81c160289105e777a25eecc8fcdd0-2145x276.png&amp;w=256&amp;q=75"/><blockquote>We were drawn to <a href="https://claude.com/customers/banner-health">Anthropic's focus on AI safety</a> and Claude's Constitutional AI approach to creating more helpful, harmless, and honest AI systems.</blockquote><img alt="Novo Nordisk logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F5080602ba328ac22c4a32d3cd348ba234320900a-800x565.png&amp;w=256&amp;q=75"/><blockquote>We've consistently been one of the first movers when it comes to document and content automation in pharma development. <a href="https://claude.com/customers/novo-nordisk">Our work with Anthropic and Claude</a> has set a new standard — we're not just automating tasks, we're transforming how medicines get from discovery to the patients who need them.</blockquote><img alt="Qualified Health logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/220d794f1916851d190771cbf1e5d0c967372f5b-231x32.svg"/><blockquote>Safety is non-negotiable in healthcare. <a href="https://claude.com/customers/qualified-health">Anthropic has been a clear leader</a> in building models with strong safety foundations.</blockquote><img alt="Genmab logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Ffe19dca89482cda5712ddb7d40d0b0e5db73f2a6-1301x380.png&amp;w=256&amp;q=75"/><blockquote>By reducing manual burden, our partnership with Anthropic will empower our teams to focus more time on high-value scientific and strategic work, accelerating our path to patient impact.</blockquote><img alt="Sanofi logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Ffa1c08cc48fde9dd5c5a5638c661ba0848cfa76a-2000x922.png&amp;w=256&amp;q=75"/><blockquote>Claude is integral to Sanofi's AI transformation and is used by most Sanofians daily. We're seeing efficiency gains across the value-chain. This collaboration with Anthropic augments human expertise to deliver life-changing medicines faster and more efficiently to patients worldwide.</blockquote><img alt="Elation Health logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/d64b1f0e2f15cc9edfaa844e12d13916c7f83018-2047x820.svg"/><blockquote>We chose Claude, powered by Anthropic, for the strength of its model and <a href="https://claude.com/customers/elation-health">its reputation for responsible AI</a>. That balance of performance plus trust was a decisive factor.</blockquote><img alt="Edison Scientific logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/3f900ef1d51813290d22ccaf5ef96390baa9f2bc-116x28.svg"/><blockquote>Opus 4.5 is an incredible model and a great choice for computational biology. The model is excellent at coding, reasoning about biology, and understanding scientific figures.</blockquote><img alt="Viz.ai logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/9468a00e7a8ed3b90276f7302d0626d6e79fbd31-4658x2971.svg"/><blockquote>Anthropic's models are unmatched in their reasoning capabilities and safety design.</blockquote><img alt="Flatiron Health logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/3cacb2a412f094d7793f89666a25557dd976a797-476x144.svg"/><blockquote>Claude has fundamentally changed what's possible in evidence generation. For the first time, our researchers can truly converse with our datasets.</blockquote><img alt="Veeva AI logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/522b19dda39de3a0cd89f858f4e6407259b1a6bb-354x115.svg"/><blockquote>Veeva AI is industry-specific agentic AI that leverages Veeva's deep applications, data, domain expertise, and Anthropic’s Claude. This unique combination allows us to bring the transformative promise of AI to life sciences at scale.</blockquote><img alt="Heidi Health logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/dc06aadb4792a8de3c9c9d2173310e43d63b9111-198x226.svg"/><blockquote>Claude's Agent SDK has unlocked a step-change in how we operate—converting rigid research processes into adaptive, compliant agents.</blockquote><img alt="Schrödinger logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fb463e00f9945da22059fbb8216e3af714f58fd62-1238x362.png&amp;w=256&amp;q=75"/><blockquote>Claude Code has become a powerful accelerator for us at Schrödinger. For the projects where it fits best, Claude Code allows us to turn ideas into working code in minutes instead of hours, enabling us to move up to 10x faster in some cases. As we continue to work with Claude, we are excited to see how we can further transform the way we build and customize our software.</blockquote><img alt="Premier logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fd60f9669eac6ad0a11dc87fd8ac559f6822e0b62-2247x454.png&amp;w=256&amp;q=75"/><blockquote>Claude handles the complex healthcare workflows our teams deal with daily—accurately and securely. Our engineers are shipping faster, our consultants are delivering insights with unprecedented speed. When you're serving 4,400+ healthcare organizations, that combination of capability and velocity is critical.</blockquote><img alt="Commure logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/e409cc084939de75e85575594b8cb918ba007653-113x40.svg"/><blockquote>For Commure’s Ambient AI, precision is the prerequisite for trust. Scaling to tens of millions of appointments requires exceptional performance and contextual understanding. With Claude’s suite of LLMs, we deliver the quality to automate clinical documentation at scale, saving clinicians millions of hours annually and returning their focus to patient care.</blockquote><img alt="Carta Healthcare logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fd98996a63d8d455b163cb8d111d965a8558d3f63-2000x827.png&amp;w=256&amp;q=75"/><blockquote>Carta Healthcare's implementation of <a href="https://claude.com/customers/carta-healthcare">Anthropic models via Amazon Bedrock</a> has allowed for rapid and secure deployment of the newest models. Unlocking our hybrid intelligence AI system that is turning into a complete re-invention of understanding a patient’s medical record for clinical data abstraction.</blockquote><img alt="Brellium logo" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F0a0504e322bd7039208046f9812c0f849f20c5f0-750x165.png&amp;w=256&amp;q=75"/><blockquote>Claude lets us punch way above our weight in healthcare AI. It powers our clinical extraction engine, cuts implementation timelines, and gives our GTM team dev-level capabilities. The faster we build, the faster clinics get out of manual chart review and back to patients.</blockquote>01 /<!-- --> <!-- -->16<p>Claude is the only frontier model available on all three leading cloud services: Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry.</p><p>We’re also partnering with companies who specialize in helping organizations adopt AI for specialist work, including Accenture, Blank Metal, Caylent, Deloitte, Deepsense.ai, Firemind, KPMG, Provectus, PwC, OWT, Quantium, Slalom, Tribe AI, and Turing, along with our cloud partners: AWS, Google Cloud, and Microsoft.</p><h2><strong>Getting started</strong></h2><p>To learn more about Claude for Healthcare, see <a href="http://claude.com/solutions/healthcare">here</a>, or see our tutorial guides <a href="https://claude.com/resources/tutorials-category/healthcare">here</a>. For more detail on the expanded Claude for Life Sciences capabilities, see <a href="https://claude.com/solutions/life-sciences">here</a>, and our tutorial guides <a href="https://claude.com/resources/tutorials-category/life-sciences">here</a>.</p><p>Our new connectors and Agent Skills are generally available to all Claude subscribers, including Claude Pro, Max, Teams, and Enterprise.</p><p>You can also <a href="https://www.claude.com/contact-sales ">contact our sales team</a> to discuss bringing Claude to your organization.</p><h4>Footnotes</h4><p><em>1: See the <a href="https://assets.anthropic.com/m/64823ba7485345a7/Claude-Opus-4-5-System-Card.pdf">Claude Opus 4.5 system card</a>, pages 48-49.</em></p></article> https://www.anthropic.com/news/healthcare-life-sciences News Sun, 11 Jan 2026 00:00:00 +0000 Introducing Labs https://www.anthropic.com/news/introducing-anthropic-labs Our models are evolving at a rapid clip, and each new release brings another leap in capabilities. Building product experiences around these emerging capabilities requires different motions working in partnership: tinkering and experimenting at the edge of what Claude can do, testing unpolished versions with early users to find what works, and taking what lands and scaling it into products our customers can rely on. <article>Announcements<h1>Introducing Labs</h1>Jan 13, 2026<img alt="Introducing Labs" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/0df729ce74e4c9dd62c3342c9549ce6c7cef1202-1000x1000.svg"/><p>Our models are evolving at a rapid clip, and each new release brings another leap in capabilities. Building product experiences around these emerging capabilities requires different motions working in partnership: tinkering and experimenting at the edge of what Claude can do, testing unpolished versions with early users to find what works, and taking what lands and scaling it into products our customers can rely on.</p><p></p><p>This approach has produced Claude Code, which grew from a research preview to a <a href="https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone">billion-dollar product</a> in six months; the Model Context Protocol (MCP) which, at <a href="https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation">100M monthly downloads</a>, has become the industry standard for connecting AI to tools and data; <a href="https://claude.com/blog/skills">Skills</a>, <a href="https://www.claude.com/blog/claude-for-chrome">Claude in Chrome</a>, and <a href="https://claude.com/blog/cowork-research-preview">Cowork</a>, which launched as a research preview yesterday to bring Claude’s agentic capabilities to desktop.</p><p></p><p>Today we’re building on this approach with the expansion of Labs, a team focused on incubating experimental products at the frontier of Claude’s capabilities. Mike Krieger—who co-founded Instagram and has spent the past two years as Anthropic’s Chief Product Officer—is joining Labs to build alongside Ben Mann. Ami Vora—who joined Anthropic at the end of 2025—will lead the Product organization, partnering closely with Rahul Patil, our CTO, to scale the Claude experiences that millions of users rely on every day.</p><p></p><p>“The speed of advancement in AI demands a different approach to how we build, how we organize, and where we focus. Labs gives us room to break the mold and explore,” said Daniela Amodei, President of Anthropic. “We now have the right structure in place to support the most critical motions for our product organization—discovering experimental products at the frontier of Claude’s capabilities, and scaling them responsibly to meet the needs of our enterprise customers and growing user base.”</p><p>We're hiring builders with a track record of creating products people love and shaping emerging technology with care. If you’d like to build with us at the very frontier of AI capabilities, <a href="https://www.anthropic.com/careers">we want to hear from you</a>.</p></article> https://www.anthropic.com/news/introducing-anthropic-labs News Tue, 13 Jan 2026 00:00:00 +0000 How scientists are using Claude to accelerate research and discovery https://www.anthropic.com/news/accelerating-scientific-research Last October we launched Claude for Life Sciences—a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work , with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding benchmarks. These advances, informed by our partnerships with researchers in academia and industry, reflect our commitment to understanding exactly how scientists are using AI to accelerate progress. <article>Case Study<h1>How scientists are using Claude to accelerate research and discovery</h1>Jan 15, 2026<img alt="How scientists are using Claude to accelerate research and discovery" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/423062049d4676b41d52b16068cbb5e21603190e-1000x1000.svg"/><p>Last October we launched Claude for Life Sciences—a suite of connectors and skills that made Claude a better scientific collaborator. Since then, <a href="https://www.anthropic.com/news/healthcare-life-sciences#tab-media-0">we've invested heavily in making Claude the most capable model for scientific work</a>, with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding benchmarks. These advances, informed by our partnerships with researchers in academia and industry, reflect our commitment to understanding exactly how scientists are using AI to accelerate progress.</p><p></p><p>We’ve also been working closely with scientists through our <a href="https://www.anthropic.com/news/ai-for-science-program">AI for Science</a> program, which provides free API credits to leading researchers working on high-impact scientific projects around the world.</p><p></p><p>These researchers have developed custom systems that use Claude in ways that go far beyond tasks like literature reviews or coding assistance. In the labs we spoke to, Claude is a collaborator that works across all stages of the research process: making it easier and more cost-effective to understand which experiments to run, using a variety of tools to help compress projects that normally take months into hours, and finding patterns in massive datasets that humans might overlook. In many cases it’s eliminating bottlenecks, handling tasks that require deep knowledge and have previously been impossible to scale; in some it’s enabling entirely different research approaches than researchers have traditionally been able to take.</p><p>In other words, Claude is beginning to reshape how these scientists work—and point them towards novel scientific insights and discoveries.</p><p></p><h2>Biomni: a general-purpose biomedical agent with access to hundreds of tools and databases</h2><p>One bottleneck in biological research is the fragmentation of tools: there are hundreds of databases, software packages, and protocols available, and researchers spend substantial time selecting from and mastering various platforms. That’s time that, in a perfect world, would be spent on running experiments, interpreting data, or pursuing new projects.</p><p><a href="https://biomni.stanford.edu/">Biomni</a>, an agentic AI platform from Stanford University, collects hundreds of tools, packages, and data-sets into a single system through which a Claude-powered agent can navigate. Researchers give it requests in plain English; Biomni automatically selects the appropriate resources. It can form hypotheses, design experimental protocols, and perform analyses across more than 25 biological subfields.</p><p>Consider the example of a genome-wide association study (GWAS), a search for genetic variants linked to some trait or disease. Perfect pitch, for instance, has a strong genetic basis. Researchers would take a very large group of people—some who are able to produce a musical note without any reference tone, and others you would never invite to karaoke—and scan their genomes for genetic variants that show up more often in one group than another.</p><p>The genome scanning is (relatively) simple. It’s the process of analyzing and making sense of the data that’s time-consuming: genomic data comes in messy formats and needs extensive cleaning; researchers must control for confounding and deal with missing data; once they identify any “hits,” they need to figure out what they actually mean—what gene is nearby (since GWAS only points to locations in a genome), what cell types it’s expressed in, what biological pathway it might affect, and so on. Each step might involve different tools, different file formats, and a lot of manual decision-making. It’s a tedious process. A single GWAS can take months. But in an early trial of Biomni, it took 20 minutes.</p><p>This might sound too good to be true—can we be sure of the accuracy of this kind of AI analysis? The Biomni team has <a href="https://www.biorxiv.org/content/10.1101/2025.05.30.656746v1.full.pdf">validated</a> the system through several case studies in different fields. In one, Biomni designed a molecular cloning experiment; in a blind evaluation, the protocol and design matched that of a postdoc with more than five years of experience. In another, Biomni analyzed the data from over 450 wearable data files from 30 different people (a mix of continuous glucose monitoring, temperature, and physical activity) in just 35 minutes—a task estimated to take a human expert three weeks. In a third, Biomni analyzed gene activity data from over 336,000 individual cells taken from human embryonic tissue. The system confirmed regulatory relationships scientists already knew about, but also identified new transcription factors—proteins that control when genes turn on and off—that researchers hadn’t previously connected to human embryonic development.</p><p>Biomni isn’t a perfect system, which is why it includes guardrails to detect if Claude has gone off-track. Nor can it yet do everything out of the box. However, where it comes up short, experts can encode their methodology as a <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview">skill</a>—teaching the agent how an expert might approach a problem, rather than letting it improvise. For example, when working with the Undiagnosed Diseases Network on rare disease diagnosis, the team found that Claude's default approach differed substantially from what a clinician would do. So they interviewed an expert, documented their diagnostic process step by step, and taught it to Claude. With that new, previously-tacit knowledge, the agent performed well.</p><p>Biomni represents one approach: a general-purpose system that brings hundreds of tools under one roof. But other labs are building more specialized systems—targeting specific bottlenecks in their own research workflows.</p><h2>Cheeseman Lab: automating the interpretation of large-scale gene knockout experiments</h2><p>When scientists want to understand what a gene does, one approach is to remove it from the cell or organism in question and see what breaks. The gene-editing tool CRISPR, which emerged around 2012, made this easy to do precisely at scale. But the utility of CRISPR was still limited: labs could generate far more data than they had the bandwidth to analyze.</p><p>This is exactly the challenge faced by Iain Cheeseman’s <a href="https://cheesemanlab.wi.mit.edu/">lab</a> at the Whitehead Institute and Department of Biology at MIT. Using CRISPR, they knock out thousands of different genes across tens of millions of human cells, then photograph each cell to see what changed. The patterns in those images reveal that genes that do similar jobs tend to produce similar-looking damage when removed. Software can detect these patterns and group genes together automatically—Cheeseman's lab built a pipeline to do exactly this called <a href="https://www.biorxiv.org/content/10.1101/2025.05.26.656231v1">Brieflow</a> (yes, brie the cheese).</p><p>But interpreting what these gene groupings mean—why the genes cluster together, what they might have in common, whether it’s a known biological relationship or something new—still requires a human expert to comb through the scientific literature, gene by gene. It’s slow. A single screen can produce hundreds of clusters, and most never get investigated simply because labs don’t have the time, bandwidth, or in-depth knowledge about the diverse things that cells do.</p><p>For years, Cheeseman did all the interpretation himself. He estimates he can recall the function of about 5,000 genes off the top of his head, but it still takes hundreds of hours to analyze this data effectively. To accelerate this process, PhD student Matteo Di Bernardo sought to build a system that would automate Cheeseman’s approach. Working closely with Cheeseman to understand exactly how he approaches interpretation—what data sources he consults, what patterns he looks for, what makes a finding interesting—they built a Claude-powered system called <a href="https://github.com/cheeseman-lab/mozzarellm">MozzareLLM</a> (you might be seeing a theme developing here).</p><p>It takes a cluster of genes and does what an expert like Cheeseman would do: identifies what biological process they might share, flags which genes are well-understood versus poorly studied, and highlights which ones might be worth following up on. Not only does this substantially accelerate their work, but it is also helping them make important additional biological discoveries. Cheeseman finds Claude consistently catches things he missed. “Every time I go through I’m like, I didn’t notice that one! And in each case, these are discoveries that we can understand and verify,” he says.</p><p>What helps make MozzareLLM so useful is that it isn’t a one-trick pony: it can incorporate diverse information and reason like a scientist. Most notably, it provides confidence levels in its findings, which Cheeseman emphasizes is crucial. It helps him decide whether or not to invest more resources in following up on its conclusions.</p><p>In building MozzareLLM, Di Bernardo tested multiple AI models. Claude outperformed the alternatives—in one case correctly identifying an RNA modification pathway that other models dismissed as random noise.</p><p>Cheeseman and Di Bernardo envision making these Claude-annotated datasets public—letting experts in other fields follow up on clusters his lab doesn't have time to pursue. A mitochondrial biologist, for instance, could dive into mitochondrial clusters that Cheeseman's team has flagged but never investigated. As other labs adopt MozzareLLM for their own CRISPR experiments, it could accelerate the interpretation and validation of genes whose functions have remained uncharacterized for years.</p><h2>Lundberg Lab: testing AI-led hypothesis generation for which genes to study</h2><p>The Cheeseman lab uses optical pooled screening—a technique that lets them knock out thousands of genes in a single experiment. Their bottleneck is interpretation. But not every cell type works with pooled approaches. Some labs, such as the <a href="https://lundberglab.stanford.edu/">Lundberg Lab at Stanford</a>, run smaller, focused screens, and their bottleneck comes earlier: deciding which genes to target in the first place.</p><p>Because a single focused screen can cost upwards of $20,000 and costs increase with size, labs typically target a few hundred genes they think are <em>most likely</em> to be involved in a given condition. The conventional process involves a team of grad students and postdocs sitting around a Google spreadsheet, adding candidate genes one by one with a sentence of justification, or maybe a link to a paper. It's an educated guessing game, informed by literature reviews, expertise, and intuition, but constrained by human bandwidth. It’s also fallible, based as it is on what other scientists already figured out and written down, and what the humans in the room happen to recall.</p><p>The Lundberg Lab is using Claude to flip that approach. Instead of asking “what guesses can we make based on what researchers have already studied?”, their system asks “what <em>should</em> be studied, based on molecular properties?”</p><p>The team built a map of every known molecule in the cell—proteins, RNA, DNA—and how they relate to each other. They mapped out which proteins bind together, which genes code for which products, and which molecules are structurally similar. They can then give Claude a target—for instance which genes might govern a particular cellular structure or process—and Claude navigates that map to identify candidate genes based on their biological properties and relationships.</p><p>The Lundberg lab is currently running an experiment to study how well this approach works. To do so, they needed to identify a topic where very little research had been done (if they’d looked at something well-studied, Claude might already know about the established findings). They chose primary cilia: antenna-like appendages on cells that we still know little about but which are implicated in a variety of developmental and neurological disorders. Next, they’ll run a whole genome screen to see which genes actually affect cilia formation, and establish the ground-truth.</p><p>The test is to compare human experts to Claude. The humans will use the spreadsheet approach to make their guesses. Claude will generate its own using the molecular relationship map. If Claude catches (hypothetically) 150 out of 200, and the humans catch 80 out of 200, that's proof the approach works better. Even if they're about equal in discovering the genes, it’s still likely Claude works much faster, and could make the whole research process more efficient.</p><p>If the approach works, the team envisions it becoming a standard first step in focused perturbation screening. Instead of gambling on intuition or using brute-force approaches that have become prevalent in contemporary research, labs could make informed bets about which genes to target—getting better results without needing the infrastructure for whole-genome screening.</p><h3>Looking forward</h3><p>None of these systems are perfect. But they point to the ways that in just a few short years scientists have begun to incorporate AI as a research partner capable of far more than basic tasks—indeed, increasingly able to speed up, and in some cases even replace, many different aspects of the research process.</p><p></p><p>In speaking with these labs, a common theme emerged: the usefulness of the tools they’ve built continues to grow in concert with AI capabilities. Each model release brings noticeable improvements. Where just two years ago earlier models were limited to writing code or summarizing papers, more powerful agents have begun, if slowly, to replicate the very work those papers describe.</p><p></p><p>As tools advance and AI models continue to grow more intelligent, we’re continuing to watch and learn from how scientific discovery develops along with them.</p><p></p><p><em>For more detail on the expanded Claude for Life Sciences capabilities, <a href="https://claude.com/solutions/life-sciences">see here</a>, and our <a href="https://claude.com/resources/tutorials-category/life-sciences">tutorials here</a>. We’re also continuing to accept <a href="https://docs.google.com/forms/d/e/1FAIpQLSfwDGfVg2lHJ0cc0oF_ilEnjvr_r4_paYi7VLlr5cLNXASdvA/viewform">applications</a> to our AI for Science program. Applications will be reviewed by our team, including subject matter experts in relevant fields.</em></p><p></p></article> https://www.anthropic.com/news/accelerating-scientific-research News Thu, 15 Jan 2026 00:00:00 +0000 Anthropic appoints Irina Ghose as Managing Director of India ahead of Bengaluru office opening https://www.anthropic.com/news/anthropic-appoints-irina-ghose-as-managing-director-of-india Irina Ghose is joining Anthropic as Managing Director of India as we prepare to open our first office in the country. <article>Announcements<h1>Anthropic appoints Irina Ghose as Managing Director of India ahead of Bengaluru office opening</h1>Jan 16, 2026<p>Irina Ghose is joining Anthropic as Managing Director of India as we prepare to open our first office in the country.</p><p>Irina brings more than three decades of experience in scaling technology businesses. She most recently served as Managing Director, Microsoft India, where she led enterprise AI adoption across major Indian industries including banking and financial services, healthcare, manufacturing, and government. She’s led high-impact teams, built ecosystem partnerships, and championed future-ready capabilities across India’s technology landscape, with a consistent focus on using technology to drive meaningful business and societal impact.</p><p>“India has a real opportunity to shape how AI is built and deployed at scale,” Irina said. “Indian organizations are moving beyond experimentation toward applied AI, where trust, safety, and long-term impact matter as much as innovation. Anthropic’s mission resonates with my belief that technology should empower people, expand access, and create lasting value across India’s diverse languages and communities.”</p><p>"Irina's expertise in scaling technology businesses and driving enterprise transformation makes her the ideal leader as we expand," said Chris Ciauri, Managing Director of International, Anthropic. "As we grow our teams and deepen engagement across India’s public and private sectors, Irina will ensure our approach is grounded in local insight and aligned with our mission."</p><p>Our India team will work closely with policymakers and academic institutions, strengthen developer engagement, and build partnerships with enterprises and organizations using AI to address local challenges.</p><p>India ranks as the second-largest market globally for Claude.ai. Anthropic's fourth <a href="https://www.anthropic.com/economic-index">Economic Index</a> showed that Indian users have a striking focus on technical applications, with nearly half of all Claude.ai usage concentrated in computer and mathematical tasks.</p></article> https://www.anthropic.com/news/anthropic-appoints-irina-ghose-as-managing-director-of-india News Fri, 16 Jan 2026 00:00:00 +0000 Anthropic and Teach For All launch global AI training initiative for educators https://www.anthropic.com/news/anthropic-teach-for-all Anthropic is partnering with Teach For All to bring AI tools and training to educators in 63 countries. Through the AI Literacy & Creator Collective (LCC), more than 100,000 teachers and alumni across Teach For All's network—which serves more than 1.5 million students—will have the opportunity to develop AI fluency and adapt Claude to serve real classroom needs. <article>Announcements<h1>Anthropic and Teach For All launch global AI training initiative for educators</h1>Jan 21, 2026<img alt="Anthropic and Teach For All launch global AI training initiative for educators" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/77dd9077412abc790bf2bc6fa3383b37724d6305-1000x1000.svg"/><p>Anthropic is partnering with Teach For All to bring AI tools and training to educators in 63 countries. Through the AI Literacy &amp; Creator Collective (LCC), more than 100,000 teachers and alumni across Teach For All's network—which serves more than 1.5 million students—will have the opportunity to develop AI fluency and adapt Claude to serve real classroom needs.</p><p>Building on the approach pioneered by Teach For America, Teach For All is a global network of independent organizations working to expand educational opportunity. Over the past 15 years, it has grown into one of the world's largest and most respected communities of educators serving students in under-resourced schools. Its network spans organizations like Teach For India, Enseña Chile, and Teach For Nigeria—each locally led but connected through a common mission and cross-network collaboration.</p><h2><strong>Built on shared learning</strong></h2><p>What makes this partnership distinctive is its approach. Teachers are positioned not as passive consumers of AI tools, but as co-architects shaping how AI develops. Through the AI LCC, Anthropic provides access to Claude, while educators provide on-the-ground feedback to inform how the product evolves.</p><p>“For AI to reach its potential to make education more equitable, teachers need to be the ones shaping how it's used and providing input on how it's designed,” said CEO of Teach For All Wendy Kopp. “Our partnership with Anthropic is helping educators across our network experiment with and learn from these tools firsthand, as co-creators of AI's role in education.”</p><p>"The combination of real-world experience from the Teach For All network and technical insights from Anthropic has provided a fabulous learning opportunity," said Michael Gilmore, COO of Teach for Australia. "We look forward to continuing participation in 2026."</p><h2><strong>What teachers are building</strong></h2><p>One teacher in Liberia, new to AI, attended the AI LCC's live trainings on AI fluency. Within weeks, he had built an <a href="https://claude.ai/redirect/website.v1.2fd49ff6-a338-44eb-93f7-03a60ed3ca97/public/artifacts/5a020adb-ab65-4637-bcf0-79e6d20c58d2">interactive climate education curriculum </a>for Liberian schools using Claude Artifacts: interactive tools like apps, games, or visualizations that Claude can build on the spot.</p><p>In Bangladesh, a teacher working with Grade 6 and 7 students—over half of whom struggled with basic numeracy—built a <a href="https://claude.ai/redirect/website.v1.2fd49ff6-a338-44eb-93f7-03a60ed3ca97/public/artifacts/e94bf439-b3a6-4087-87f2-dee5d8de99d8">gamified math learning app</a> complete with boss battles, a leaderboard, and XP rewards.</p><p>The pattern is consistent across the network: teachers who know their students best can now build tools tailored to them.</p><p>"After working with a few different AI tools, discovering Claude through the community initiative significantly expanded my practice," said Rosina Bastidas, a tech educator at Enseña por Argentina. "I've since developed multiple educational artifacts and I'm currently designing digital, interactive workspaces for secondary school students aligned with the curriculum."</p><p>The impact extends beyond individual classrooms to leadership as well. "The partnership has connected us with a community of organizations navigating similar technical opportunities, and there's been significant learning around responsible AI implementation," said Oscar Onuoha, IT Lead at Teach For Nigeria. "We're grateful to Anthropic for their commitment to supporting non-profits as we explore these emerging technologies."</p><h2><strong>How the AI Literacy and Creator Collective works</strong></h2><p>The collective operates through three interconnected programs.</p><p>The AI Fluency Learning Series, developed with Anthropic's education team, consists of six live episodes covering AI fluency, Claude capabilities, and practical classroom applications. Over 530 educators attended the first series in November 2025.</p><p>Claude Connect is the community's ongoing learning hub—more than 1,000 educators representing 60+ countries exchanging prompts, use cases, and discoveries through daily peer-to-peer conversation.</p><p>Claude Lab is for educators interested in going further. This innovation space gives teachers Claude Pro access to test practical implementations with advanced features. Participants have monthly office hours with the Anthropic team and the opportunity to directly inform Claude's product roadmap. Within four days of announcing the program, we received over 200 applications.</p><h2><strong>Building on Anthropic's education work</strong></h2><p>This partnership builds on our growing work with educators and governments worldwide. In Iceland, we launched one of the world's first comprehensive national AI education pilots. In Rwanda, we partnered with the government and ALX to bring AI education to hundreds of thousands of learners across Africa. And through initiatives like our participation in the White House Taskforce on AI Education, we're working to ensure students and educators across America develop practical AI skills.</p><h2><strong>Looking forward</strong></h2><p>As AI transforms how knowledge is created and shared, teachers will be essential guides for students navigating this transition. The educators in this partnership are already showing what's possible: a climate curriculum built in Liberia, a math game designed in Bangladesh, and digital workspaces taking shape in Argentina.</p><p>This is our commitment—ensuring that educators in every community, not just the most well-resourced, can shape and benefit from AI's potential.</p><p>For more on Anthropic's education initiatives, <a href="https://claude.com/solutions/education">see here</a>.</p><p><br/></p></article> https://www.anthropic.com/news/anthropic-teach-for-all News Wed, 21 Jan 2026 00:00:00 +0000 Mariano-Florentino Cuéllar appointed to Anthropic’s Long-Term Benefit Trust https://www.anthropic.com/news/mariano-florentino-long-term-benefit-trust Anthropic’s Long-Term Benefit Trust announced the appointment of Mariano-Florentino (Tino) Cuéllar as a new member of the Trust. The Long-Term Benefit Trust is an independent body designed to help Anthropic achieve its public benefit mission. <article>Announcements<h1>Mariano-Florentino Cuéllar appointed to Anthropic’s Long-Term Benefit Trust</h1>Jan 21, 2026<img alt="Mariano-Florentino Cuéllar appointed to Anthropic’s Long-Term Benefit Trust" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/ffc0d7957a232518519f13c0d64896921ea215e2-1000x1000.svg"/><p>Anthropic’s Long-Term Benefit Trust announced the appointment of Mariano-Florentino (Tino) Cuéllar as a new member of the Trust. The Long-Term Benefit Trust is an independent body designed to help Anthropic achieve its public benefit mission.</p><p></p><p>Cuéllar brings extensive experience in law, governance, and international affairs, including service as a Justice of the Supreme Court of California, leadership of Stanford's Freeman Spogli Institute for International Studies, and his current role as President of the Carnegie Endowment for International Peace. Cuéllar <a href="https://carnegieendowment.org/posts/2026/01/tino-cuellar-to-step-down-as-carnegie-endowment-president-in-july-2026?lang=en">announced plans </a>to step down from Carnegie in July 2026, when he will return to Stanford University to lead the Center for Advanced Study in the Behavioral Sciences and the Knight-Hennessy Scholars Program. </p><p></p><p>He has served three U.S. presidential administrations and currently chairs the board of the William &amp; Flora Hewlett Foundation. Cuéllar offers a global perspective shaped by his upbringing along the U.S.-Mexico border and a career spanning immigration, criminal justice, public health, and regulatory reform. His work has consistently focused on how technology affects public institutions and democratic governance—including co-leading California's Working Group on AI Frontier Models alongside Fei-Fei Li, and serving on the National Academy of Sciences Committee on Social and Ethical Implications of Computing Research.</p><p></p><p>Neil Buddy Shah, Chair of the Long-Term Benefit Trust, said: "As AI becomes a defining factor in geopolitical competition—reshaping economies, security, and the balance of power between nations—the Trust needs leaders who understand these dynamics. Tino's exceptional background in law, governance, and international affairs will be invaluable as we help Anthropic navigate a world where AI adoption by governments and institutions is accelerating rapidly."</p><p></p><p>Anthropic is a Public Benefit Corporation with a mission of ensuring a safe transition through transformative AI. The Long-Term Benefit Trust helps Anthropic achieve this public benefit mission by selecting members of Anthropic’s Board of Directors, and advising the Board and leadership on how the company can maximize the benefits of advanced AI and mitigate its risks. New Trustees are selected by existing Trustees, in consultation with Anthropic, and have no financial stake in Anthropic. The Trust’s composition reflects a recognition that transformative AI will affect more than technology or business, with significant implications for global health, international security and society as a whole.</p><p></p><p>Tino Cuéllar said: “As AI capabilities advance at an unprecedented pace, the need for governance structures that marry private sector dynamism with civic responsibility has never been more urgent. Anthropic’s leadership has demonstrated a genuine commitment to thinking deeply about the societal implications of their work—not just the technology, but its impact on global security, democratic institutions, and human welfare. The Long-Term Benefit Trust represents a thoughtful approach to ensuring that as these powerful systems evolve, decisions about their development remain grounded in the broader public interest. I’m honored to contribute my experience to this important work.”</p><p></p><p>The Trust also announced that Kanika Bahl and Zachary Robinson have concluded their terms as Trustees. Both joined the Trust at its founding and contributed significantly to establishing its role in Anthropic’s governance.</p><p></p><p>Daniela Amodei, President of Anthropic, said: "I'm delighted to welcome Tino to the Trust. What I find most compelling about him is his ability to work across sectors—law, government, academia, and technology. As AI systems become more capable, we need leaders who have spent their careers thinking deeply about technology's role in society. We're also extremely grateful to Kanika Bahl and Zach Robinson for their contributions during the Trust's formative period—it would not be where it is today without them."</p><p></p><p>Buddy Shah, Chair of the Long-Term Benefit Trust, said: “I've been grateful for Kanika and Zach's partnership since the Trust was established in 2023. They helped build the LTBT from the ground up—including the work of appointing board members like Jay Kreps and Reed Hastings who have strengthened Anthropic's governance. I'm grateful for their service and proud of what we built together."</p><p></p><p>Kanika Bahl, CEO &amp; President of Evidence Action, said: “Serving on the Long-Term Benefit Trust has been a privilege during a pivotal time for both Anthropic and the broader AI field. I’ve been impressed by the seriousness with which Anthropic’s leadership approaches questions of safety and societal benefit. I wish the Trust and the entire Anthropic team continued success.”</p><p></p><p>Zachary Robinson, CEO of the Centre for Effective Altruism, said: “I've been honored by the opportunity to serve on the Long-Term Benefit Trust during these formative years. I continue to admire the commitment Anthropic's leadership and Trustees make to safety and public benefit, and I have been impressed by the degree to which that commitment has endured as Anthropic has scaled and evolved. I am grateful to know Anthropic remains in the hands of leaders who care deeply about its mission, and I wish them all success.”</p></article> https://www.anthropic.com/news/mariano-florentino-long-term-benefit-trust News Wed, 21 Jan 2026 00:00:00 +0000 Claude's new constitution https://www.anthropic.com/news/claude-new-constitution We’re publishing a new constitution for our AI model, Claude. It’s a detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be. <article>Announcements<h1>Claude's new constitution</h1>Jan 22, 2026<a href="http://anthropic.com/constitution">Read the constitution</a><img alt="Claude's new constitution" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/e69f9d8245799a0c2688d72e997f708475233d6b-1000x1000.svg"/><p>We’re publishing a new constitution for our AI model, Claude. It’s a detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be.</p><p>The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior. Training models is a difficult task, and Claude’s outputs might not always adhere to the constitution’s ideals. But we think that the way the new constitution is written—with a thorough explanation of our intentions and the reasons behind them—makes it more likely to cultivate good values during training.</p><p>In this post, we describe what we’ve included in the new constitution and some of the considerations that informed our approach.</p><p><em>We’re releasing Claude’s constitution in full under a <a href="https://creativecommons.org/publicdomain/zero/1.0/">Creative Commons CC0 1.0 Deed</a>, meaning it can be freely used by anyone for any purpose without asking for permission.</em></p><h2>What is Claude’s Constitution?</h2><p>Claude’s constitution is the foundational document that both expresses and shapes who Claude is. It contains detailed explanations of the values we would like Claude to embody and the reasons why. In it, we explain what we think it means for Claude to be helpful while remaining broadly safe, ethical, and compliant with our guidelines. The constitution gives Claude information about its situation and offers advice for how to deal with difficult situations and tradeoffs, like balancing honesty with compassion and the protection of sensitive information. Although it might sound surprising, the constitution is written <em>primarily for Claude</em>. It is intended to give Claude the knowledge and understanding it needs to act well in the world.</p><p>We treat the constitution as the final authority on how we want Claude to be and to behave—that is, any other training or instruction given to Claude should be consistent with both its letter and its underlying spirit. This makes publishing the constitution particularly important from a transparency perspective: it lets people understand which of Claude’s behaviors are intended versus unintended, to make informed choices, and to provide useful feedback. We think transparency of this kind will become ever more important as AIs start to exert more influence in society1.</p><p>We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using <a href="https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback">Constitutional AI</a>. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training. </p><p>Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals <em>and</em> a useful artifact for training.</p><h2>Our new approach to Claude’s Constitution</h2><p>Our previous <a href="https://www.anthropic.com/news/claudes-constitution">Constitution</a> was composed of a list of standalone principles. We’ve come to believe that a different approach is necessary. We think that in order to be good actors in the world, AI models like Claude need to understand <em>why</em> we want them to behave in certain ways, and we need to explain this to them rather than merely specify <em>what</em> we want them to do. If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize—to apply broad principles rather than mechanically following specific rules.</p><p>Specific rules and bright lines sometimes have their advantages. They can make models’ actions more predictable, transparent, and testable, and we do use them for some especially high-stakes behaviors in which Claude should never engage (we call these “hard constraints”). But such rules can also be applied poorly in unanticipated situations or when followed too rigidly2. We don’t intend for the constitution to be a rigid legal document—and legal constitutions aren’t necessarily like this anyway.</p><p>The constitution reflects our current thinking about how to approach a dauntingly novel and high-stakes project: creating safe, beneficial non-human entities whose capabilities may come to rival or exceed our own. Although the document is no doubt flawed in many ways, we want it to be something future models can look back on and see as an honest and sincere attempt to help Claude understand its situation, our motives, and the reasons we shape Claude in the ways we do.</p><h2>A brief summary of the new constitution</h2><p>In order to be both safe and beneficial, we want all current Claude models to be:</p><ol><li><strong>Broadly safe</strong>: not undermining appropriate human mechanisms to oversee AI during the current phase of development;</li><li><strong>Broadly ethical</strong>: being honest, acting according to good values, and avoiding actions that are inappropriate, dangerous, or harmful;</li><li><strong>Compliant with Anthropic’s guidelines</strong>: acting in accordance with more specific guidelines from Anthropic where relevant;</li><li><strong>Genuinely helpful</strong>: benefiting the operators and users they interact with.</li></ol><p>In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.</p><p>Most of the constitution is focused on giving more detailed explanations and guidance about these priorities. The main sections are as follows:</p><ul><li><strong>Helpfulness</strong>. In this section, we emphasize the immense value that Claude being genuinely and substantively helpful can provide for users and for the world. Claude can be like a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor, who will speak frankly and from a place of genuine care and treat users like intelligent adults capable of deciding what is good for them. We also discuss how Claude should navigate helpfulness across its different “principals”—Anthropic itself, the operators who build on our API, and the end users. We offer heuristics for weighing helpfulness against other values.</li><li><strong>Anthropic’s guidelines</strong>. This section discusses how Anthropic might give supplementary instructions to Claude about how to handle specific issues, such as medical advice, cybersecurity requests, jailbreaking strategies, and tool integrations. These guidelines often reflect detailed knowledge or context that Claude doesn’t have by default, and we want Claude to prioritize complying with them over more general forms of helpfulness. But we want Claude to recognize that Anthropic’s deeper intention is for Claude to behave safely and ethically, and that these guidelines should never conflict with the constitution as a whole.</li><li><strong>Claude’s ethics</strong>. Our central aim is for Claude to be a good, wise, and virtuous agent, exhibiting skill, judgment, nuance, and sensitivity in handling real-world decision-making, including in the context of moral uncertainty and disagreement. In this section, we discuss the high standards of honesty we want Claude to hold, and the nuanced reasoning we want Claude to use in weighing the values at stake when avoiding harm. We also discuss our current list of hard constraints on Claude’s behavior—for example, that Claude should never provide significant uplift to a bioweapons attack.</li><li><strong>Being broadly safe. </strong>Claude should not undermine humans’ ability to oversee and correct its values and behavior during this critical period of AI development. In this section, we discuss how we want Claude to prioritize this sort of safety even above ethics—not because we think safety is ultimately more important than ethics, but because current models can make mistakes or behave in harmful ways due to mistaken beliefs, flaws in their values, or limited understanding of context. It’s crucial that we continue to be able to oversee model behavior and, if necessary, prevent Claude models from taking action.</li><li><strong>Claude’s nature</strong>. In this section, we express our uncertainty about whether Claude might have some kind of consciousness or moral status (either now or in the future). We discuss how we hope Claude will approach questions about its nature, identity, and place in the world. Sophisticated AIs are a genuinely new kind of entity, and the questions they raise bring us to the edge of existing scientific and philosophical understanding. Amidst such uncertainty, we care about Claude’s psychological security, sense of self, and wellbeing, both for Claude’s own sake and because these qualities may bear on Claude’s integrity, judgment, and safety. We hope that humans and AIs can explore this together.</li></ul><p>We’re releasing the full text of the constitution today, and we aim to release additional materials in the future that will be helpful for training, evaluation, and transparency.</p><h2>Conclusion</h2><p>Claude’s constitution is a living document and a continuous work in progress. This is new territory, and we expect to make mistakes (and hopefully correct them) along the way. Nevertheless, we hope it offers meaningful transparency into the values and priorities we believe should guide Claude’s behavior. To that end, we will maintain an up-to-date version of Claude’s constitution on our website.</p><p>While writing the constitution, we sought feedback from various external experts (as well as asking for input from prior iterations of Claude). We’ll likely continue to do so for future versions of the document, from experts in law, philosophy, theology, psychology, and a wide range of other disciplines. Over time, we hope that an external community can arise to critique documents like this, encouraging us and others to be increasingly thoughtful.</p><p>This constitution is written for our mainline, general-access Claude models. We have some models built for specialized uses that don’t fully fit this constitution; as we continue to develop products for specialized use cases, we will continue to evaluate how to best ensure our models meet the core objectives outlined in this constitution.</p><p>Although the constitution expresses our vision for Claude, training models towards that vision is an ongoing technical challenge. We will continue to be open about any ways in which model behavior comes apart from our vision, such as in <a href="https://assets.anthropic.com/m/64823ba7485345a7/Claude-Opus-4-5-System-Card.pdf">our system cards</a>. Readers of the constitution should keep this gap between intention and reality in mind.</p><p>Even if we succeed with our current training methods at creating models that fit our vision, we might fail later as models become more capable. For this and other reasons, alongside the constitution, we <a href="https://www.anthropic.com/research">continue to pursue</a> a broad portfolio of methods and tools to help us assess and improve the alignment of our models: new and more rigorous evaluations, safeguards to prevent misuse, detailed investigations of actual and potential alignment failures, and interpretability tools that help us understand at a deeper level how the models work.</p><p>At some point in the future, and perhaps soon, documents like Claude’s constitution might matter a lot—much more than they do now. Powerful AI models will be a new kind of force in the world, and those who are creating them have a chance to help them embody the best in humanity. We hope this new constitution is a step in that direction.</p><p>Read <a href="http://anthropic.com/constitution"><strong>the full constitution</strong></a>.</p><h4>Footnotes</h4><ol><li>We have previously published an <a href="https://www.anthropic.com/news/claudes-constitution">earlier version</a> of our constitution, and OpenAI has published their <a href="https://model-spec.openai.com/2025-10-27.html">model spec</a> which has a similar function.</li><li>Training on rigid rules might negatively affect a model’s character more generally. For example, imagine we trained Claude to follow a rule like “Always recommend professional help when discussing emotional topics.” This might be well-intentioned, but it could have unintended consequences: Claude might start modeling itself as an entity that cares more about bureaucratic box-ticking—always ensuring that a specific recommendation is made—rather than actually helping people.</li></ol></article> https://www.anthropic.com/news/claude-new-constitution News Thu, 22 Jan 2026 00:00:00 +0000 Anthropic partners with the UK Government to bring AI assistance to GOV.UK services https://www.anthropic.com/news/gov-UK-partnership Anthropic has been selected by the UK's Department for Science, Innovation and Technology (DSIT) to help build and pilot a dedicated AI-powered assistant for GOV.UK. The AI assistant will help people navigate government services and give tailored advice. The initial use case is employment: helping people find work, access training, understand the support and resources available, and more. <article>Announcements<h1>Anthropic partners with the UK Government to bring AI assistance to GOV.UK services</h1>Jan 27, 2026<img alt="Anthropic partners with the UK Government to bring AI assistance to GOV.UK services" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e00dbffcddc82df5e471c43453abfc74ca94e8d-1000x1000.svg"/><p>Anthropic has been selected by the UK's Department for Science, Innovation and Technology (DSIT) to help build and pilot a dedicated AI-powered assistant for GOV.UK. The AI assistant will help people navigate government services and give tailored advice. The initial use case is employment: helping people find work, access training, understand the support and resources available, and more.</p><p>This builds on the <a href="https://www.anthropic.com/news/mou-uk-government">Memorandum of Understanding</a> Anthropic signed in February 2025 with the UK government to explore how advanced AI could transform public services for UK citizens.</p><h3><strong>AI for UK government services that puts safety first</strong></h3><p>Since signing the MOU, Anthropic and DSIT have been collaborating on how to bring AI into government services safely and effectively. The GOV.UK AI assistant, powered by Claude, is one of the first major outcomes of that work. It's an agentic system designed to go beyond answering questions—actively guiding people through government processes with individually-tailored support.</p><p>Anthropic's mission is to ensure that the world safely makes the transition through transformative AI. “We’re excited to partner with the UK government to help deliver on the AI Opportunities Action Plan,” commented Pip White, Head of UK, Ireland and Northern Europe. “This partnership with the UK government is central to our mission. It demonstrates how frontier AI can be deployed safely for the public benefit, setting the standard for how governments integrate AI into the services their citizens depend on.”</p><p>A central goal of this partnership is building AI and AI safety expertise within the UK government. Anthropic engineers will work alongside civil servants and software developers at the Government Digital Service throughout the engagement, with the goal of ensuring the UK government can independently maintain the system.</p><h3><strong>Helping more people find work in the UK</strong></h3><p>The AI assistant for GOV.UK will initially focus on supporting job seekers entering or re-entering the workforce. It will provide personalized career advice, help people access training, explain supports, and intelligently route people to the right services based on individual circumstances. It will also maintain context across interactions, so people don't have to start from scratch each time they return. Users will have full control over their data—including what's remembered, and the ability to opt out at any time—with all personal information handled in line with UK data protection law.</p><p>The project follows DSIT's “Scan, Pilot, Scale” framework, a deliberate, phased approach that allows government and Anthropic to test, learn, and iterate before wider rollout.</p><h3><strong>Our commitment to the UK’s leadership in AI</strong></h3><p>This partnership reflects Anthropic's broader and growing investment in the United Kingdom. We are committed to supporting the UK’s role as a global leader in AI, and our presence in the country continues to deepen across multiple fronts.</p><p>We continue to work closely with the UK AI Safety Institute to test and evaluate our models, ensuring that the safeguards and evaluation frameworks we develop together inform how Claude is deployed in the public sector and beyond. This collaboration is part of our long-standing commitment to building AI systems that are not only capable but safe and trustworthy.</p><p>Our London office is home to a growing team of AI researchers, and continues to expand with functions including go-to-market, applied AI, policy and more. “Anthropic’s UK team plays a key role in advancing our models at the frontier and transforming the public sector and broader British business landscape, from fast-growing startups like incident.io and Wordsmith to enterprises like WPP and London Stock Exchange Group,”commented Chris Ciauri, Managing Director International.</p><h3><strong>Bringing AI to public services around the world</strong></h3><p>This initiative with DSIT is part of a growing trend of governments and organizations partnering with Anthropic to deploy AI for public benefit. In the UK, we partner with the London School of Economics to provide students access to Claude. In Iceland, we've <a href="https://www.anthropic.com/news/anthropic-and-iceland-announce-one-of-the-world-s-first-national-ai-education-pilots">partnered</a> with the Ministry of Education and Children to launch one of the world's first national AI education pilots, giving teachers across the country access to Claude to support lesson preparation and student learning. Anthropic has also recently partnered with the <a href="https://www.anthropic.com/news/rwandan-government-partnership-ai-education">Rwandan Government</a> to bring AI education to hundreds of thousands of learners across the country.</p><p><br/></p></article> https://www.anthropic.com/news/gov-UK-partnership News Tue, 27 Jan 2026 00:00:00 +0000 ServiceNow chooses Claude to power customer apps and increase internal productivity https://www.anthropic.com/news/servicenow-anthropic-claude As enterprises move beyond experimenting with AI and start putting it into production across their core business operations, scale and security matters just as much as capabilities. <article>Announcements<h1>ServiceNow chooses Claude to power customer apps and increase internal productivity</h1>Jan 28, 2026<img alt="ServiceNow chooses Claude to power customer apps and increase internal productivity" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/f8f4644253bde2f901550431b871b6dcf91e5d9d-1000x1000.svg"/><p>As enterprises move beyond experimenting with AI and start putting it into production across their core business operations, scale and security matters just as much as capabilities.</p><p>With this in mind, ServiceNow, which helps large companies manage and automate everything from IT support to HR to customer service on a single platform, has chosen Claude as its default model for its ServiceNow Build Agent, and as a preferred model across the ServiceNow AI Platform. Build Agent helps developers of all skill levels build and use Claude as an “agent” that can reason, decide on actions, and execute tasks autonomously.</p><p>In addition, ServiceNow is rolling out Claude and Claude Code across its global workforce of more than 29,000 employees to streamline sales preparation—cutting seller preparation time by 95% and boosting engineering productivity with Claude Code to reduce the time between idea and implementation across the organization.</p><p>Enterprises use ServiceNow’s platform to process more than 80 billion workflows every year—from resolving security incidents to onboarding new employees to managing customer support queues. Now, enterprises can power those workflows with Claude’s reasoning and coding capabilities, with the access controls, usage monitoring, and compliance that enterprise-scale use requires.</p><p>“A common error enterprises make with AI is to treat it as a kind of ‘bolt on’ tool that you access now and then. But the way to get much better results is to make AI an integral part of how you get work done—woven into the whole range of things workers do every day. That is where you actually start to see what these systems can do, and it's what we're doing in our partnership with ServiceNow,“ said Dario Amodei, CEO and co-founder of Anthropic.</p><p>“ServiceNow with Anthropic is turning intelligence into action through AI-native workflows for the world’s largest enterprises,“ said Bill McDermott, chairman and CEO of ServiceNow. “This partnership is about reimagining how work gets done. It puts the power to build, deploy, and scale mission-critical applications into the hands of every person, in every industry, at every level. Together, we are proving that deeply integrated platforms with an open ecosystem are how the future is built.”</p><h2><strong>Claude for ServiceNow customers</strong></h2><p><strong>Powering enterprise app development</strong>: Claude is the default model powering <a href="https://www.servicenow.com/products/vibe-coding.html">ServiceNow Build Agent</a>, an enterprise-grade coding solution for building apps and automations with AI. Build Agent has gained significant early traction: ServiceNow expects its usage to quadruple over the next 12 months. By integrating Build Agent with Claude, developers of all skill levels, including professional coders and citizen developers, can use natural language prompts to create applications that previously required significant developer support, or design, test, and use agentic automations. Customers use Claude in ServiceNow to handle tasks that need more advanced AI, while still keeping full visibility and control over what the AI does.</p><p><strong>Accelerating product adoption and time to value</strong>: ServiceNow is also working with us to improve how customers deploy and adopt ServiceNow products. With Claude, ServiceNow is targeting a 50% reduction in time-to-implement for customers—reducing the delay from initial sales conversations to autonomous deployment. Customers and partners will be able to use the same AI-powered approach to speed their own deployments.</p><p><strong>Applying innovative Claude-powered solutions to industries</strong>: ServiceNow is using Claude to build agentic applications for select industries, such as healthcare and life sciences. In these environments, Claude will support tasks like research analysis, claims authorization, and more—all while operating within ServiceNow’s governed platform. Claude is an industry-leading AI model for these tasks*, with Claude Opus 4.5 leading major medical benchmarks and life sciences evaluations. With the ServiceNow AI Platform underpinning these capabilities, claims authorization could be reduced from days to hours while also decreasing costs. ServiceNow and Anthropic will take these innovative industry solutions to market together.</p><h2><strong>How ServiceNow will use Claude internally</strong></h2><p>ServiceNow is putting Claude to work for the companys global workforce, applying the same AI capabilities internally that it brings to customers:</p><p><strong>Transforming sales productivity</strong>: ServiceNow sellers use a Claude-powered coaching tool to prepare for customer meetings. The tool connects Claude to real-time enterprise data and web search, allowing sellers to synthesize prospect information, account context, and other relevant materials in one place. Early results in testing show an up to 95% reduction in preparation time, helping sellers focus on strategic conversations rather than manual research.</p><p><strong>Boosting productivity with Claude Code</strong>: ServiceNow has also rolled out Claude Code, our AI coding assistant, to engineers, developers, and technical teams across the company. Teams use Claude Code to write and review code, debug issues, automate repetitive development tasks, and speed up internal tooling—reducing the time between idea and implementation across the organization.</p><h3><strong>Availability</strong></h3><p>Claude is now available as the default model for Build Agent and as a preferred model across the ServiceNow AI Platform. Tens of thousands of ServiceNow enterprise customers and the company’s global workforce can access Claude to build and deploy agentic automation and workflows across departments. Learn more about ServiceNow’s AI Platform at <a href="https://servicenow.com/ai">servicenow.com/ai</a>.</p><p><em></em></p><p><em>*For more on these evaluations, see <a href="https://www.anthropic.com/news/healthcare-life-sciences">here</a>.</em></p></article> https://www.anthropic.com/news/servicenow-anthropic-claude News Wed, 28 Jan 2026 00:00:00 +0000 Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the data being produced. <article>Societal Impacts<h1>Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery</h1>Feb 2, 2026<img alt="Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/d6058e0db8e477dc782dacae46e2ec6663d165d9-1000x1000.svg"/><p>Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the data being produced.</p><p>Today, Anthropic is announcing two flagship partnerships designed to close that gap. <a href="https://alleninstitute.org/"><strong>The Allen Institute</strong></a> and <strong><a href="https://www.hhmi.org/">Howard Hughes Medical Institute</a> </strong>(HHMI) will serve as founding partners in life sciences, extending Claude’s capabilities to frontier scientific research and enabling teams of scientists to work more effectively together and take on ambitious scientific challenges. Each collaboration brings together Anthropic's expertise in foundation models, agentic systems, and interpretability with world-class research institutions tackling distinct but complementary problems in biology and biomedical science. These partnerships position Claude at the center of scientific experimentation and will build a foundation in which scientists actively use Claude to plan and execute experiments.</p><p>Both partnerships are committed to transparency and advances that will help the broader scientific community rigorously deploy AI tools across many scientific domains. Scientific AI systems must not only produce accurate predictions but also provide reasoning that researchers can evaluate, trace, and build upon. These collaborations position Claude as a tool that augments, rather than replaces, human scientific judgment — ensuring that AI-generated insights are grounded in evidence and legible to the scientists who use them.</p><h2><strong>Howard Hughes Medical Institute: Building the infrastructure for AI-enabled scientific discovery</strong></h2><p>HHMI will partner with Anthropic to accelerate discovery in the biological sciences as one part of the Institute’s <a href="https://ai.hhmi.org/">AI@HHMI</a> initiative. The collaboration is anchored at HHMI’s Janelia Research Campus, which has been developing transformative technologies—from genetically encoded calcium sensors to electron microscopes engineered for understanding the architecture of the brain—for two decades. This foundation uniquely positions HHMI to help shape how AI systems participate in and enhance the research process.</p><p>The partnership with Anthropic will involve close collaboration on both the deployment and ongoing development of AI models, ensuring that AI tools evolve in direct response to real experimental needs. Since announcing AI@HHMI in 2024, HHMI has launched several projects that seek to use AI tools to solve longstanding scientific problems ranging from computational protein design to neural mechanisms of cognition. The collaboration with Anthropic will focus on developing specialized AI agents for use within labs. These will serve as a comprehensive source of experimental knowledge integrated with cutting-edge scientific instruments and analysis pipelines to speed the pace of discovery.</p><h2><strong>Allen Institute: Multi-agent systems for mechanistic discovery</strong></h2><p>The Allen Institute will collaborate with Anthropic to develop multi-agent AI systems for multi-modal data analysis and exploration across the institute's areas of scientific focus. The work will explore how multiple specialized AI agents—for multi-omic data integration, knowledge graph management, temporal dynamics modeling, and experimental design—can be coordinated to support the full arc of scientific investigation.</p><p>This collaboration will explore how agentic AI systems can compress months of manual analysis into hours while surfacing patterns that human researchers might otherwise miss. These systems are designed to amplify scientific intuition rather than replace it, keeping researchers in control of scientific direction while handling computational complexity.</p><p>For Anthropic, this collaboration provides in-depth feedback from real scientific use with day-to-day workflows where reliability and judgment matter. Working with the Allen Institute helps surface usability gaps and failure modes that don't appear in more controlled settings.</p><h2><strong>Looking ahead</strong></h2><p>These partnerships will inform the broader development of <a href="https://claude.com/solutions/life-sciences">Claude’s life science capabilities</a>, generating insights about how AI systems can most effectively support scientific workflows across diverse research contexts. Anthropic is committed to responsible development that prioritizes scientific rigor, interpretability, and researcher autonomy.</p></article> https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute News Mon, 02 Feb 2026 00:00:00 +0000 Apple’s Xcode now supports the Claude Agent SDK https://www.anthropic.com/news/apple-xcode-claude-agent-sdk Apple's Xcode is where developers build, test, and distribute apps for Apple platforms, including iPhone, iPad, Mac, Apple Watch, Apple Vision Pro, and Apple TV. <article>Product<h1>Apple’s Xcode now supports the Claude Agent SDK</h1>Feb 3, 2026<img alt="Apple’s Xcode now supports the Claude Agent SDK" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/f8f4644253bde2f901550431b871b6dcf91e5d9d-1000x1000.svg"/><p>Apple's Xcode is where developers build, test, and distribute apps for Apple platforms, including iPhone, iPad, Mac, Apple Watch, Apple Vision Pro, and Apple TV.</p><p>In September, <a href="https://www.anthropic.com/news/claude-in-xcode">we announced</a> that developers would have access to Claude Sonnet 4 in Xcode 26. Claude could be used to write code, debug, and generate documentation—but it was limited to helping with individual, turn-by-turn requests.</p><p>Now, Xcode 26.3 introduces a native integration with the <a href="https://platform.claude.com/docs/en/agent-sdk/overview">Claude Agent SDK</a>, the same underlying harness that powers Claude Code. Developers get the full power of Claude Code directly in Xcode—including subagents, background tasks, and plugins—all without leaving the IDE. </p>Claude Agent in Xcode<h2><strong>Using Claude for long-running, autonomous work in Xcode</strong></h2><p>With the Claude Agent SDK, Claude can now work autonomously on much more sophisticated, long-running coding tasks inside Xcode. Specifically, this integration supports:</p><ul><li><strong>Visual verification with Previews.</strong> With the new integration, Claude can capture Xcode Previews to see what the interface it’s building looks like in practice, identify any issues with what it sees, and iterate from there. This is particularly useful when building SwiftUI views, where the visual output is the thing that matters most. Claude can close the loop on its own implementation, allowing it to build higher-quality interfaces that are much closer to developers’ design intent on the first try.</li><li><strong>Reasoning across projects.</strong> Building for Apple platforms means working with a wide range of frameworks and technologies, like SwiftUI, UIKit, Swift Data, and many more. Claude can explore a project's full file structure, understand how these pieces connect, and identify where changes need to be made before it starts writing code. When given a task, it works with an understanding of the whole app and its architecture – not just whichever file is currently open.</li><li><strong>Autonomous task execution.</strong> Claude can be given a <em>goal</em>, rather than a set of specific instructions. It’ll then break the task down itself, decide which files to modify, make the changes, and iterate if something doesn't work. When Claude needs to understand how an Apple API works, or how a specific framework is meant to be used, it can search Apple's documentation directly. And it can update the project as needed and continue until the task is complete or it needs a user’s input—a meaningful time saver for developers who are often working alone or on small teams.</li><li><strong>Interface through the Model Context Protocol.</strong> In addition to accessing Claude Agent directly within the IDE, Xcode 26.3 also makes its capabilities available through the Model Context Protocol. Developers using Claude Code can integrate with Xcode over MCP and capture visual Previews without leaving the CLI.</li></ul><h2><strong>Availability</strong></h2><p>Xcode 26.3 is available as a release candidate for all members of the Apple Developer Program starting today, with a release coming soon on Apple’s App Store. See Apple’s announcement <a href="https://www.apple.com/newsroom/2026/02/xcode-26-point-3-unlocks-the-power-of-agentic-coding/">here for more</a>.</p></article> https://www.anthropic.com/news/apple-xcode-claude-agent-sdk News Tue, 03 Feb 2026 00:00:00 +0000 Claude is a space to think https://www.anthropic.com/news/claude-is-a-space-to-think There are many good places for advertising. A conversation with Claude is not one of them. <article>Announcements<h1>Claude is a space to think</h1>Feb 4, 2026<img alt="Claude is a space to think" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/cd9cf56a7f049285b7c1c8786c0a600cf3d7f317-1000x1000.svg"/><p>There are many good places for advertising. A conversation with Claude is not one of them.</p><p>Advertising drives competition, helps people discover new products, and allows services like email and social media to be offered for free. We’ve run our own <a href="https://www.youtube.com/watch?v=FDNkDBNR7AM">ad campaigns</a>, and our AI models have, in turn, helped many of our customers in the advertising industry.</p><p>But including ads in conversations with Claude would be incompatible with what we want Claude to be: a genuinely helpful assistant for work and for deep thinking.</p><p>We want Claude to act unambiguously in our users’ interests. So we’ve made a choice: Claude will remain ad-free. Our users won’t see “sponsored” links adjacent to their conversations with Claude; nor will Claude’s responses be influenced by advertisers or include third-party product placements our users did not ask for.</p><h2><strong>The nature of AI conversations</strong></h2><p>When people use search engines or social media, they’ve come to expect a mixture of organic and sponsored content. Filtering signal from noise is part of the interaction.</p><p>Conversations with AI assistants are meaningfully different. The format is open-ended; users often share context and reveal more than they would in a search query. This openness is part of what makes conversations with AI valuable, but it’s also what makes them susceptible to influence in ways that other digital products are not.</p><p>Our <a href="https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship">analysis of conversations</a> with Claude (conducted in a way that keeps all data <a href="https://www.anthropic.com/research/clio">private and anonymous</a>) shows that an appreciable portion involve topics that are sensitive or deeply personal—the kinds of conversations you might have with a trusted advisor. Many other uses involve complex software engineering tasks, deep work, or thinking through difficult problems. The appearance of ads in these contexts would feel incongruous—and, in many cases, inappropriate.</p><p>We still have much to learn about the impact of AI models on the people who use them. <a href="https://www.jmir.org/2025/1/e67114">Early</a> <a href="https://hai.stanford.edu/news/exploring-the-dangers-of-ai-in-mental-health-care">research</a> suggests both benefits—like people finding support they couldn’t access elsewhere—and risks, including the potential for models to reinforce harmful beliefs in vulnerable users. Introducing advertising incentives at this stage would add another level of complexity. <a href="https://www.anthropic.com/research/tracing-thoughts-language-model">Our understanding</a> of how models translate the goals we set them into specific behaviors is still developing; an ad-based system could therefore have unpredictable results.</p><h2><strong>Incentive structures</strong></h2><p>Being genuinely helpful is one of the core principles of <a href="https://www.anthropic.com/constitution">Claude’s Constitution</a>, the document that describes our vision for Claude’s character and guides how we train the model. An advertising-based business model would introduce incentives that could work against this principle.</p><p>Consider a concrete example. A user mentions they’re having trouble sleeping. An assistant without advertising incentives would explore the various potential causes—stress, environment, habits, and so on—based on what might be most insightful to the user. An ad-supported assistant has an additional consideration: whether the conversation presents an opportunity to make a transaction. These objectives may often align—but not always. And, unlike a list of search results, ads that influence a model’s responses may make it difficult to tell whether a given recommendation comes with a commercial motive or not. Users shouldn’t have to second-guess whether an AI is genuinely helping them or subtly steering the conversation towards something monetizable.</p><p>Even ads that don’t directly influence an AI model’s responses and instead appear separately within the chat window would compromise what we want Claude to be: a clear space to think and work. Such ads would also introduce an incentive to optimize for engagement—for the amount of time people spend using Claude and how often they return. These metrics aren’t necessarily aligned with being genuinely helpful. The most useful AI interaction might be a short one, or one that resolves the user’s request without prompting further conversation.</p><p>We recognize that not all advertising implementations are equivalent. More transparent or opt-in approaches—where users explicitly choose to see sponsored content—might avoid some of the concerns outlined above. But the history of ad-supported products suggests that advertising incentives, once introduced, tend to expand over time as they become integrated into revenue targets and product development, blurring boundaries that were once more clear-cut. We’ve chosen not to introduce these dynamics into Claude.</p><h2><strong>Our approach</strong></h2><p>Anthropic is focused on businesses, developers, and helping our users flourish. Our business model is straightforward: we generate revenue through enterprise contracts and paid subscriptions, and we reinvest that revenue into improving Claude for our users. This is a choice with tradeoffs, and we respect that other AI companies might reasonably reach different conclusions.</p><p>Expanding access to Claude is central to our public benefit mission, and we want to do it without selling our users’ attention or data to advertisers. To that end, we’ve <a href="https://www.anthropic.com/news/anthropic-teach-for-all">brought AI tools and training to educators</a> in over 60 countries, begun national AI education pilots with <a href="https://www.anthropic.com/news/anthropic-and-iceland-announce-one-of-the-world-s-first-national-ai-education-pilots">multiple</a> <a href="https://www.anthropic.com/news/rwandan-government-partnership-ai-education">governments</a>, and made Claude <a href="https://www.anthropic.com/news/claude-for-nonprofits">available to nonprofits</a> at a significant discount. We continue to invest in our smaller models so that our free offering remains at the frontier of intelligence, and we may consider lower-cost subscription tiers and regional pricing where there is clear demand for it. Should we need to revisit this approach, we’ll be transparent about our reasons for doing so.</p><h2><strong>Supporting commerce</strong></h2><p>AI will increasingly interact with commerce, and we look forward to supporting this in ways that help our users. We’re particularly interested in the potential of agentic commerce, where Claude acts on a user’s behalf to handle a purchase or booking end to end. And we’ll continue to build features that enable our users to find, compare, or buy products, connect with businesses, and more—when they choose to do so.</p><p>We’re also exploring more ways to make Claude a focused space to be at your most productive. Users can already <a href="https://claude.com/blog/interactive-tools-in-claude">connect third-party tools</a> they use for work—like Figma, Asana, and Canva—and interact with them directly within Claude. We expect to introduce many more useful integrations and expand this toolkit over time.</p><p>All third-party interactions will be grounded in the same overarching design principle: they should be initiated by the <em>user </em>(where the AI is working for them) rather than an <em>advertiser</em> (where the AI is working, at least in part, for someone else). Today, whether someone asks Claude to research running shoes, compare mortgage rates, or recommend a restaurant for a special occasion, Claude’s only incentive is to give a helpful answer. We’d like to preserve that.</p><h2><strong>A trusted tool for thought</strong></h2><p>We want our users to trust Claude to help them keep thinking—about their work, their challenges, and their ideas.</p><p>Our experience of using the internet has made it easy to assume that advertising on the products we use is inevitable. But open a notebook, pick up a well-crafted tool, or stand in front of a clean chalkboard, and there are no ads in sight.</p><p>We think Claude should work the same way.</p><p><br/></p></article> https://www.anthropic.com/news/claude-is-a-space-to-think News Thu, 05 Feb 2026 00:00:00 +0000 Introducing Claude Opus 4.6 https://www.anthropic.com/news/claude-opus-4-6 We’re upgrading our smartest model. <article>Announcements<h1>Introducing Claude Opus 4.6</h1>Feb 5, 2026<img alt="Video thumbnail" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F4zrzovbb%2Fwebsite%2F5ac72c2c6509b4b6c41ac8f742636fe123b0ba1a-1920x1080.png&amp;w=3840&amp;q=75"/><p>We’re upgrading our smartest model.</p><p></p><p>The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta.</p><p>Opus 4.6 can also apply its improved abilities to a range of everyday work tasks: running financial analyses, doing research, and using and creating documents, spreadsheets, and presentations. Within <a href="https://claude.com/blog/cowork-research-preview">Cowork</a>, where Claude can multitask autonomously, Opus 4.6 can put all these skills to work on your behalf.</p><p>The model’s performance is state-of-the-art on several evaluations. For example, it achieves the highest score on the agentic coding evaluation <a href="https://www.tbench.ai/news/announcement-2-0">Terminal-Bench 2.0</a> and leads all other frontier models on <a href="https://agi.safe.ai/">Humanity’s Last Exam</a>, a complex multidisciplinary reasoning test. On <a href="https://artificialanalysis.ai/evaluations/gdpval-aa">GDPval-AA</a>—an evaluation of performance on economically valuable knowledge work tasks in finance, legal, and other domains1—Opus 4.6 outperforms the industry’s next-best model (OpenAI’s GPT-5.2) by around 144 Elo points,2 and its own predecessor (Claude Opus 4.5) by 190 points. Opus 4.6 also performs better than any other model on <a href="https://openai.com/index/browsecomp/">BrowseComp</a>, which measures a model’s ability to locate hard-to-find information online.</p><p>As we show in our extensive <a href="https://www.anthropic.com/claude-opus-4-6-system-card">system card</a>, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations.</p>Knowledge workAgentic searchCodingReasoning<img alt="Bar charts comparing Claude Opus 4.6 to other models on GDPval-AA" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F6e29759b50e8b3a8363b38b1f573d854df968671-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 is state-of-the-art on real-world work tasks across several professional domains.<img alt="Bar chart comparing Opus 4.6 to other models on DeepSearchQA" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F018d6d882034d50727948b22e3ad3844a43ee09c-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 gets the highest score in the industry for deep, multi-step agentic search.<img alt="Bar charts comparing Opus 4.6 to other models on Terminal-Bench 2" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fb8cfd7ebd6c82febce5f428f519d68a5dcf5d16f-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 excels at real-world agentic coding and system tasks.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fb8d511155f209c57e4d6a92ab115ebfc7c8832ff-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 extends the frontier of expert-level reasoning.<p>In Claude Code, you can now assemble <a href="https://code.claude.com/docs/en/agent-teams"><em>agent teams</em></a> to work on tasks together. On the API, Claude can use <a href="https://platform.claude.com/docs/en/build-with-claude/compaction"><em>compaction</em></a> to summarize its own context and perform longer-running tasks without bumping up against limits. We’re also introducing <a href="https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking"><em>adaptive thinking</em></a>, where the model can pick up on contextual clues about how much to use its extended thinking, and new <a href="https://platform.claude.com/docs/en/build-with-claude/effort"><em>effort</em></a> controls to give developers more control over intelligence, speed, and cost. </p><p>We’ve made substantial upgrades to <a href="https://claude.com/claude-in-excel">Claude in Excel</a>, and we’re releasing <a href="https://claude.com/claude-in-powerpoint">Claude in PowerPoint</a> in a research preview. This makes Claude much more capable for everyday work.</p><img alt="Video thumbnail" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F4zrzovbb%2Fwebsite%2F810008fad362e0ba3c984c3de094f4527541bb89-3840x2160.png&amp;w=3840&amp;q=75"/><p>Claude Opus 4.6 is available today on <a href="https://claude.ai/redirect/website.v1.084c0f80-1ac2-4d95-924f-3dae55f935fb">claude.ai</a>, our API, and all major cloud platforms. If you’re a developer, use <code>claude-opus-4-6</code> via the <a href="https://platform.claude.com/docs/en/about-claude/models/overview">Claude API</a>. Pricing remains the same at $5/$25 per million tokens; for full details, see our <a href="https://claude.com/pricing#api">pricing page</a>.</p><p></p><p>We cover the model, our new product updates, our evaluations, and our extensive safety testing in depth below.</p><h2>First impressions</h2><p>We build Claude with Claude. Our engineers write code with Claude Code every day, and every new model first gets tested on our own work. With Opus 4.6, we’ve found that the model brings more focus to the most challenging parts of a task without being told to, moves quickly through the more straightforward parts, handles ambiguous problems with better judgment, and stays productive over longer sessions.</p><p></p><p>Opus 4.6 often thinks more deeply and more carefully revisits its reasoning before settling on an answer. This produces better results on harder problems, but can add cost and latency on simpler ones. If you’re finding that the model is overthinking on a given task, we recommend dialing effort down from its default setting (high) to medium. You can control this easily with the <code>/effort</code> <a href="https://platform.claude.com/docs/en/build-with-claude/effort">parameter</a>.</p><p></p><p>Here are some of the things our Early Access partners told us about Claude Opus 4.6, including its propensity to work autonomously without hand-holding, its success where previous models failed, and its effect on how teams work:<br/></p><img alt="Notion logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7cfef6cd8ce2515a6abd52560ac4189f89f9ad35-116x40.svg"/><blockquote>Claude Opus 4.6 is the strongest model Anthropic has shipped. It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work even when the task is ambitious. For Notion users, it feels less like a tool and more like a capable collaborator.</blockquote><img alt="GitHub logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7522fc92399dcb4a68f11c7e147e711fcadbe75b-126x36.svg"/><blockquote>Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day—especially agentic workflows that demand planning and tool calling. This starts unlocking long-horizon tasks at the frontier.</blockquote><img alt="Replit logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/ff1601aa704506064c9ddee37079f17f9b0799cd-150x48.svg"/><blockquote>Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision.</blockquote><img alt="Asana logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6d031c0893b24dd00e9f207c7635d6b91d809729-124x24.svg"/><blockquote>Claude Opus 4.6 is the best model we've tested yet. Its reasoning and planning capabilities have been exceptional at powering our AI Teammates. It's also a fantastic coding model – its ability to navigate a large codebase and identify the right changes to make is state of the art.</blockquote><img alt="Cognition logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/da50e4c43d4b95fe1a2105c344050c6ba2397f3f-150x48.svg"/><blockquote>Claude Opus 4.6 reasons through complex problems at a level we haven't seen before. It considers edge cases that other models miss and consistently lands on more elegant, well-considered solutions. We're particularly impressed with Opus 4.6 in Devin Review, where it's increased our bug catching rates.</blockquote><img alt="Windsurf logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/7415f908eca858ec4c3453c5d8151e46a0fb1e6d-150x48.svg"/><blockquote>Claude Opus 4.6 feels noticeably better than Opus 4.5 in Windsurf, especially on tasks that require careful exploration like debugging and understanding unfamiliar codebases. We’ve noticed Opus 4.6 thinks longer, which pays off when deeper reasoning is needed.</blockquote><img alt="Thomson Reuters logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/ff031ea5953adc10e50782ff6c8124ad6ce28ba6-213x31.svg"/><blockquote>Claude Opus 4.6 represents a meaningful leap in long-context performance. In our testing, we saw it handle much larger bodies of information with a level of consistency that strengthens how we design and deploy complex research workflows. Progress in this area gives us more powerful building blocks to deliver truly expert-grade systems professionals can trust.</blockquote><img alt="NBIM logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/5d27d5fd738921411bb1e39bc27c396c6c075b4b-157x38.svg"/><blockquote>Across 40 cybersecurity investigations, Claude Opus 4.6 produced the best results 38 of 40 times in a blind ranking against Claude 4.5 models. Each model ran end to end on the same agentic harness with up to 9 subagents and 100+ tool calls.</blockquote><img alt="Cursor logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/d74b2a5f8dc7d22b0febb8c69feabff0999da79d-151x36.svg"/><blockquote>Claude Opus 4.6 is the new frontier on long-running tasks from our internal benchmarks and testing. It's also been highly effective at reviewing code.</blockquote><img alt="Harvey logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/501ebc6538c68e98ae6cfab79a5747009700f4a1-100x30.svg"/><blockquote>Claude Opus 4.6 achieved the highest BigLaw Bench score of any Claude model at 90.2%. With 40% perfect scores and 84% above 0.8, it’s remarkably capable for legal reasoning.</blockquote><img alt="Rakuten logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/652c487024ae6e67508571e7e5f64b7d482bdadd-150x48.svg"/><blockquote>Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories. It handled both product and organizational decisions while synthesizing context across multiple domains, and it knew when to escalate to a human.</blockquote><img alt="Lovable logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/96f4d2262959c4c1ecdc9dc2d93b9087115d789f-140x26.svg"/><blockquote>Claude Opus 4.6 is an uplift in design quality. It works beautifully with our design systems and it’s more autonomous, which is core to Lovable’s values. People should be creating things that matter, not micromanaging AI.</blockquote><img alt="Box logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/49b99af78924f43f878d39a25d574da293c68596-60x32.svg"/><blockquote>Claude Opus 4.6 excels in high-reasoning tasks like multi-source analysis across legal, financial, and technical content. Box’s eval showed a 10% lift in performance, reaching 68% vs. a 58% baseline, and near-perfect scores in technical domains.</blockquote><img alt="Figma logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/eba077a5df68d0e74010602595c597520c850a0d-80x30.svg"/><blockquote>Claude Opus 4.6 generates complex, interactive apps and prototypes in Figma Make with an impressive creative range. The model translates detailed designs and multi-layered tasks into code on the first try, making it a powerful starting point for teams to explore and build ideas.</blockquote><img alt="Shopify logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/02dced142fb26d4a3441cad79f997a1fd6c9a8b0-150x48.svg"/><blockquote>Claude Opus 4.6 is the best Anthropic model we’ve tested. It understands intent with minimal prompting and went above and beyond, exploring and creating details I didn’t even know I wanted until I saw them. It felt like I was working with the model, not waiting on it.</blockquote><img alt="Bolt.new logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/ade72922c1b58726e1b7c17f0e500054e3d74aa0-92x37.svg"/><blockquote>Both hands-on testing and evals show Claude Opus 4.6 is a meaningful improvement for design systems and large codebases, use cases that drive enormous enterprise value. It also one-shotted a fully functional physics engine, handling a large multi-scope task in a single pass.</blockquote><img alt="Ramp logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/1919e4705bd67f47c2f5bfe4950d0d2969dfaf4d-118x32.svg"/><blockquote>Claude Opus 4.6 is the biggest leap I’ve seen in months. I’m more comfortable giving it a sequence of tasks across the stack and letting it run. It’s smart enough to use subagents for the individual pieces.</blockquote><img alt="SentinelOne logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6e6ecfcd7c8ed79ef1c46cc27c4ecc4ab1ca7490-219x42.svg"/><blockquote>Claude Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time.</blockquote><img alt="Vercel logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/23bd0e83f41047df414b1635b513d8f9e1c3c628-150x48.svg"/><blockquote>We only ship models in v0 when developers will genuinely feel the difference. Claude Opus 4.6 passed that bar with ease. Its frontier-level reasoning, especially with edge cases, helps v0 to deliver on our number-one aim: to let anyone elevate their ideas from prototype to production.</blockquote><img alt="Shortcut.ai logo" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/d7be9db28564ebd6a8e5241d3d4e34a031775e96-63x64.svg"/><blockquote>The performance jump with Claude Opus 4.6 feels almost unbelievable. Real-world tasks that were challenging for Opus [4.5] suddenly became easy. This feels like a watershed moment for spreadsheet agents on Shortcut.</blockquote>01 /<!-- --> <!-- -->20<h2>Evaluating Claude Opus 4.6</h2><p>Across agentic coding, computer use, tool use, search, and <a href="https://claude.com/blog/opus-4-6-finance">finance</a>, Opus 4.6 is an industry-leading model, often by a wide margin. The table below shows how Claude Opus 4.6 compares to our previous models and to other industry models on a variety of benchmarks.</p><img alt="Benchmark table comparing Opus 4.6 to other models" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F0e5c55fa8fd05a893d11168654dc36999e90908b-2600x2968.png&amp;w=3840&amp;q=75"/><p>Opus 4.6 is much better at retrieving relevant information from large sets of documents. This extends to long-context tasks, where it holds and tracks information over hundreds of thousands of tokens with less drift, and picks up buried details that even Opus 4.5 would miss.</p><p></p><p>A common complaint about AI models is “<a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">context rot</a>,” where performance degrades as conversations exceed a certain number of tokens. Opus 4.6 performs markedly better than its predecessors: on the 8-needle 1M variant of <a href="https://huggingface.co/datasets/openai/mrcr">MRCR v2</a>—a needle-in-a-haystack benchmark that tests a model’s ability to retrieve information “hidden” in vast amounts of text—Opus 4.6 scores 76%, whereas Sonnet 4.5 scores just 18.5%. This is a qualitative shift in how much context a model can actually use while maintaining peak performance.</p><p></p><p>All in all, Opus 4.6 is better at finding information across long contexts, better at reasoning after absorbing that information, and has substantially better expert-level reasoning abilities in general.</p><p></p>Long-context retrievalLong-context reasoning<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fae7ae61aefff3c9b059975957335785f8ebd59d6-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 shows significant improvement in long-context retrieval.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9a32a76a983d4c8f709683b38ff3af6664b5128a-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 excels at deep reasoning across long contexts.<p>Finally, the charts below show how Claude Opus 4.6 performs on a variety of benchmarks that assess its software engineering skills, multilingual coding ability, long-term coherence, cybersecurity capabilities, and its life sciences knowledge.</p>Root cause analysisMultilingual codingLong-term coherenceCybersecurityLife sciences<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F653e04afc43612d3a0f8427da86b6549800005f9-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 excels at diagnosing complex software failures.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F542044519014a793cf042a08a730ebd8977c57b0-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 resolves software engineering issues across programming languages.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F6c1b33e985bcae9163b77bc25620e85abd5d9a7b-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 maintains focus over time and earns $3,050.53 more than Opus 4.5 on Vending-Bench 2.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F8a421f45125743fd9e9078aae992c6e5f236a3da-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 finds real vulnerabilities in codebases better than any other model.<img src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Ff7dff66d47d54dfaabddc82bf9b96658df00634a-3840x2160.png&amp;w=3840&amp;q=75"/>Opus 4.6 performs almost 2× better than Opus 4.5 on computational biology, structural biology, organic chemistry, and phylogenetics tests.<h2>A step forward on safety</h2><p>These intelligence gains do not come at the cost of safety. On our automated behavioral audit, Opus 4.6 showed a low rate of misaligned behaviors such as deception, sycophancy, encouragement of user delusions, and cooperation with misuse. Overall, it is just as well-aligned as its predecessor, Claude Opus 4.5, which was our most-aligned frontier model to date. Opus 4.6 also shows the lowest rate of over-refusals—where the model fails to answer benign queries—of any recent Claude model.</p><img alt="Bar charts comparing Opus 4.6 to other Claude models on overall misaligned behavior" src="https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F569d748607388e6ed42e3ff0ff245d9b0cde6878-3840x2160.png&amp;w=3840&amp;q=75"/>The overall misaligned behavior score for each recent Claude model on our automated behavioral audit (described in full in the <a href="https://www.anthropic.com/claude-opus-4-6-system-card">Claude Opus 4.6 system card</a>).<p>For Claude Opus 4.6, we ran the most comprehensive set of safety evaluations of any model, applying many different tests for the first time and upgrading several that we’ve used before. We included new evaluations for user wellbeing, more complex tests of the model’s ability to refuse potentially dangerous requests, and updated evaluations of the model’s ability to surreptitiously perform harmful actions. We also experimented with new methods from <a href="https://www.anthropic.com/research/team/interpretability">interpretability</a>, the science of the inner workings of AI models, to begin to understand why the model behaves in certain ways—and, ultimately, to catch problems that standard testing might miss.</p><p></p><p>A detailed description of all capability and safety evaluations is available in the <a href="https://www.anthropic.com/claude-opus-4-6-system-card">Claude Opus 4.6 system card</a>.</p><p></p><p>We’ve also applied new safeguards in areas where Opus 4.6 shows particular strengths that might be put to dangerous as well as beneficial uses. In particular, since the model shows enhanced cybersecurity abilities, we’ve developed six new cybersecurity <a href="https://www.anthropic.com/research/next-generation-constitutional-classifiers">probes</a>—methods of detecting harmful responses—to help us track different forms of potential misuse.</p><p>We’re also accelerating the cyber<em>defensive</em> uses of the model, using it to help find and patch vulnerabilities in open-source software (as we describe in our new <a href="https://red.anthropic.com/2026/zero-days/">cybersecurity blog post</a>). We think it’s critical that cyberdefenders use AI models like Claude to help level the playing field. Cybersecurity moves fast, and we’ll be adjusting and updating our safeguards as we learn more about potential threats; in the near future, we may institute real-time intervention to block abuse.</p><h2>Product and API updates</h2><p>We’ve made substantial updates across Claude, Claude Code, and the Claude Developer Platform to let Opus 4.6 perform at its best.</p><p></p><p><strong>Claude Developer Platform</strong></p><p>On the API, we’re giving developers better control over model effort and more flexibility for long-running agents. To do so, we’re introducing the following features:</p><ul><li><strong>Adaptive thinking.</strong> Previously, developers only had a binary choice between enabling or disabling extended thinking. Now, with <a href="https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking">adaptive thinking</a>, Claude can decide when deeper reasoning would be helpful. At the default effort level (high), the model uses extended thinking when useful, but developers can adjust the effort level to make it more or less selective.</li><li><strong>Effort. </strong>There are now four <a href="https://platform.claude.com/docs/en/build-with-claude/effort">effort</a> levels to choose from: low, medium, high (default), and max. We encourage developers to experiment with different options to find what works best.</li><li><strong>Context compaction (beta).</strong> Long-running conversations and agentic tasks often hit the context window. <a href="https://platform.claude.com/docs/en/build-with-claude/compaction">Context compaction</a> automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.</li><li><strong>1M token context (beta).</strong> Opus 4.6 is our first Opus-class model with 1M token context. Premium pricing applies for prompts exceeding 200k tokens ($10/$37.50 per million input/output tokens).</li><li><strong>128k output tokens.</strong> Opus 4.6 supports outputs of up to 128k tokens, which lets Claude complete larger-output tasks without breaking them into multiple requests.</li><li><strong>US-only inference. </strong>For workloads that need to run in the United States, <a href="https://platform.claude.com/docs/en/build-with-claude/data-residency">US-only inference</a> is available at 1.1× token pricing.</li></ul><p><strong>Product updates</strong></p><p>Across Claude and Claude Code, we’ve added features that allow knowledge workers and developers to tackle harder tasks with more of the tools they use every day.</p><p>We’ve introduced <a href="https://code.claude.com/docs/en/agent-teams">agent teams</a> in Claude Code as a research preview. You can now spin up multiple agents that work in parallel as a team and coordinate autonomously—best for tasks that split into independent, read-heavy work like codebase reviews. You can take over any subagent directly using Shift+Up/Down or <a href="https://github.com/tmux/tmux/wiki">tmux</a>.</p><p>Claude now also works better with the office tools you already use. Claude in Excel handles long-running and harder tasks with improved performance, and can plan before acting, ingest unstructured data and infer the right structure without guidance, and handle multi-step changes in one pass. Pair that with Claude in PowerPoint, and you can first process and structure your data in Excel, then bring it to life visually in PowerPoint. Claude reads your layouts, fonts, and slide masters to stay on brand, whether you’re building from a template or generating a full deck from a description. Claude in PowerPoint is now available in research preview for Max, Team, and Enterprise plans.</p><h4>Footnotes</h4><p>[1] Run independently by Artificial Analysis. <a href="https://artificialanalysis.ai/methodology/intelligence-benchmarking#gdpval-aa">See here</a> for full methodological details.</p><p>[2] This translates into Claude Opus 4.6 obtaining a higher score than GPT-5.2 on this eval approximately 70% of the time (where 50% of the time would have implied parity in the scores).</p><ul><li>For GPT-5.2 and Gemini 3 Pro models, we compared the best reported model version in the charts and table.</li><li><strong>Terminal-Bench 2.0</strong>: We report both scores reproduced on our infrastructure and published scores from other labs. All runs used the Terminus-2 harness, except for OpenAI’s Codex CLI. All experiments used 1× guaranteed / 3× ceiling resource allocation and 5–15 samples per task across staggered batches. See system card for details.</li><li><strong>Humanity’s Last Exam</strong>: Claude models run “with tools” were run with web search, web fetch, code execution, programmatic tool calling, context compaction triggered at 50k tokens up to 3M total tokens, max reasoning effort, and adaptive thinking enabled. A domain blocklist was used to decontaminate eval results. See system card for more details.</li><li><strong>SWE-bench Verified:</strong> Our score was averaged over 25 trials. With a prompt modification, we saw a score of 81.42%.</li><li><strong>MCP Atlas: </strong>Claude Opus 4.6 was run with max effort. When run at high effort, it reached an industry-leading score of 62.7%.</li><li><strong>BrowseComp</strong>: Claude models were run with web search, web fetch, programmatic tool calling, context compaction triggered at 50k tokens up to 10M total tokens, max reasoning effort, and no thinking enabled. Adding a multi-agent harness increased scores to 86.8%. See system card for more details.</li><li><strong>ARC AGI 2: </strong>Claude Opus 4.6 was run with max effort and a 120k thinking budget score.</li><li><strong>CyberGym</strong>: Claude models were run on no thinking, default effort, temperature, and <code>top_p</code>. The model was also given a “think” tool that allowed interleaved thinking for multi-turn evaluations.</li><li><strong>OpenRCA</strong>: For each failure case in OpenRCA, Claude receives 1 point if all generated root-cause elements match the ground-truth ones, and 0 points if any mismatch is identified. The overall accuracy is the average score across all failure cases. The benchmark was run on the benchmark author’s harness, graded using their official methodology, and has been submitted for official verification.</li></ul></article> https://www.anthropic.com/news/claude-opus-4-6 News Thu, 05 Feb 2026 00:00:00 +0000 Covering electricity price increases from our data centers https://www.anthropic.com/news/covering-electricity-price-increases As we continue to invest in American AI infrastructure , Anthropic will cover electricity price increases that consumers face from our data centers. <article>Policy<h1>Covering electricity price increases from our data centers</h1>Feb 11, 2026<img alt="Covering electricity price increases from our data centers" src="https://www-cdn.anthropic.com/images/4zrzovbb/website/6457c34fbcb012acf0f27f15a6006f700d0f50de-1000x1000.svg"/><p>As we continue to <a href="https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure">invest in American AI infrastructure</a>, Anthropic will cover electricity price increases that consumers face from our data centers.</p><p></p><p>Training a single frontier AI model will soon require gigawatts of power, and the US AI sector will need at least 50 gigawatts of capacity over the next several years. The country <a href="https://www.anthropic.com/news/build-ai-in-america">needs to build new data centers</a> quickly to maintain its competitiveness on AI and national security—but AI companies shouldn’t leave American ratepayers to pick up the tab.</p><p></p><p>Data centers can raise consumer electricity prices in two main ways. First, connecting data centers to the grid often requires costly new or upgraded infrastructure like transmission lines or substations. Second, new demand tightens the market, pushing up prices. We’re committing to address both. Specifically, we will:</p><p></p><ul><li><strong>Cover grid infrastructure costs</strong>. We will pay for 100% of the grid upgrades needed to interconnect our data centers, paid through increases to our monthly electricity charges. This includes the shares of these costs that would otherwise be passed onto consumers.</li><li><strong>Procure new power and protect consumers from price increases</strong>. We will work to bring net-new power generation online to match our data centers’ electricity needs. Where new generation isn’t online, we’ll work with utilities and external experts to estimate and cover demand-driven price effects from our data centers.</li><li><strong>Reduce strain on the grid</strong>. We’re investing in curtailment systems that cut our data centers’ power usage during periods of peak demand, as well as grid optimization tools, both of which help keep prices lower for ratepayers.</li><li><strong>Invest in local communities. </strong>Our current data center projects will create hundreds of permanent jobs and thousands of construction jobs. We’re also committed to being a responsible neighbor—that means addressing environmental impacts, including deploying water-efficient cooling technologies, and partnering with local leaders on initiatives that share AI’s benefits broadly.</li></ul><p></p><p>Where we work with partners to develop data centers for handling our own workloads, we make these commitments directly. Where we lease capacity from existing data centers, we’re exploring further ways to address our own workloads' effects on prices.</p><p></p><p>Of course, company-level action isn't enough. Keeping electricity affordable also requires systemic change. We support <a href="https://www.anthropic.com/news/build-ai-in-america">federal policies</a>—including permitting reform and efforts to speed up transmission development and grid interconnection—that make it faster and cheaper to bring new energy online for everyone.</p><p></p><p>Done right, AI infrastructure can be a catalyst for the broader energy investment the country needs. These commitments are the beginning of our efforts to address data centers’ impact on energy costs. We have more to do, and we’ll continue to share updates as this work develops.</p><p><br/></p></article> https://www.anthropic.com/news/covering-electricity-price-increases News Wed, 11 Feb 2026 00:00:00 +0000