---
id: ins_economic-turing-test-for-ai
operator: Benjamin Mann
operator_role: Co-founder, Anthropic; tech lead, product engineering; ex-OpenAI GPT-3 architect
source_url: https://www.lennysnewsletter.com/p/anthropic-co-founder-benjamin-mann
source_type: podcast
source_title: The Economic Turing Test, mission over money, transformative AI
source_date: 2026-04-28
captured_date: 2026-05-01
domain: [ai-native, strategy-bets, gtm]
lifecycle: [strategy-bets, attribution-measurement]
maturity: frontier
artifact_class: metric-model
score: { originality: 5, specificity: 5, evidence: 4, transferability: 5, source: 5 }
tier: A
related: [ins_use-new-tools-as-new-tools, ins_dont-box-the-model-in]
raw_ref: raw/podcasts/benjamin-mann--economic-turing-test--2026-04-28.md
---

# The Economic Turing Test: would you hire the agent if you didn't know it was a machine?

## Claim

The right benchmark for "is this AI system transformative?" is not a generic capability test but the Economic Turing Test: contract an agent for one to three months on a specific job; if you would hire it back, having believed it was a person, it has passed for that role. Aggregate to a money-weighted basket of jobs and call 50% the threshold for transformative AI.

## Mechanism

Generic AGI debates are unfalsifiable. The Economic Turing Test grounds the question in revealed preference: would a real buyer pay a real wage for the agent's output, blind to its origin? The answer is per-role and per-workstream, which makes it actionable for product teams. It also reorders the conversation from "is the model smart enough?" to "for which specific workstream does this agent already pass?", a question with concrete answers and concrete economic stakes.

## Conditions

Holds when:

- The role has a market-clearing wage for human labor (a contractor benchmark exists).
- The work product is evaluable by the buyer over a meaningful time window.
- The buyer is willing to evaluate honestly rather than pattern-match on origin.
Fails when:

- The role is so novel there is no contractor benchmark.
- Evaluation is cheap to game (one-shot tasks the agent can fake at the surface but not maintain).
- The buyer's bias against AI overrides their own commercial judgment.

## Evidence

> "If you contract an agent for a month or three months on a particular job, if you decide to hire that agent and it turns out to be a machine rather than a person, then it's passed the Economic Turing Test for that role."

· Benjamin Mann on Lenny's Podcast, 2026-04-28

Reference points Mann anchors against: Fin (Intercom) resolves 82% of customer-service tickets fully automated, and the remaining 18% is genuinely harder. Anthropic's own engineering reports that 95% of Claude Code's code is written by Claude, with a 10–20x output multiplier.

## Signals

- Specific workstreams cross the threshold and become "agent-default," with humans on review.
- The money-weighted percentage of agent-passable workstreams climbs over quarters.
- Pricing conversations shift from per-seat to per-outcome (ties into Bret Taylor's outcomes-based pricing thesis).

## Counter-evidence

The test isolates economic substitution but ignores trust, legal exposure, and failure-mode novelty. An agent might pass the test for ninety days and then produce a catastrophic compounding error a human would not. Aggregation to a 50% threshold for "transformative AI" is provocative but arbitrary; reasonable operators can disagree about the right denominator and weighting.

## Cross-references

- `ins_use-new-tools-as-new-tools`, the operator-side complement
- `ins_dont-box-the-model-in`, the architectural prerequisite for passing the test at scale
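## Appendix: worked sketch of the aggregate

The money-weighted aggregation in the Claim reduces to simple arithmetic: weight each role by its human wage bill, mark it passed or failed on the blind one-to-three-month trial, and compute the passed share of total dollars. A minimal sketch, assuming per-role annual wages as weights; the role names and wage figures below are invented for illustration and do not come from the source.

```python
# Money-weighted Economic Turing Test aggregate.
# Each role maps to (annual human wage bill, passed blind 1-3 month trial?).
# All roles and figures below are hypothetical.

def money_weighted_pass_rate(roles):
    """Fraction of total wage dollars sitting in roles the agent passed."""
    total = sum(wage for wage, _ in roles.values())
    passed = sum(wage for wage, ok in roles.values() if ok)
    return passed / total

roles = {
    "customer-support-tier-1": (60_000, True),    # hypothetical
    "code-review":             (140_000, True),   # hypothetical
    "enterprise-sales":        (180_000, False),  # hypothetical
    "legal-drafting":          (160_000, False),  # hypothetical
}

share = money_weighted_pass_rate(roles)
print(f"Money-weighted pass share: {share:.0%}")  # 37% here; Mann's threshold is 50%
```

The choice of weights is exactly where the Counter-evidence bites: swapping headcount for wage dollars, or changing which roles are in the denominator, moves the aggregate without any change in capability.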