---
id: ins_bottleneck-is-context-not-capability
operator: Sherwin Wu
operator_role: Head of Engineering, OpenAI API and Developer Platform
source_url: https://www.lennysnewsletter.com/p/engineers-are-becoming-sorcerers
source_type: podcast
source_title: Sherwin Wu — Codex inside OpenAI, engineers as managers — Lenny's Podcast
source_date: 2026-04-28
captured_date: 2026-05-01
domain: [ai-native, engineering, product]
lifecycle: [ai-workflow]
maturity: applied
artifact_class: framework
score: { originality: 4, specificity: 5, evidence: 5, transferability: 5, source: 5 }
tier: A
related: [ins_personal-pattern-hoarding, ins_llm-wiki-pattern]
raw_ref: raw/podcasts/sherwin-wu--openai-codex-engineers-as-managers--2026-04-28.md
---

# When the agent isn't doing what you want, fix the context, not the model

## Claim

The dominant failure mode for AI agents in production isn't model capability. It's missing context. The fix is to encode tribal knowledge into greppable artifacts (markdown files, code comments, .md skills, a wiki) so the agent can find and apply the knowledge it needs. The team's quality is gated by contextDB completeness, not model strength.

## Mechanism

Frontier models are competent at general reasoning. Their failure modes appear when they lack specifics: how this codebase is organized, what conventions the team follows, what previous decisions ruled out. Each missing piece of context produces a wrong answer. Encoding the context once compounds: every future task starts with the right priors. The investment shifts from prompt engineering (volatile) to context engineering (durable).

## Conditions

Holds when:

- The team can identify and document tribal knowledge before each agent task fails on it.
- The context corpus is greppable / retrievable, not buried in a wiki nobody updates.

Fails when:

- The model genuinely lacks capability for the task. No amount of context fixes a missing capability.
- The corpus is wrong.
  Encoded misinformation is worse than no encoding.

## Evidence

> "When the agent isn't doing what you want, it's usually a problem with context — you've underspecified or there's just not enough information available."

OpenAI's discipline: encode tribal knowledge via comments, structure, .md files, and Skills. The repository itself becomes the contextDB. 95% of OpenAI engineers use Codex daily; the heavy users open 70% more PRs; the gap widens with experience.

· Sherwin Wu on Lenny's Podcast, 2026-04-28

## Signals

- Agent failures get diagnosed against the context corpus first, not the prompt.
- The corpus grows in step with team size; documentation is daily work, not a quarterly clean-up.
- New team members onboard via the corpus, the same way an agent does.

## Counter-evidence

Andrej Karpathy's LLM-wiki pattern is the disciplined version of this approach. Without the discipline (append-only raw, synthesized pages, lint cycles), the corpus rots and degrades agent quality. "Encode tribal knowledge" is correct; "encode it well enough that it stays current" is the actual hard work.

## Cross-references

- `ins_llm-wiki-pattern`, Karpathy's substrate pattern that operationalizes this insight
- `ins_personal-pattern-hoarding`, Simon Willison's personal-corpus version
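## Appendix: sketch

The "greppable artifacts" claim above has a mechanical core: before an agent task, retrieve the tribal-knowledge snippets that match the task and prepend them as context. A minimal sketch of that retrieval step, assuming a flat directory of .md notes; all paths, filenames, and the `gather_context` helper are illustrative, not anything described in the transcript:

```python
from pathlib import Path


def gather_context(corpus_dir: str, keywords: list[str], max_chars: int = 4000) -> str:
    """Collect tribal-knowledge snippets for an agent prompt.

    Scans every .md file under corpus_dir and keeps paragraphs that
    mention any keyword, so the agent starts a task with the team's
    priors (conventions, structure, ruled-out decisions) attached.
    """
    hits: list[str] = []
    for path in sorted(Path(corpus_dir).rglob("*.md")):
        for para in path.read_text(encoding="utf-8").split("\n\n"):
            # Tag each snippet with its source file so the agent (or a
            # human debugging a failure) can trace a wrong prior back
            # to the artifact that encoded it.
            if any(k.lower() in para.lower() for k in keywords):
                hits.append(f"[{path.name}] {para.strip()}")
    # Budget the context; rank-then-truncate is where a real system
    # would add retrieval scoring instead of plain substring match.
    return "\n\n".join(hits)[:max_chars]
```

The substring match stands in for whatever retrieval the corpus supports; the insight's point is upstream of the mechanism: the bottleneck is whether the snippets exist and are findable at all, not how cleverly they are ranked.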