---
name: codex-peer-review
description: "[CLAUDE CODE ONLY] Leverage Codex CLI for AI peer review, second opinions on architecture and design decisions, cross-validation of implementations, security analysis, and alternative approach generation. Requires terminal access to execute Codex CLI commands. Use when making high-stakes decisions, reviewing complex architecture, or when explicitly requested for a second AI perspective. Must be explicitly invoked using skill syntax."
license: Complete terms in LICENSE.txt
environment: claude-code
---

# Codex Peer Review Skill

🖥️ **Claude Code Only** - Requires terminal access to execute Codex CLI commands.

Enable Claude Code to leverage OpenAI's Codex CLI for collaborative AI reasoning, peer review, and multi-perspective analysis of code architecture, design decisions, and implementations.

## Core Philosophy

**Two AI perspectives are better than one for high-stakes decisions.**

This skill enables strategic collaboration between Claude Code (Anthropic) and Codex CLI (OpenAI) for:

- Architecture validation and critique
- Design decision cross-validation
- Alternative approach generation
- Security, performance, and testing analysis
- Learning from different AI reasoning patterns

**Not a replacement—a second opinion.**

---

## When to Use Codex Peer Review

### High-Value Scenarios

**DO use when:**
- Making high-stakes architecture decisions
- Choosing between significant design alternatives
- Reviewing security-critical code
- Validating complex refactoring plans
- Exploring unfamiliar domains or patterns
- User explicitly requests a second opinion
- There is significant disagreement about the approach
- Performance-critical optimization decisions
- Testing strategy validation

**DON'T use when:**
- Simple, straightforward implementations
- Already confident in a single approach
- Time-sensitive quick fixes
- No significant trade-offs exist
- Low-impact tactical changes
- Codex CLI is not available/installed

### How to Invoke This Skill
**Important:** This skill requires explicit invocation. It is not automatically triggered by natural language.

**To use this skill, Claude must explicitly invoke it using:**

```
skill: "codex-peer-review"
```

**User phrases that indicate this skill would be valuable:**
- "Get a second opinion on..."
- "What would Codex think about..."
- "Review this architecture with Codex"
- "Use Codex to validate this approach"
- "Are there better alternatives to..."
- "Get Codex peer review for this"
- "Security review with Codex needed"
- "Ask Codex about this design"

When these phrases appear, Claude should suggest using this skill and invoke it explicitly if appropriate.

---

### Codex vs Gemini: Which Peer Review Skill?

Both Codex and Gemini peer review skills provide valuable second opinions, but excel in different scenarios.

**Use Codex Peer Review when:**
- Code size < 500 LOC (focused reviews)
- Need precise, line-level bug detection
- Want fast analysis with concise output
- Reviewing single modules or functions
- Need tactical implementation feedback
- Performance bottleneck identification (specific issues)
- Quick validation of design decisions

**Use Gemini Peer Review when:**
- Code size > 5k LOC (large codebase analysis)
- Need full codebase context (up to 1M tokens)
- Reviewing architecture across multiple modules
- Analyzing diagrams + code together (multimodal)
- Want research-grounded recommendations (current best practices)
- Cross-module security analysis (attack surface mapping)
- Systemic performance patterns
- Design consistency checking

**For mid-range codebases (500-5k LOC):**
- Use **Codex** if: Focused review, single module, speed priority, specific bugs
- Use **Gemini** if: Cross-module patterns, holistic view, diagram analysis, research grounding
- Consider **both** for: Critical decisions requiring maximum confidence

**For maximum value on high-stakes decisions:** Use both skills sequentially and apply the synthesis framework (see `references/synthesis-framework.md`).

---

## Core Workflow

### 1. Recognize Need for Peer Review

**Assess if peer review adds value.** Questions to consider:

- Is this a high-stakes decision with significant impact?
- Are there multiple valid approaches to consider?
- Is the architecture complex or unfamiliar?
- Does this involve security, performance, or scalability concerns?
- Has the user explicitly requested a second opinion?
- Would different AI reasoning perspectives help?

**If yes to 2+ questions:** Proceed with the peer review workflow.

---

### 2. Prepare Context for Codex

**Extract and structure relevant information.**

Load `references/context-preparation.md` for detailed guidance on:
- What code/files to include
- How to frame questions effectively
- Context boundaries (what to include/exclude)
- Expectation setting for output format

**Key preparation steps:**

1. **Identify core question:** What specifically do we want Codex to review?
2. **Extract relevant code:** Include necessary files, not the entire codebase
3. **Provide context:** Project type, constraints, requirements, concerns
4. **Frame clearly:** Specific questions, not vague requests
5. **Set expectations:** What kind of response we need

**Context structure template:**

```
[CONTEXT]
Project: [type, purpose]
Current situation: [what exists]
Constraints: [technical, business, time]

[CODE/ARCHITECTURE]
[relevant code or architecture description]

[QUESTION]
[specific question or review request]

[EXPECTED OUTPUT]
[format: analysis, alternatives, recommendations, etc.]
```

---

### 3. Invoke Codex CLI

**Execute the appropriate Codex command.**

Load `references/codex-commands.md` for the complete command reference.
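The step-2 context template can be assembled with a small helper before handing it to Codex. This is a sketch: the `prepare_context` function name and the sample field values are illustrative, not part of the skill or the Codex CLI.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the step-2 context template.
# Function name and sample values are illustrative; adapt fields per review.

prepare_context() {
  local project="$1" situation="$2" question="$3"
  cat <<EOF
[CONTEXT]
Project: ${project}
Current situation: ${situation}

[QUESTION]
${question}

[EXPECTED OUTPUT]
Analysis with concrete recommendations and trade-offs.
EOF
}

prompt=$(prepare_context \
  "B2B SaaS API (PostgreSQL, AWS)" \
  "Monolith under load; considering a service split" \
  "Review the proposed service boundaries for coupling and data-consistency risks")

# To send it to Codex (requires Codex CLI):
#   printf '%s\n' "$prompt" | codex exec
printf '%s\n' "$prompt"
```

Keeping the template in one place makes reviews consistent and easy to compare across sessions.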
**Common patterns:**

**Non-interactive review (recommended):**
```bash
cat <<'EOF' | codex exec
[prepared context and question here]
EOF
```

**Simple one-line review:**
```bash
codex exec "Review this code for security issues"
```

**Architecture review with diagram:**
```bash
codex --image architecture-diagram.png "Analyze this architecture"
```

**Key flags:**
- `exec`: Non-interactive execution streaming to stdout
- `--image` / `-i`: Attach architecture diagrams or screenshots
- `--full-auto`: Unattended mode (use with caution)

**Error handling:**
- If Codex CLI is not installed, inform the user and provide installation instructions
- If API limits are reached, note the limitation and proceed with Claude-only analysis
- If Codex returns an unclear response, reformulate the question and retry once

---

### 4. Synthesize Perspectives

**Compare and integrate both AI perspectives.**

Load `references/synthesis-framework.md` for detailed synthesis patterns.

**Analysis framework:**

1. **Agreement Analysis**
   - Where do both perspectives align?
   - What shared concerns exist?
   - What validates confidence in the approach?

2. **Disagreement Analysis**
   - Where do perspectives diverge?
   - Why might approaches differ?
   - What assumptions differ?

3. **Complementary Insights**
   - What does Codex see that Claude missed?
   - What does Claude see that Codex missed?
   - How do perspectives complement each other?

4. **Trade-off Identification**
   - What trade-offs does each perspective reveal?
   - Which concerns are prioritized differently?
   - What constraints drive different conclusions?

5. **Insight Extraction**
   - What are the key actionable insights?
   - What alternatives emerge from both perspectives?
   - What risks are highlighted by either perspective?
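The step-3 error-handling rules can be sketched as a small wrapper. Only `codex exec` is a real command here; the `codex_review` function name, the retry preamble, and the fallback message are conventions assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the step-3 error handling: check availability, retry an
# unclear/failed call once, otherwise fall back to Claude-only analysis.
# Only `codex exec` is real; the policy around it is this skill's convention.

codex_review() {
  local prompt="$1"
  if ! command -v codex >/dev/null 2>&1; then
    echo "Codex CLI not installed (try: npm i -g @openai/codex); falling back to Claude-only analysis." >&2
    return 1
  fi
  # First attempt; on failure, retry once with a reformulated preamble.
  printf '%s\n' "$prompt" | codex exec ||
    printf 'Please answer concretely and specifically.\n%s\n' "$prompt" | codex exec
}

# Usage (requires Codex CLI):
#   codex_review "Review this function for race conditions"
```

The single-retry limit keeps the workflow from looping on an unresponsive or confused peer.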
**Synthesis output structure:**

```
## Perspective Comparison

**Claude's Analysis:**
[key points from Claude's initial analysis]

**Codex's Analysis:**
[key points from Codex's review]

**Points of Agreement:**
- [shared insights]

**Points of Divergence:**
- [different perspectives and why]

**Complementary Insights:**
- [unique value from each perspective]

## Synthesis & Recommendations

[integrated analysis incorporating both perspectives]

**Recommended Approach:**
[action plan based on both perspectives]

**Rationale:**
[why this approach balances both perspectives]

**Remaining Considerations:**
[open questions or concerns to address]
```

---

### 5. Present Balanced Analysis

**Deliver integrated insights to the user.**

**Presentation principles:**
- Be transparent about which AI said what
- Acknowledge disagreements honestly
- Don't force false consensus
- Explain the reasoning behind each perspective
- Give the user enough context to make an informed decision
- Present alternatives clearly
- Indicate confidence levels appropriately

**When perspectives align:**
"Both Claude and Codex agree that [approach] is preferable because [reasons]. This alignment increases confidence in the recommendation."

**When perspectives diverge:**
"Claude favors [approach A] prioritizing [factors], while Codex suggests [approach B] emphasizing [factors]. This divergence reveals an important trade-off: [explanation]. Consider [factors] to decide which approach better fits your context."

**When one finds issues the other missed:**
"Codex identified [concern] that wasn't initially apparent. This adds [insight] to our analysis..."

---

## Use Case Patterns

Load `references/use-case-patterns.md` for detailed examples of each scenario.

### 1. Architecture Review

**Scenario:** Reviewing system design before major implementation

**Process:**
1. Document current architecture or proposed design
2. Prepare context: system requirements, constraints, scale expectations
3. Ask Codex: "Review this architecture for scalability, maintainability, and potential issues"
4. Synthesize: Compare architectural concerns and recommendations
5. Present: Integrated architecture assessment with both perspectives

**Example question:** "Review this microservices architecture. Are there concerns with service boundaries, data consistency, or deployment complexity?"

---

### 2. Design Decision Validation

**Scenario:** Choosing between multiple implementation approaches

**Process:**
1. Document the decision point and alternatives
2. Prepare context: requirements, constraints, known trade-offs
3. Ask Codex: "Compare approaches A, B, and C for [criteria]"
4. Synthesize: Create a trade-off matrix from both perspectives
5. Present: Clear comparison showing strengths/weaknesses

**Example question:** "Should we use event sourcing or traditional CRUD for this domain? Consider complexity, auditability, and team expertise."

---

### 3. Security Review

**Scenario:** Validating security-critical code before deployment

**Process:**
1. Extract security-relevant code sections
2. Prepare context: threat model, security requirements, compliance needs
3. Ask Codex: "Security review: identify vulnerabilities, attack vectors, and hardening opportunities"
4. Synthesize: Combine security concerns from both analyses
5. Present: Comprehensive security assessment with prioritized issues

**Example question:** "Review this authentication implementation. Are there vulnerabilities in session management, token handling, or access control?"

---

### 4. Performance Analysis

**Scenario:** Optimizing performance-critical code

**Process:**
1. Extract performance-critical sections
2. Prepare context: performance requirements, current bottlenecks, constraints
3. Ask Codex: "Analyze for performance bottlenecks and optimization opportunities"
4. Synthesize: Combine optimization suggestions from both perspectives
5. Present: Prioritized optimization recommendations with trade-offs

**Example question:** "This query endpoint is slow under load. Identify bottlenecks in the database access pattern, caching strategy, and N+1 issues."

---

### 5. Testing Strategy

**Scenario:** Improving test coverage and quality

**Process:**
1. Document current testing approach and coverage
2. Prepare context: critical paths, known gaps, testing constraints
3. Ask Codex: "Review testing strategy and suggest improvements"
4. Synthesize: Combine testing recommendations from both perspectives
5. Present: Comprehensive testing improvement plan

**Example question:** "Review our testing approach. Are there coverage gaps, missing edge cases, or better testing strategies for this complex state machine?"

---

### 6. Code Review & Learning

**Scenario:** Understanding unfamiliar code or patterns

**Process:**
1. Extract relevant code sections
2. Prepare context: what's unclear, specific questions, learning goals
3. Ask Codex: "Explain this code: patterns used, design decisions, potential concerns"
4. Synthesize: Combine explanations and identify patterns both AIs recognize
5. Present: Clear explanation with multiple perspectives on design

**Example question:** "Explain this recursive backtracking algorithm. What patterns are used, and are there clearer alternatives?"

---

### 7. Alternative Approach Generation

**Scenario:** Stuck on a problem or exploring better approaches

**Process:**
1. Document the current approach and why it's unsatisfactory
2. Prepare context: problem constraints, what's been tried, goals
3. Ask Codex: "Generate alternative approaches to [problem]"
4. Synthesize: Combine creative alternatives from both perspectives
5. Present: Multiple vetted alternatives with trade-off analysis

**Example question:** "We're stuck on real-time conflict resolution for collaborative editing. What alternative CRDT or operational transform approaches could work better?"
---

## Command Reference

Load `references/codex-commands.md` for complete command documentation.

**Quick reference:**

| Use Case | Command Pattern |
|----------|----------------|
| Simple review | `codex exec "Review this code"` |
| Multi-line prompt | `cat <<'EOF' \| codex exec` ... `EOF` |
| Review with diagram | `codex --image diagram.png "Analyze this"` |
| Interactive mode | `codex "What do you think about..."` |
| Resume session | `codex resume --last` |

**Non-interactive review (recommended for automation):**

```bash
cat <<'EOF' | codex exec
[Your structured prompt here]
EOF
```

---

## Integration Points

### With Other Skills

**With `concept-forge` skill:**
- Forge architectural concepts → Validate with Codex peer review
- Use `@builder` and `@strategist` archetypes to prepare questions

**With `prose-polish` skill:**
- Ensure technical documentation is clear and professional
- Polish architecture decision records (ADRs)

**With `claimify` skill:**
- Map architectural arguments and assumptions
- Analyze decision rationale structure

### With Claude Code Workflows

**Pre-implementation:**
- Use peer review before starting major features
- Validate architecture before building

**During implementation:**
- Use peer review when stuck or uncertain
- Validate critical decisions in real-time

**Post-implementation:**
- Use peer review to validate completed work
- Cross-check refactoring results

---

## Quality Signals

### Peer Review is Valuable When:
- Both perspectives identify the same concerns (high confidence)
- Perspectives reveal complementary insights
- Trade-offs become clearer through different lenses
- Alternative approaches emerge that weren't initially visible
- Security or performance concerns are validated independently
- User gains clarity on the decision through multi-perspective analysis

### Peer Review Needs Refinement When:
- Responses are too vague or generic
- Question wasn't specific enough
- Context was insufficient
- Both perspectives say obvious things
- No new insights emerge
- Codex response misunderstands the question

**Action:** Reformulate the question with better context and specificity.

### Skip Peer Review When:
- Codex CLI is unavailable and blocking progress
- Decision is time-sensitive and low-risk
- Approach is straightforward with no trade-offs
- User doesn't value a second opinion for this decision
- Context is too large to prepare efficiently

---

## Best Practices

### Effective Peer Review

**DO:**
- Frame specific, answerable questions
- Provide sufficient context for informed analysis
- Use for high-stakes decisions where a second opinion adds value
- Be transparent about which AI provided which insight
- Acknowledge disagreements and explain them
- Synthesize perspectives rather than just concatenating them
- Give the user enough context to make an informed decision

**DON'T:**
- Use for every trivial decision
- Ask vague questions without context
- Force false consensus when perspectives diverge
- Hide which AI said what
- Ignore one perspective in favor of the other
- Present peer review as authoritative truth
- Over-rely on peer review for basic decisions

### Context Preparation

**Effective context:**
- Focused on a specific decision or area of code
- Includes relevant constraints and requirements
- Provides enough background without overwhelming
- Frames clear questions
- Sets expectations for output

**Ineffective context:**
- Dumps the entire codebase
- No clear question or focus
- Missing critical constraints
- Vague or overly broad
- No guidance on what kind of response is useful

### Question Framing

**Good questions:**
- "Review this microservices architecture. Are service boundaries well-defined? Any concerns with data consistency or deployment complexity?"
- "Compare these three caching strategies for our use case. Consider memory overhead, invalidation complexity, and cold-start performance."
- "Security review this authentication flow. Focus on session management, token expiration, and refresh token handling."
**Poor questions:**
- "Is this code good?" (too vague)
- "Review everything" (too broad)
- "What do you think?" (no specific focus)

---

## Installation Requirements

**Codex CLI must be installed to use this skill.**

### Installation

```bash
# Via npm
npm i -g @openai/codex

# Via Homebrew
brew install codex
```

### Authentication

```bash
# Sign in with a ChatGPT Plus/Pro/Business/Edu/Enterprise account
codex login

# Or provide an API key
codex login --api-key "your-api-key"
```

### Verification

```bash
# Verify installation
codex --version

# Check authentication
codex login status
```

---

**If Codex CLI is not available:**
1. Inform the user that peer review requires Codex CLI
2. Provide installation instructions
3. Continue with Claude-only analysis if the user can't install it
4. Note that a second opinion isn't available

---

## Configuration

**Optional configuration in `~/.codex/config.toml`:**

```toml
# Approval policy (untrusted|on-failure|on-request|never)
approval_policy = "on-request"

# Sandbox mode (read-only|workspace-write|danger-full-access)
sandbox_mode = "read-only"
```

**For peer review, recommended settings:**
- `sandbox_mode = "read-only"` for read-only safety
- `approval_policy = "on-request"` for transparency

**Note:** Don't hardcode model names in config. Let Codex CLI use its default (latest) model.
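The verification steps above can be bundled into a single preflight check that the Claude-only fallback path can key off. This is a sketch: the `codex_preflight` function name and status strings are illustrative; `codex --version` and `codex login status` are the commands from the verification section.

```shell
#!/usr/bin/env bash
# Preflight check combining the verification steps above: installed,
# runnable, and authenticated. Function name and messages are illustrative.

codex_preflight() {
  command -v codex >/dev/null 2>&1    || { echo "not installed"; return 1; }
  codex --version >/dev/null 2>&1     || { echo "installed but not runnable"; return 1; }
  codex login status >/dev/null 2>&1  || { echo "not authenticated"; return 1; }
  echo "ready"
}

# Usage: run before preparing context, so a missing CLI is caught early:
#   codex_preflight || echo "Proceeding with Claude-only analysis"
```

Running the check before context preparation avoids assembling a prompt that has nowhere to go.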
---

## Limitations & Considerations

### Technical Limitations
- Requires Codex CLI installation and authentication
- Subject to OpenAI API rate limits
- May have different context windows than Claude
- Responses may vary in quality based on the prompt
- No real-time communication between AIs (sequential only)

### Philosophical Considerations
- Different training data and approaches may lead to different perspectives
- Neither AI is objectively "correct"—both offer perspectives
- User judgment is the ultimate arbiter
- Peer review adds time to the workflow
- Over-reliance on peer review can slow decision-making

### When to Trust Which Perspective

**Trust convergence:**
- When both AIs agree, confidence increases

**Trust divergence:**
- Reveals important trade-offs and assumptions
- Neither is necessarily "right"—different priorities

**Trust specialized knowledge:**
- Codex may have different strengths in certain domains
- Claude may have different strengths in others
- Consider which AI's reasoning aligns better with your context

---

## Example Workflows

### Example: Architecture Decision

**User:** "I'm designing a multi-tenant SaaS architecture. Should I use separate databases per tenant or a shared database with row-level security?"

**Claude initial analysis:** [Provides analysis of trade-offs]

**Invoke peer review:**

```bash
cat <<'EOF' | codex exec
Review multi-tenant SaaS architecture decision:

CONTEXT:
- B2B SaaS with 100-500 tenants expected
- Varying data volumes per tenant (small to large)
- Strong data isolation requirements
- Team familiar with PostgreSQL
- Cloud deployment (AWS)

OPTIONS:
A) Separate database per tenant
B) Shared database with row-level security (RLS)

QUESTION:
Analyze trade-offs for scalability, operational complexity, data
isolation, and cost. Which approach is recommended for this context?
EOF
```

**Synthesis:** Compare Claude's and Codex's trade-off analyses, extract key insights, and present a balanced recommendation.
---

## Anti-Patterns

**Don't:**
- Use peer review for every trivial decision (wastes time)
- Blindly follow one AI's recommendation over the other
- Ask vague questions without context
- Expect perfect agreement between AIs
- Force implementation when both AIs raise concerns
- Use peer review as a decision-avoidance mechanism
- Over-engineer simple problems by seeking too many opinions

**Do:**
- Use strategically for high-stakes decisions
- Synthesize both perspectives thoughtfully
- Frame clear, specific questions with context
- Embrace disagreement as revealing trade-offs
- Use peer review to inform, not replace, judgment
- Make timely decisions based on integrated analysis
- Balance peer review with velocity

---

## Success Metrics

**Peer review succeeds when:**
- User gains clarity on the decision through multi-perspective analysis
- Important trade-offs are revealed that weren't initially apparent
- Alternative approaches emerge that are genuinely valuable
- Risks are identified by at least one AI perspective
- User makes a more informed decision than without peer review
- Confidence increases (when perspectives align)
- Trade-offs become explicit (when perspectives diverge)

**Peer review fails when:**
- No new insights emerge (obvious analysis)
- It takes too long relative to decision impact
- Perspectives are confusing rather than clarifying
- User is more confused after peer review than before
- It blocks forward progress unnecessarily
- It becomes a crutch for simple decisions

---

## Skill Improvement

**This skill improves through:**
- Better question framing patterns
- More effective context preparation
- Refined synthesis techniques
- Pattern recognition for when peer review adds value
- Learning which types of questions work best with Codex
- Understanding Codex's strengths and limitations
- Calibrating when peer review is worth the time investment

**Feedback loop:**
- Track which peer reviews provided valuable insights
- Note which question patterns work well
- Identify scenarios where peer review was or wasn't valuable
- Refine use case patterns based on experience

---

## Related Resources

- Codex CLI Documentation: https://developers.openai.com/codex/cli/
- Architecture Decision Records (ADR) patterns
- Design pattern catalogs
- Security review checklists
- Performance optimization frameworks
- Testing strategy guides