---
name: docx-to-markdown
description: '[Document Processing] Use when you need to convert Microsoft Word (.docx) files to Markdown with GFM support (tables, images, code blocks).'
disable-model-invocation: true
---

> Codex compatibility note:
>
> - Invoke repository skills with `$skill-name` in Codex; this mirrored copy rewrites legacy Claude `/skill-name` references.
> - Task tracker mandate: BEFORE executing any workflow or skill step, create/update task tracking for all steps and keep it synchronized as progress changes.
> - User-question prompts mean to ask the user directly in Codex.
> - Ignore Claude-specific mode-switch instructions when they appear.
> - Strict execution contract: when a user explicitly invokes a skill, execute that skill protocol as written.
> - Subagent authorization: when a skill is user-invoked or AI-detected and its protocol requires subagents, that skill activation authorizes use of the required `spawn_agent` subagent(s) for that task.
> - Do not skip, reorder, or merge protocol steps unless the user explicitly approves the deviation first.
> - For workflow skills, execute each listed child-skill step explicitly and report step-by-step evidence.
> - If a required step/tool cannot run in this environment, stop and ask the user before adapting.

## Codex Project-Reference Loading (No Hooks)

Codex does not receive Claude hook-based doc injection. When coding, planning, debugging, testing, or reviewing, open project docs explicitly using this routing.

**Always read:**

- `docs/project-config.json` (project-specific paths, commands, modules, and workflow/test settings)
- `docs/project-reference/docs-index-reference.md` (routes to the full `docs/project-reference/*` catalog)
- `docs/project-reference/lessons.md` (always-on guardrails and anti-patterns)

**Situation-based docs:**

- Backend/CQRS/API/domain/entity changes: `backend-patterns-reference.md`, `domain-entities-reference.md`, `project-structure-reference.md`
- Frontend/UI/styling/design-system: `frontend-patterns-reference.md`, `scss-styling-guide.md`, `design-system/README.md`
- Spec/test-case planning or TC mapping: `feature-docs-reference.md`
- Integration test implementation/review: `integration-test-reference.md`
- E2E test implementation/review: `e2e-test-reference.md`
- Code review/audit work: `code-review-rules.md` plus domain docs above based on changed files

Do not read all docs blindly. Start from `docs-index-reference.md`, then open only relevant files for the task.

## Quick Summary

**Goal:** Convert Microsoft Word (.docx) files to Markdown with GFM support (tables, images, formatting).

**Workflow:**

1. **Install** -- Ensure the npm dependencies (`mammoth`, `turndown`, `turndown-plugin-gfm`) are installed
2. **Convert** -- Run `scripts/convert.cjs` on the input file, optionally passing `--images` to extract images to a directory
3. **Clean** -- Post-process markdown for consistency

**Key Rules:**

- Requires the skill's npm dependencies to be installed
- Embeds images as inline base64 by default; pass `--images` to extract them to a directory instead
- Preserves tables, basic formatting, and document structure (see Limitations for exceptions)

**Be skeptical. Apply critical and sequential thinking. Every claim needs traced proof and a confidence percentage (act only above 80%).**

# docx-to-markdown

Convert Microsoft Word (.docx) files to Markdown format with GitHub-Flavored Markdown support.

## Installation Required

**This skill requires npm dependencies.** Run one of the following:

```bash
# Option 1: Install via ClaudeKit CLI (recommended)
ck init  # Runs install.sh which handles all skills

# Option 2: Manual installation
cd .claude/skills/docx-to-markdown
npm install
```

**Dependencies:** `mammoth`, `turndown`, `turndown-plugin-gfm`

## Quick Start

```bash
# Basic conversion
node .claude/skills/docx-to-markdown/scripts/convert.cjs --input ./document.docx

# Specify output path
node .claude/skills/docx-to-markdown/scripts/convert.cjs -i ./doc.docx -o ./output.md

# Preserve images to folder
node .claude/skills/docx-to-markdown/scripts/convert.cjs -i ./doc.docx --images ./images/
```

## CLI Options

| Option     | Short | Description                    | Default       |
| ---------- | ----- | ------------------------------ | ------------- |
| `--input`  | `-i`  | Input DOCX file path           | (required)    |
| `--output` | `-o`  | Output markdown file path      | `{input}.md`  |
| `--images` |       | Directory for extracted images | inline base64 |
| `--help`   | `-h`  | Show help message              |               |

## Features

- **GFM Tables:** Converts Word tables to markdown tables
- **Images:** Extracts embedded images (base64 inline or to folder)
- **Lists:** Ordered and unordered lists preserved
- **Code Blocks:** Monospace text converted to code blocks
- **Links:** Hyperlinks preserved
- **Headings:** Heading levels maintained
- **Cross-Platform:** Works on Windows, macOS, Linux

## Conversion Pipeline

```
DOCX → mammoth → HTML → turndown → Markdown
```

The two-stage conversion (DOCX→HTML→MD) follows mammoth's own recommendation: its built-in Markdown output is deprecated in favor of generating HTML and converting it to Markdown with a separate library.
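
The pipeline above can be sketched in a few lines of Node.js. This is a minimal illustration, not the skill's actual `scripts/convert.cjs`: the function name `docxToMarkdown`, the `loadDeps` helper, and the chosen turndown options are assumptions made for the example. It relies on `mammoth.convertToHtml`, the `turndown` service constructor, and the `gfm` plugin from `turndown-plugin-gfm`. The libraries are passed in as a parameter so the function can be exercised without them installed.

```javascript
// Hypothetical helper: loads the three dependencies the skill lists.
// Only called when no dependencies are injected by the caller.
function loadDeps() {
  return {
    mammoth: require('mammoth'),
    TurndownService: require('turndown'),
    gfm: require('turndown-plugin-gfm').gfm,
  };
}

// Sketch of the two-stage pipeline: DOCX -> HTML (mammoth) -> Markdown (turndown).
// Not the skill's real implementation; names and options are assumptions.
async function docxToMarkdown(inputPath, deps = loadDeps()) {
  const { mammoth, TurndownService, gfm } = deps;

  // Stage 1: mammoth resolves to { value: <html string>, messages: [...] }.
  const { value: html, messages } = await mammoth.convertToHtml({ path: inputPath });

  // Stage 2: turndown converts the HTML; the gfm plugin adds table support.
  const turndown = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
  turndown.use(gfm);

  return { markdown: turndown.turndown(html), warnings: messages };
}

// Example usage (requires the npm dependencies and a real file):
//   docxToMarkdown('./document.docx').then(({ markdown }) => console.log(markdown));
```

By default mammoth embeds images as base64 data URIs in the generated HTML, which is consistent with the documented `inline base64` default for `--images`. Writing images to a folder would need a custom image handler, which this sketch leaves out.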

## Output

Returns JSON on success:

```json
{
  "success": true,
  "input": "/path/to/input.docx",
  "output": "/path/to/output.md",
  "stats": {
    "images": 3,
    "tables": 2,
    "headings": 5
  }
}
```

## Limitations

- Complex layouts (columns, text boxes) may not preserve structure
- Merged table cells are not preserved; such tables are emitted as basic markdown tables
- Comments and track changes are stripped
- Some formatting (fonts, colors) is lost in conversion

---

> **[IMPORTANT]** Use task tracking to break ALL work into small tasks BEFORE starting, including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.

> **AI Mistake Prevention** -- Failure modes to avoid on every task:
>
> **Check downstream references before deleting.** Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
> **Verify AI-generated content against actual code.** AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
> **Trace full dependency chain after edits.** Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
> **Trace ALL code paths when verifying correctness.** Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips, not just the happy path.
> **When debugging, ask "whose responsibility?" before fixing.** Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer; never patch the symptom site.
> **Assume existing values are intentional: ask WHY before changing.** Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
> **Verify ALL affected outputs, not just the first.** Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
> **Holistic-first debugging: resist the nearest-attention trap.** When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
> **Surgical changes: apply the diff test.** Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
> **Surface ambiguity before coding: don't pick silently.** If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.

> **Critical Thinking Mindset** -- Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act.
> **Anti-hallucination:** Never present a guess as fact: cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence. Certainty without evidence is the root of all hallucination.

**MUST ATTENTION** apply critical thinking: every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present a guess as fact.

**MUST ATTENTION** apply AI mistake prevention: holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction.

## Closing Reminders

- **MANDATORY IMPORTANT MUST ATTENTION** break work into small todo tasks using task tracking BEFORE starting
- **MANDATORY IMPORTANT MUST ATTENTION** search codebase for 3+ similar patterns before creating new code
- **MANDATORY IMPORTANT MUST ATTENTION** cite `file:line` evidence for every claim (confidence >80% to act)
- **MANDATORY IMPORTANT MUST ATTENTION** add a final review todo task to verify work quality

**[TASK-PLANNING]** Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using task tracking.

## Hookless Prompt Protocol Mirror (Auto-Synced)

Source: `.claude/hooks/lib/prompt-injections.cjs` + `.claude/.ck.json`

## [WORKFLOW-EXECUTION-PROTOCOL]

[BLOCKING] Workflow Execution Protocol -- MANDATORY IMPORTANT MUST CRITICAL. Do not skip for any reason.

**Generic portability boundary:** Reusable skills and protocol text stay project-neutral; project-specific conventions are discovered from `docs/project-config.json` and `docs/project-reference/`. Apply shared AI-SDD from `shared/sdd-artifact-contract.md`. Read `docs/project-config.json` and `docs/project-reference/docs-index-reference.md`, then open the project reference docs named there. Any supported AI tool may execute when this shared context and local docs are available.

1. **DETECT:** Match prompt against workflow catalog
2. **ANALYZE:** Find best-match workflow AND evaluate if a custom step combination would fit better
3. **ASK (REQUIRED FORMAT):** Use a direct user question with this structure unless the user explicitly invoked a workflow/skill and the local protocol treats explicit invocation as confirmation:
    - Question: "Which workflow do you want to activate?"
    - Option 1: "Activate **[BestMatch Workflow]** (Recommended)"
    - Option 2: "Activate custom workflow: **[step1 → step2 → ...]**" (include one-line rationale)
4. **ACTIVATE (if confirmed):** Call `$workflow-start ` for standard; sequence custom steps manually
5. **CREATE TASKS:** task tracking for ALL workflow steps
6. **EXECUTE:** Follow each step in sequence

**[CRITICAL-THINKING-MINDSET]** Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act.

**Anti-hallucination principle:** Never present a guess as fact: cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence. Certainty without evidence is the root of all hallucination.

**AI Attention principle (Primacy-Recency):** Put the 3 most critical rules at both top and bottom of long prompts/protocols so instruction adherence survives long context windows.

**Goal-driven execution:** Define success criteria first, loop until verified, and stop only when observable checks pass.

**Tests verify intent:** Tests must protect business rules/invariants and fail when the protected intent breaks, not only mirror current behavior.

## [LESSON-LEARNED-REMINDER]

[BLOCKING] Task Planning & Continuous Improvement -- MANDATORY. Do not skip. Break work into small tasks (task tracking) before starting. Add final task: "Analyze AI mistakes & lessons learned".

**Extract lessons: ROOT CAUSE ONLY, not symptom fixes:**

1. Name the FAILURE MODE (reasoning/assumption failure), not the symptom: "assumed API existed without reading source", not "used wrong enum value".
2. Generality test: does this failure mode apply to ≥3 contexts/codebases? If not, abstract one level up.
3. Write as a universal rule: strip project-specific names/paths/classes so it is useful on any codebase.
4. Consolidate: multiple mistakes sharing one failure mode → ONE lesson.
5. **Recurrence gate:** "Would this recur in a future session WITHOUT this reminder?" If no, skip `$learn`.
6. **Auto-fix gate:** "Could `$code-review`/`$code-simplifier`/`$security`/`$lint` catch this?" If yes, improve the review skill instead.
7. If BOTH gates pass, ask the user to run `$learn`.

**[TASK-PLANNING] [MANDATORY]** BEFORE executing any workflow or skill step, create/update task tracking for all planned steps, then keep it synchronized as each step starts/completes.