--- name: word-read-write description: Create and read Microsoft Word (.docx) documents. Use this skill when you need to generate reports/letters/templates as .docx or extract readable text from existing .docx files. license: MIT author: aipoch --- > **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills) ## When to Use - Generate Word reports (e.g., weekly status, audit summaries) from structured data in Node.js. - Produce standardized memos/letters/templates with consistent page size, margins, headings, and tables. - Export documents that must render reliably across Microsoft Word and Google Docs (tables, shading, widths). - Insert images, headers/footers, page numbers, page breaks, and a table of contents programmatically. - Extract text from existing `.docx` files for indexing, review, or conversion to Markdown. ## Key Features - Create `.docx` documents using the `docx` (docx-js) library. - Explicit page setup (US Letter vs A4), margins, and landscape handling. - Style control (default font, overriding built-in Heading styles for TOC compatibility). - Proper list generation (bullets/numbering via numbering config, not Unicode characters). - Robust table rendering rules (DXA widths, column widths + cell widths, shading, borders, padding). - Images with required metadata and transformations. - Page breaks, headers/footers, and page numbering. - Table of contents generation based on Word heading levels. - Read/extract `.docx` content via Pandoc conversion. ## Dependencies - **Node.js**: >= 18 - **docx**: `^9.0.0` - **pandoc**: `>= 2.0` (CLI tool for extracting/converting `.docx` content) ## Example Usage ### 1) Create a `.docx` (Node.js) **Install** ```bash npm i docx ``` **create-doc.js** ```javascript const fs = require("fs"); const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun, Header, Footer, AlignmentType, BorderStyle, WidthType, ShadingType, PageOrientation, PageNumber, PageBreak, TableOfContents, HeadingLevel, LevelFormat, } = require("docx"); const US_LETTER = { width: 12240, height: 15840 }; // DXA (1440 = 1 inch) const MARGINS_1IN = { top: 1440, right: 1440, bottom: 1440, left: 1440 }; const CONTENT_WIDTH = US_LETTER.width - MARGINS_1IN.left - MARGINS_1IN.right; // 9360 const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" }; const borders = { top: border, bottom: border, left: border, right: border }; const doc = new Document({ styles: { default: { document: { run: { font: "Arial", size: 24 }, // 12pt }, }, paragraphStyles: [ // Use exact IDs to override built-in Word heading styles { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 32, bold: true, font: "Arial" }, paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 }, }, { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 28, bold: true, font: "Arial" }, paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 }, }, ], }, numbering: { config: [ { reference: "bullets", levels: [ { level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } }, }, ], }, { reference: "numbers", levels: [ { level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } }, }, ], }, ], }, sections: [ { properties: { page: { size: { width: US_LETTER.width, height: US_LETTER.height, // For landscape: pass portrait dimensions and set orientation; docx swaps internally. // orientation: PageOrientation.LANDSCAPE, }, margin: MARGINS_1IN, }, }, headers: { default: new Header({ children: [new Paragraph({ children: [new TextRun("Sample Header")] })], }), }, footers: { default: new Footer({ children: [ new Paragraph({ children: [ new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] }), ], }), ], }), }, children: [ new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("DOCX Creation Example")], }), new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3", }), new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("Lists")], }), new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Bullet item")], }), new Paragraph({ numbering: { reference: "numbers", level: 0 }, children: [new TextRun("Numbered item")], }), new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("Table")], }), new Table({ width: { size: CONTENT_WIDTH, type: WidthType.DXA }, columnWidths: [CONTENT_WIDTH / 2, CONTENT_WIDTH / 2], rows: [ new TableRow({ children: [ new TableCell({ borders, width: { size: CONTENT_WIDTH / 2, type: WidthType.DXA }, shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, margins: { top: 80, bottom: 80, left: 120, right: 120 }, children: [new Paragraph("Left cell")], }), new TableCell({ borders, width: { size: CONTENT_WIDTH / 2, type: WidthType.DXA }, shading: { fill: "FFFFFF", type: ShadingType.CLEAR }, margins: { top: 80, bottom: 80, left: 120, right: 120 }, children: [new Paragraph("Right cell")], }), ], }), ], }), new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("Image")], }), new Paragraph({ children: [ new ImageRun({ type: "png", data: fs.readFileSync("./image.png"), transformation: { width: 200, height: 150 }, altText: { title: "Title", description: "Desc", name: "Name" }, }), ], }), new Paragraph({ children: [new PageBreak()] }), new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("New Page Section")], }), // Do not use "\n" inside TextRun; create separate Paragraphs instead. new Paragraph({ children: [new TextRun("This is on a new page.")] }), ], }, ], }); (async () => { const buffer = await Packer.toBuffer(doc); fs.writeFileSync("output.docx", buffer); console.log("Wrote output.docx"); })(); ``` **Run** ```bash node create-doc.js ``` ### 2) Read/extract `.docx` text with Pandoc ```bash pandoc document.docx -o output.md ``` ## Implementation Details - **DOCX structure**: `.docx` is a ZIP archive containing XML parts; libraries generate the required XML and package it. - **Page sizing (DXA)**: - Word uses **DXA** units where **1440 DXA = 1 inch**. - docx-js defaults to **A4**; for US documents set **US Letter** explicitly: `12240 x 15840`. - With 1" margins, **content width** is `12240 - 1440 - 1440 = 9360 DXA`. - **Landscape orientation**: - docx-js swaps width/height internally for landscape. - Provide portrait dimensions and set `orientation: PageOrientation.LANDSCAPE`. - **Headings and TOC**: - TOC generation requires paragraphs using `HeadingLevel.*`. - To override built-in heading appearance, use exact style IDs: `"Heading1"`, `"Heading2"`, etc. - Include `outlineLevel` (H1 = 0, H2 = 1, ...), otherwise TOC may not pick up headings. - **Paragraphs vs newlines**: - Do not embed `\n` in runs to simulate line breaks; create separate `Paragraph` elements. - **Lists**: - Do not manually insert Unicode bullet characters as plain text. - Use `numbering.config` with `LevelFormat.BULLET` / `LevelFormat.DECIMAL`. - Numbering continuity depends on `reference`: same reference continues; different reference restarts. - **Tables (critical rendering rules)**: - Always use `WidthType.DXA` (percent widths can break in Google Docs). - Set **table `width`** and **`columnWidths`**; the sum of `columnWidths` must equal table width. - Set **each cell `width`** to match its corresponding column width (required for consistent rendering). - Cell `margins` are internal padding and do not add to width; they reduce usable content area. - For shading, prefer `ShadingType.CLEAR` to avoid unexpected black backgrounds. - **Images**: - `ImageRun` requires a valid `type` (e.g., `png`, `jpg`) and `altText` fields. - **Page breaks**: - `PageBreak` must be inside a `Paragraph` (or use `pageBreakBefore: true`).