---
name: ai-html-generate
description: Use AI to recreate PDF page as semantic HTML. Consumes three inputs (PNG image, parsed text, ASCII preview) for complete contextual understanding and accurate generation.
---

# AI HTML Generate Skill

## Purpose

This skill leverages **AI's probabilistic generation capabilities** to recreate PDF pages as semantic HTML. The AI receives three complementary inputs that together provide complete context:

1. **Visual reference** (PNG image) - Page layout and visual hierarchy
2. **Text data** (rich_extraction.json) - Accurate text content and formatting metadata
3. **Structural preview** (ASCII text) - Logical layout and element relationships

This **three-input approach** ensures the AI understands not just what text to include, but how it should be structured semantically in HTML.

The output is **probabilistic** (AI-generated), but will be made **deterministic** by validation gates in subsequent skills.

## What to Do

1. **Prepare three input files**
   - Load `02_page_XX.png` (image file)
   - Load `01_rich_extraction.json` (text spans with metadata)
   - Load `03_page_XX_ascii.txt` (structure preview)

2. **Construct AI prompt**
   - Attach PNG image as visual reference
   - Include extracted text data (JSON)
   - Include ASCII preview (text representation)
   - Provide specific generation requirements

3. **Invoke Claude API** with complete context
   - Send multi-modal prompt (text + image)
   - Request semantic HTML5 output
   - Specify CSS classes and structure requirements

4. **Parse and save generated HTML**
   - Extract HTML from AI response
   - Validate basic well-formedness
   - Save to persistent file with metadata

5. **Log generation metadata**
   - Record AI model used
   - Timestamp generation
   - List input files used
   - Store any confidence indicators from AI

## Input Files (From Previous Skills)

### Input 1: Rendered PDF Page (PNG)

**File**: `output/chapter_XX/page_artifacts/page_YY/02_page_XX.png`

- High-resolution rendering of PDF page
- 300+ DPI for visual clarity
- Shows actual page appearance
- Used for visual layout understanding

### Input 2: Rich Extraction Data (JSON)

**File**: `output/chapter_XX/page_artifacts/page_YY/01_rich_extraction.json`

- Text spans with complete metadata
- Font names, sizes, bold/italic flags
- Position information (bounding boxes)
- Sequence and relationships

### Input 3: ASCII Preview (Text)

**File**: `output/chapter_XX/page_artifacts/page_YY/03_page_XX_ascii.txt`

- Text-based structural representation
- Heading hierarchy marked
- Lists and bullets identified
- Paragraph flow documented
- Element types annotated

## AI Prompt Template

The prompt sent to Claude:

```
You are recreating a PDF textbook page as semantic HTML5.

You have three pieces of information about this page:
1. A visual rendering (PNG image) - to understand layout
2. Parsed text data (JSON) - to ensure accuracy
3. An ASCII structure preview (text) - to understand element relationships

VISUAL REFERENCE:
[PNG Image Attached]

PARSED TEXT DATA:
[JSON Attached]

STRUCTURAL PREVIEW:
[ASCII Text Attached]

TASK:
Generate semantic HTML5 that accurately recreates this page.

REQUIREMENTS:

1. HTML5 Structure:
   - Proper DOCTYPE, html, head, body tags
   - Meta charset="UTF-8"
   - Meta viewport for responsive design
   - Title tag with descriptive text

2. Content Wrapper:
   - Single wrapper
   - Single for all content
   - No page breaks or paginated structure

3. Semantic HTML Elements:
   - Use proper heading tags (h1-h6) based on hierarchy
   - Use <p> for paragraphs
   - Use