---
name: marker
description: Convert documents (PDF, EPUB, PPTX, DOCX, XLSX, HTML, images) to Markdown/JSON/HTML using marker-pdf with Claude Haiku LLM enhancement for accurate table, math, and form extraction. Use when the user needs to extract content from documents, convert PDFs to markdown, or process document files.
---

# Marker Document Converter

Convert PDF, EPUB, PPTX, DOCX, XLSX, HTML, and image files to clean Markdown, JSON, or HTML using the marker-pdf tool with multimodal LLM enhancement.

## Prerequisites

```bash
# Install marker-pdf with full document support
# (quoted so that shells such as zsh do not treat the brackets as a glob)
uv tool install "marker-pdf[full]"
```

Requires Python 3.10+ and PyTorch.

## Basic Usage

```bash
# Replace the first "" with the input file path and the second "" with the output directory
marker_single "" \
  --output_format markdown \
  --output_dir "" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --disable_image_extraction
```

**Note**: `--disable_image_extraction` produces text-only output with no extracted image files. Remove this flag if images need to be preserved.

## Output Formats

| Format     | Description                                                                       | Use Case                    |
| ---------- | --------------------------------------------------------------------------------- | --------------------------- |
| `markdown` | Formatted text with tables, LaTeX equations ($$-fenced), code blocks, image links | General document conversion |
| `html`     | Semantic HTML                                                                     | Web display                 |
| `json`     | Hierarchical structure with block types, bounding boxes, section hierarchy        | Programmatic processing     |
| `chunks`   | Flattened JSON optimized for RAG                                                  | Vector database ingestion   |

## CLI Options

### Core Options

- `--output_format`: `markdown` (default), `html`, `json`, `chunks`
- `--output_dir`: Directory for output files
- `--page_range`: Specific pages, e.g., `"0,5-10,20"`

### LLM Enhancement

- `--use_llm`: Enable LLM for improved accuracy (tables, forms, math, handwriting)
- `--llm_service`: LLM service class (see LLM Services below)
- `--block_correction_prompt`: Custom prompt for output refinement

### OCR & Processing

- `--force_ocr`: Force OCR on the entire document; also converts inline math to LaTeX
- `--strip_existing_ocr`: Remove existing OCR text and re-process
- `--redo_inline_math`: Highest quality inline math conversion (use with `--use_llm`)

### Image & Output Control

- `--disable_image_extraction`: Skip image extraction (plain text only)
- `--paginate_output`: Add page separators to output
- `--extract_images`: Enable image extraction (default: true)

### Advanced

- `--config_json`: Load configuration from a JSON file
- `--debug`: Enable diagnostic logging
- `--force_layout_block`: Force layout type, e.g., `Table`
- `--converter_cls`: Custom converter class

## LLM Services

### Claude (Default for This Skill)

```bash
marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --claude_model_name claude-haiku-4-5
```

### OpenAI

```bash
marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.openai.OpenAIService \
  --openai_api_key "$OPENAI_API_KEY" \
  --openai_model gpt-4o
```

### Ollama (Local)

```bash
marker_single document.pdf \
  --use_llm \
  --llm_service marker.services.ollama.OllamaService \
  --ollama_base_url "http://localhost:11434" \
  --ollama_model llama3.2-vision
```

### Google Gemini (marker's Default if No Service Is Specified)

```bash
export GOOGLE_API_KEY="your-api-key"
marker_single document.pdf --use_llm
```

## Examples

### Convert PDF to Markdown (Plain Text)

```bash
marker_single "./docs/report.pdf" \
  --output_format markdown \
  --output_dir "./docs/" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY" \
  --disable_image_extraction
```

### Convert with Images Preserved

```bash
marker_single "./docs/report.pdf" \
  --output_format markdown \
  --output_dir "./docs/" \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_model_name claude-haiku-4-5 \
  --claude_api_key "$ANTHROPIC_API_KEY"
```

### Extract Tables Only

```bash
marker_single "./docs/spreadsheet.pdf" \
  --use_llm \
  --force_layout_block Table \
  --converter_cls marker.converters.table.TableConverter \
  --output_format json
```

### Batch Convert Multiple Files

```bash
marker /path/to/input/folder --workers 4
```

### Using JSON Config File

```bash
# Quoted delimiter keeps the shell from expanding anything inside the JSON
cat > config.json << 'EOF'
{
  "force_ocr": true,
  "use_llm": true,
  "output_format": "markdown",
  "disable_image_extraction": true,
  "strip_existing_ocr": true,
  "redo_inline_math": true
}
EOF

marker_single document.pdf --config_json config.json
```

## Output Structure

### Markdown Output

- Image links: `![](image_name.png)`
- Tables: Formatted as markdown tables
- Equations: Fenced with `$$...$$`
- Code: Fenced with ` ```language `
- Headings: `#` for sections

### JSON Output

```json
{
  "pages": [
    {
      "id": "page_0",
      "polygon": [[x1,y1], [x2,y2], ...],
      "children": [
        {
          "id": "block_0",
          "block_type": "Text|Table|Image|...",
          "html": "content",
          "polygon": [...],
          "section_hierarchy": {...}
        }
      ]
    }
  ],
  "metadata": {
    "table_of_contents": [...],
    "page_stats": [...]
  }
}
```

## Instructions

1. Confirm the input file path exists
2. Determine the output directory (default: same directory as the input file)
3. **Use AskUserQuestion tool** to ask user preferences (ask both questions together):

   **Question 1 - Image Extraction**:
   - Header: "Images"
   - Question: "Do you want to extract the images in the document?"
   - Options:
     - "No (Recommended)": Extract text only and produce a plain Markdown file
     - "Yes": Extract and save images; the Markdown will contain image links

   **Question 2 - LLM Service**:
   - Header: "LLM"
   - Question: "Which LLM should be used to recognize image and table content?"
   - Options:
     - "Claude Haiku (Recommended)": Fast and economical; requires ANTHROPIC_API_KEY
     - "Claude Sonnet": Higher quality; requires ANTHROPIC_API_KEY
     - "GPT-4o": OpenAI model; requires OPENAI_API_KEY
     - "Ollama (Local)": Runs locally; no API key required

4. Based on the user's answers, construct the command:
   - If "No" for images: add `--disable_image_extraction`
   - Set LLM service parameters according to the selection:
     - Claude Haiku: `--llm_service marker.services.claude.ClaudeService --claude_model_name claude-haiku-4-5 --claude_api_key "$ANTHROPIC_API_KEY"`
     - Claude Sonnet: `--llm_service marker.services.claude.ClaudeService --claude_model_name claude-sonnet-4-20250514 --claude_api_key "$ANTHROPIC_API_KEY"`
     - GPT-4o: `--llm_service marker.services.openai.OpenAIService --openai_api_key "$OPENAI_API_KEY" --openai_model gpt-4o`
     - Ollama: `--llm_service marker.services.ollama.OllamaService --ollama_base_url "http://localhost:11434" --ollama_model llama3.2-vision`
5. Run the `marker_single` command with the chosen options
6. Report the output file location and any extraction notes
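
Steps 4 and 5 above could be scripted roughly as follows. This is a minimal sketch, not part of marker-pdf: the function name `run_marker`, the `DRY_RUN` variable, and the short choice keywords (`yes`/`no`, `haiku`/`sonnet`/`gpt4o`/`ollama`) are hypothetical names introduced for illustration. The flags and values themselves are the ones listed in step 4.

```bash
# Hypothetical helper: map the two answers from step 3 to marker_single flags.
# Usage: run_marker INPUT OUTPUT_DIR IMAGES(yes|no) LLM(haiku|sonnet|gpt4o|ollama)
run_marker() {
  local input="$1" outdir="$2" images="$3" llm="$4"

  # Base command shared by every choice; "$@" is reused as the argument list.
  set -- marker_single "$input" \
    --output_format markdown --output_dir "$outdir" --use_llm

  # Append the LLM service flags for the selected option.
  case "$llm" in
    haiku)
      set -- "$@" --llm_service marker.services.claude.ClaudeService \
        --claude_model_name claude-haiku-4-5 \
        --claude_api_key "${ANTHROPIC_API_KEY:-}" ;;
    sonnet)
      set -- "$@" --llm_service marker.services.claude.ClaudeService \
        --claude_model_name claude-sonnet-4-20250514 \
        --claude_api_key "${ANTHROPIC_API_KEY:-}" ;;
    gpt4o)
      set -- "$@" --llm_service marker.services.openai.OpenAIService \
        --openai_api_key "${OPENAI_API_KEY:-}" \
        --openai_model gpt-4o ;;
    ollama)
      set -- "$@" --llm_service marker.services.ollama.OllamaService \
        --ollama_base_url "http://localhost:11434" \
        --ollama_model llama3.2-vision ;;
    *)
      echo "unknown LLM choice: $llm" >&2
      return 1 ;;
  esac

  # "No" for images means text-only output.
  if [ "$images" = "no" ]; then
    set -- "$@" --disable_image_extraction
  fi

  # With DRY_RUN=1, print one argument per line instead of running marker.
  # The printed list includes the API key value, so avoid logging it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

For example, `run_marker "./docs/report.pdf" "./docs/" no haiku` would run the same command as the "Convert PDF to Markdown (Plain Text)" example, while prefixing the call with `DRY_RUN=1` only prints the arguments for inspection.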