# LiteParse Node.js

Node.js/TypeScript bindings for [LiteParse](https://github.com/run-llama/liteparse): fast, lightweight PDF and document parsing with spatial text extraction.

## Installation

```bash
npm i @llamaindex/liteparse
```

This also installs the `lit` CLI command (use `npm i -g` for global access).

## Quick Start

```typescript
import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse();
const result = await parser.parse('document.pdf');
console.log(result.text);

// Access structured data
for (const page of result.pages) {
  console.log(`Page ${page.pageNum}: ${page.textItems.length} text items`);
}
```

## Configuration

All options are passed to the constructor:

```typescript
const parser = new LiteParse({
  ocrEnabled: true,              // Enable OCR (default: true)
  ocrLanguage: 'eng',            // Tesseract language code
  ocrServerUrl: undefined,       // HTTP OCR server URL (optional)
  tessdataPath: undefined,       // Path to tessdata directory (optional)
  maxPages: 1000,                // Max pages to parse
  targetPages: '1-5,10',         // Specific pages (optional)
  dpi: 150,                      // Rendering DPI
  preserveVerySmallText: false,  // Keep tiny text
  password: undefined,           // Password for protected documents
  quiet: false,                  // Suppress progress output
  numWorkers: 4,                 // Concurrent OCR workers
});
```

## Parsing from Bytes

Pass a `Buffer` or `Uint8Array` directly, which is useful for HTTP responses or in-memory data:

```typescript
import { readFile } from 'fs/promises';

const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);
console.log(result.text);
```

## Screenshots

Generate PNG screenshots of document pages:

```typescript
const screenshots = parser.screenshot('document.pdf', [1, 2, 3]);
for (const s of screenshots) {
  console.log(`Page ${s.pageNum}: ${s.width}x${s.height}`);
  // s.imageBuffer contains PNG bytes
}
```

## Supported Formats

- PDF (`.pdf`)
- Microsoft Office (`.docx`, `.xlsx`, `.pptx`, etc.): requires LibreOffice
- OpenDocument (`.odt`, `.ods`, `.odp`): requires LibreOffice
- Images (`.png`, `.jpg`, `.tiff`, etc.): requires ImageMagick
- And more!

## CLI

The npm package includes the `lit` CLI:

```bash
lit parse document.pdf
lit parse document.pdf --format json -o output.json
lit screenshot document.pdf -o ./screenshots
lit batch-parse ./input ./output
```
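
## Example: Expanding a Page Spec

The `targetPages` option takes a string such as `'1-5,10'`, while `screenshot` takes an array of page numbers. The sketch below shows one way to turn the former into the latter. `expandPages` is a hypothetical helper, not part of LiteParse, and it assumes that ranges are inclusive and pages are numbered from 1; the library's own parsing of `targetPages` may differ.

```typescript
// Hypothetical helper (not a LiteParse API): expands a page spec such as
// '1-5,10' into a sorted list of unique page numbers.
// Assumption: ranges are inclusive and pages are 1-based.
function expandPages(spec: string): number[] {
  const pages = new Set<number>();

  for (const part of spec.split(',')) {
    const token = part.trim();
    if (token === '') continue; // tolerate stray commas

    // Accept either a single page ("10") or a range ("1-5").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page spec: "${token}"`);
    }

    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }

  // Sets keep insertion order, so sort numerically before returning.
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(expandPages('1-5,10')); // pages 1 through 5, then 10
```

The resulting array could then be passed as the second argument to `parser.screenshot`, in place of the literal `[1, 2, 3]` used in the Screenshots section.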