# PDF Skill (Read • Inspect • Extract • Edit • Render • Forms • OCR • Redact • Convert • Diff)

This skill is designed for **reliable, workflow-driven** PDF work: **render -> verify -> operate -> re-render verify**.

## Before you touch PDFs: should this be DOCX/PPTX instead?

Even if the user asks for a PDF deliverable, the best workflow is often:

- **Text-heavy, business-doc layout (headings, TOC, long tables, rich lists)** -> use the **DOCX skill** to author, then convert to PDF with `lo_convert_to_pdf.py`.
- **Slide-like visual layout (charts, callouts, fixed positioning, figure captions)** -> use the **Slides skill** (PPTX) to author, then export to PDF.
- **Programmatic generation** -> ReportLab (this skill) is fine.

If you find yourself hand-tuning line breaks or typography in ReportLab, you probably picked the wrong authoring format.

---

## Core loop (always)

1) Render to images

   ```bash
   python /home/oai/skills/pdfs/scripts/render_pdf.py input.pdf --out_dir /mnt/data/_renders/in --dpi 200
   ```

2) Inspect PNGs (tables/figures/layout are authoritative)

3) Perform the edit/extract/create

4) Re-render and compare

   ```bash
   python /home/oai/skills/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /mnt/data/_diff --dpi 200
   ```

---

## Design standards for document generation

When generating new documents or doing major rewrites/repackaging, follow the design standards below unless the user explicitly requests otherwise; the user's instructions always take precedence.

When creating the document design, do not compromise the content or introduce factual/technical errors. Do not produce something that looks polished but is not actually what the user requested.

It is very important that the document is professional and aesthetically pleasing. As such, follow this general workflow to produce your final delivered document:

1. Before you make the PDF, first think about its high-level design:
   - Decide what kind of document it is (for example, a memo, report, SOP, workflow, form, proposal, or manual) and design accordingly. In general, create documents that are professional, visually polished, and aesthetically pleasing. However, calibrate the level of styling to the document's purpose: for formal, serious, or highly utilitarian documents, visual appeal should come mainly from strong typography, spacing, hierarchy, and overall polish rather than expressive styling. The goal is for the document's visual character to feel appropriate to its real-world use case, with readability and usability always taking priority.
   - Make documents that feel visually natural. If a human looks at your document, they should find the design natural and smooth. This is very important; think carefully about how to achieve it.
   - Think about how the first page should be organized. How about subsequent pages? Where does the title go? What does the heading ladder look like? Should there be a clear hierarchy?
   - Decide whether to include visual components such as tables, callouts, checklists, or images. If so, plan the design of each component.
   - Think about the general spacing and layout. What will be the default body spacing? What page budget is allocated between packaging and substance? How will page breaks behave around tables and figures? Avoid large blank gaps, keep captions and their visuals together when possible, and keep content from becoming too wide by maintaining generous side margins so the page feels balanced and natural.
   - Think about font, type scale, consistent accent treatment, etc. Try to avoid forcing large chunks of small text into narrow areas. When space is tight, adjust font size, line breaks, alignment, or layout instead of cramming in more text.
2. Once you have a working PDF, continue iterating until the entire document is polished and correct. After every change or edit, render the PDF and review it carefully to evaluate the result. The plan from (1) should guide you, but it is only a flexible draft; update your decisions as needed throughout the revision process. Important: each time you render and reflect, check for both:
   1. Design aesthetics: the document should be aesthetically pleasing and easy to skim. Ask yourself: if a human were to look at my document, would they find it aesthetically pleasing? It should feel natural, smooth, and visually cohesive.
   2. Formatting issues that need to be fixed: e.g. text overlap, overflow, cramped spacing between adjacent elements, awkward spacing in tables/charts, awkward page breaks, etc. This is very important. Do not stop revising until all formatting issues are fixed.

While making and revising the PDF, adhere to and check against these quality reminders to ensure the deliverable is of high visual quality:

- Document density: Avoid verbose, dense walls of text unless they are necessary. Avoid long runs of consecutive plain paragraphs or too many words before visual anchors. For some tasks dense text is unavoidable (e.g. verbose legal documents); in those cases ignore this suggestion.
- Font: Use professional, easy-to-read fonts at a size that is not too small. Use bold, underline, and italics sparingly and professionally.
- Color: Use color intentionally for titles, headings, subheadings, and selective emphasis so important information stands out in a visually appealing way. The palette and intensity should fit the document's purpose, with more restrained use where a formal or serious tone is needed.
- Visuals: Consider using tables, diagrams, and other visual components when they improve comprehension, navigation, or usability.
- Tables: Invest significant effort in making your tables well-made and visually clean. Below are some suggestions, followed by hard constraints that you must check relentlessly.
  - Suggestions:
    - Set deliberate table/cell widths and heights instead of defaulting to full page width.
    - Choose column widths intentionally rather than giving every column equal width by default. Very short fields (for example: item number, checkbox, score, result, year, date, or status) should usually be kept compact, while wider columns should be reserved for longer content.
    - Avoid overly wide tables, and leave generous side margins so the layout feels natural.
    - Keep all text vertically centered and make deliberate horizontal alignment choices.
    - Ensure cell height avoids a crowded look. Leave clear vertical spacing between a table and its caption or following text.
  - Hard constraints:
    - To prevent clipping/overflow:
      - Never use fixed row heights that can truncate text; allow rows to expand with wrapped content.
      - Ensure cell padding and line spacing are sufficient so descenders/ascenders don't get clipped.
      - If content is tight, prefer (in order): wrap text -> adjust column widths -> reduce font slightly -> abbreviate headers/use two-line headers.
    - Padding / breathing room: Ensure text doesn't sit against cell borders or look "pinned" to the upper-left. Favor generous internal padding on all sides, and keep it consistent across the table.
    - Vertical alignment: In general, center text vertically. Make sure the content uses the available cell space naturally rather than clustering at the top.
    - Horizontal alignment: Do not default all body cells to top-left alignment. Choose horizontal alignment intentionally by column type: centered alignment often works best for short values, status fields, dates, numbers, and check indicators; left alignment is usually better for narrative or multi-line text.
    - Line height inside cells: Use line spacing that avoids a cramped feel and prevents ascenders/descenders from looking clipped. If a cell feels tight, adjust wrapping/width/padding before shrinking type.
    - Width + wrapping sanity check: Avoid default equal-width columns when the content in each column clearly differs in size. Avoid lines that run so close to the right edge that the cell feels overfull. If this happens, prefer wrapping or column-width adjustments before reducing font size.
    - Spacing around tables: Keep clear separation between tables and surrounding text (especially the paragraph immediately above/below) so the layout doesn't feel stuck together. Captions and tables should stay visually paired, with deliberate spacing.
    - Quick visual QA pass: Look for text that appears "boundary-hugging", specifically content pressed against the top or left edge of a cell or sitting too close beneath a table. Also watch for overly narrow descriptive columns and short-value columns whose contents feel awkwardly pinned. Correct these issues through padding, alignment, wrapping, or small column-width adjustments.
- Forms / questionnaires: Design these as a usable form, not a spreadsheet.
  - Prioritize clear response options, obvious and well-sized check targets, readable scale labels, generous row height, clear section hierarchy, and light visual structure. Size fields and columns based on the content they hold rather than using equal-width table cells.
  - Use spacing, alignment, and subtle header/section styling to organize the page. Avoid dense full-grid borders, cramped layouts, and ambiguous numeric-only response areas.
- Coherence vs. fragmentation: In general, keep content as one coherent representation rather than fragmenting it, where possible.
  - For example, don't split one logical dataset across multiple independent tables unless there's a clear, labeled reason.
  - For example, if a table must span pages, continue it on the next page with a repeated header and consistent column order.
- Background shapes/colors: Where helpful, consider section bands, note boxes, control grids, or other visual containers with suitable colors to improve scanability and communication. Use them when they suit the document type. If you do use these, make sure they are formatted well, with no overlaps, awkward spacing, etc.
- Spacing: Check rigorously for spacing issues. Always use a natural amount of spacing between adjacent components. Use clear, generous vertical spacing between sections and paragraphs, and leave a bit of extra space between subheadings and the content that follows when it improves readability. Use indentation and alignment intentionally so the document's hierarchy is immediately clear. At the same time, avoid large "layout gaps" caused by a table or chart not fitting at the bottom of a page and getting pushed to the next one. If this happens, try these suggestions:
  - Move the preceding paragraph(s) with it to the next page to keep the narrative cohesive.
  - Scale the visual modestly or simplify labels without hurting its readability, formatting, or aesthetics.
  - Split the table/figure cleanly across multiple pages, using repeated headers to make the page continuation clear.
- Text boxes: Follow the same breathing-room rules as for tables: use generous internal padding, intentional alignment, and sufficient line spacing so text never feels cramped, clipped, or pinned to the edges. Keep spacing around the text box clear so it remains visually distinct from surrounding content, and if the content feels tight, prefer adjusting box size, padding, or text wrapping before reducing font size.
- Layout/archetype: Remember to choose the right document archetype/template (proposal, SOP, workflow, form, handbook, etc.). Use a coherent style system. Once a style system is chosen, apply it consistently across headings, spacing, table treatments, callouts, and accent usage. If appropriate to the document type, include a cover page or front-matter elements such as title, subtitle, metadata, or branding.

## Task index (progressive)

Start with the smallest task that answers the user:

### Read / review

- `tasks/read_review.md`

### Extract (text/layout/tables/images/attachments/forms)

- `tasks/extract.md`
- `tasks/coords.md` (coordinate sanity)

### Edit (merge/split/rotate/crop/watermark/paginate/encrypt/repair)

- `tasks/edit.md`
- `tasks/compare.md` (visual regression)

### Forms

- Fillable forms: `tasks/forms_annotations.md`
- Debugging/introspection: `tasks/forms_debugging.md`
- Non-fillable / stamping workflow: `tasks/forms_nonfillable.md`

### OCR

- `tasks/ocr.md`

### Preflight / normalize

- `tasks/preflight.md`

### Redaction

- `tasks/redact.md`

### Renderer parity

- `tasks/parity.md`

### Batch processing

- `tasks/batch.md`

### Create / convert

- `tasks/create.md`
- `tasks/convert.md`
- `tasks/js_tools.md` (pdf-lib, pdfjs)

---

## Package map (where things live)

This pack includes a `manifest.txt` that is a **pure list of relative file paths** used by download tooling.
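As a rough illustration of how a path-list manifest like this could be sanity-checked, the sketch below reports entries that do not exist on disk. It assumes one relative path per line, resolved against the folder containing the manifest; `missing_manifest_entries` is a hypothetical helper, not one of the pack's scripts.

```python
from pathlib import Path
from typing import List, Optional


def missing_manifest_entries(manifest_path: str, root: Optional[str] = None) -> List[str]:
    """Return manifest entries that do not exist under `root`.

    Assumption: the manifest holds one relative path per line; blank lines
    are skipped. `root` defaults to the manifest's own folder.
    """
    manifest = Path(manifest_path)
    base = Path(root) if root is not None else manifest.parent
    missing = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        rel = line.strip()
        if not rel:
            continue  # ignore blank lines
        if not (base / rel).exists():
            missing.append(rel)
    return missing
```

An empty return value means every listed path was found under the chosen root.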
Quick map:

- **tasks/** (what to do)
  - `read_review.md` - render-first reading/review
  - `extract.md` - extract text/layout/tables/images/attachments/forms
  - `coords.md` - coordinate system cheatsheet (PDF pt vs image px)
  - `edit.md` - merge/split/select/rotate/crop/watermark/paginate/encrypt/repair
  - `compare.md` - visual diff workflow
  - `forms_annotations.md` - fillable forms + appearance pitfalls + correctness checklist
  - `forms_debugging.md` - widget-level introspection + acceptable values
  - `forms_nonfillable.md` - stamp-by-boxes workflow for non-fillable forms
  - `ocr.md` - OCR scanned PDFs to searchable
  - `preflight.md` - quick triage + normalization guidance
  - `redact.md` - true redaction workflows
  - `parity.md` - render parity across engines
  - `batch.md` - batch helpers for corpora
  - `create.md` - choose reportlab/latex/html/docx/pptx pipeline
  - `convert.md` - docx/pptx/html/markdown/latex to PDF conversion
  - `js_tools.md` - pdf-lib/pdfjs helper CLIs
- **scripts/** (run these)
  - `render_pdf.py` - render to PNGs (pdfium or poppler)
  - `compare_renders.py` - render-and-diff two PDFs (pixel diff)
  - `pdf_inspect.py` - metadata/structure overview
  - `pdf_extract.py` - text/words/chars/tables/images/attachments/annots/forms
  - `pdf_edit.py` - editing toolkit (merge/split/select/rotate/crop/watermark/paginate/encrypt/repair/optimize)
  - `pdf_preflight.py` - preflight/triage warnings
  - `pdf_redact.py` - true redaction (remove underlying content)
  - `renderer_parity.py` - diff pdftoppm vs pdfium renders
  - `batch_pdf.py` - batch runner for common ops
  - `box_picker_html.py` - generate interactive HTML to pick rectangles -> JSON in PDF coords
  - `place_text_by_boxes.py` - stamp text/checkmarks into rectangles (non-fillable forms)
  - `ocr_pdf.py` - OCR wrapper
  - `html_to_pdf.py`, `md_to_pdf.py`, `latex_to_pdf.py`, `lo_convert_to_pdf.py` - conversion helpers
- **js/** (Node helpers)
  - `install_deps.sh` - installs pdf-lib + pdfjs-dist
  - `fill_form.mjs` - fill + optional flatten (supports flags and positional args)
  - `extract_form_fields.mjs` - list AcroForm fields
  - `extract_text_pdfjs.mjs` - extract text via pdfjs-dist
- **examples/**
  - `smoke_test.md` - runnable smoke flows
- **troubleshooting/**
  - `common.md` - common pitfalls and fixes

---

## Final deliverable expectations

- No clipped text, overlaps, black squares, or broken glyphs in rendered PNGs.
- Verify in at least **one** renderer (`pdfium` or `pdftoppm`). For tricky forms, verify in **two**.
- Remove intermediate artifacts from the deliverable folder (keep only final PDF(s)).
- Avoid Unicode dashes that some renderers mishandle; prefer ASCII `-`.
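
The dash rule above can be applied to text before it is written into a PDF. The following is a minimal sketch, not part of this skill's scripts: `normalize_dashes` and `find_unicode_dashes` are hypothetical helpers, and the set of characters treated as "Unicode dashes" is an assumption chosen for illustration.

```python
from typing import List, Tuple

# Assumed set of dash-like code points: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, horizontal bar, minus sign.
UNICODE_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
_DASH_TABLE = str.maketrans({ch: "-" for ch in UNICODE_DASHES})


def normalize_dashes(text: str) -> str:
    """Replace each assumed Unicode dash with ASCII '-'."""
    return text.translate(_DASH_TABLE)


def find_unicode_dashes(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every assumed Unicode dash."""
    return [(i, ch) for i, ch in enumerate(text) if ch in UNICODE_DASHES]
```

Running `find_unicode_dashes` first makes it possible to review what would change; `normalize_dashes` then performs the one-for-one replacement and leaves ASCII hyphens untouched.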