---
name: academic-poster-generator
description: Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have one or more research paper PDFs and need a conference-style poster draft with standard sections (Intro/Methods/Results/Conclusions).
- You want to automatically extract title/authors/abstract/sections from a PDF and restructure them into concise poster bullet points.
- You need a LaTeX poster scaffold using **beamerposter**, **tikzposter**, or **baposter**, with standard poster sizes (A0/A1/36×48").
- You must ensure posters are **visual-first** (at least 2–3 generated figures) and pass an automated quality gate before delivery.
- You want a **browser-viewable final output** (`poster.rendered.html`) rather than a compiled PDF (PDF compilation is optional).

## Key Features

- **End-to-end pipeline**: PDF → metadata extraction → content structuring → figure generation → LaTeX poster assembly → HTML base conversion → agent rendering → final HTML.
- **Scripted PDF processing**:
  - Convert PDF pages to images for reference.
  - Extract metadata (title, authors, abstract, section structure).
  - Structure content into poster-ready bullet points with word limits.
- **Mandatory figure workflow**:
  - Generate at least **2–3 figures** (schematics/flowcharts/mechanisms/comparison charts).
  - Insert figures into LaTeX templates automatically or via config.
- **Template support**:
  - `assets/templates/beamerposter-template.tex`
  - `assets/templates/tikzposter-template.tex`
  - `assets/templates/baposter-template.tex`
- **HTML-first delivery policy**:
  - Final deliverables: `poster.rendered.html` + `figures/`
  - Intermediate artifacts are temporary and must not be returned.
- **Quality control**:
  - Automated checks for HTML/LaTeX outputs.
  - Manual checklist reference: `references/poster_quality_checklist.md`
- **Design guidance**:
  - Poster design principles: `references/design_principles.md`

## Dependencies

### Python (recommended)
- Python `>=3.8`
- `pypdf` (version not pinned)
- `pdfplumber` (version not pinned)
- `pdf2image` (version not pinned)
- `pytesseract` (version not pinned; required for OCR on scanned PDFs)
- `pandas` (version not pinned; optional for table handling)
- `Pillow` (version not pinned)
- `matplotlib` (version not pinned)

Install (example):
```bash
pip install pypdf pdfplumber pdf2image pytesseract pandas pillow matplotlib
```

### System tools
- `tesseract-ocr` (for OCR; required only for scanned PDFs)
- Poppler utilities (commonly required by `pdf2image`, e.g., `pdftoppm`)

### LaTeX (optional, only if compiling PDF)
- TeX Live / MiKTeX / MacTeX
- TeX packages (via `tlmgr`, names may vary by distribution):
  - `beamerposter`
  - `tikzposter`
  - `baposter`
  - `qrcode`
  - `xcolor`
  - `tcolorbox`
  - `subcaption`
  - `graphics`

Example:
```bash
tlmgr install beamerposter tikzposter baposter qrcode xcolor tcolorbox subcaption graphics
```

### HTML conversion
- `pandoc` (required for `--html-only` base HTML generation)
- `pdf2htmlEX` (optional; only if doing PDF → HTML rendering)

## Example Usage

### 1) Run the end-to-end pipeline (recommended)

Creates a timestamped directory under `output/` and supports HTML-only mode.

```bash
python scripts/run_poster_pipeline.py paper.pdf --html-only
```

Expected final structure:
```text
output/YYYYMMDD_HHMMSS/
├── poster.rendered.html
└── figures/
    ├── mechanism.png
    ├── process_flowchart.png
    └── comparison_chart.png
```

### 2) Run stages manually (fully reproducible)

#### Stage A — Extract metadata and structure content (temporary artifacts)
```bash
python scripts/extract_metadata.py paper.pdf metadata.json
python scripts/structure_content.py paper.pdf --json poster_content.json --latex content.tex --max-words 800
python scripts/convert_pdf_to_images.py paper.pdf paper_images/ --dpi 600
```

#### Stage B — Generate figures (mandatory)
At least **2–3** figures are required.

```bash
python scripts/generate_figures.py schematic "Cell signaling pathway" mechanism.png
python scripts/generate_figures.py flowchart "Step 1;Step 2;Step 3" process_flowchart.png
python scripts/generate_figures.py comparison '{"Control":[65],"Our Method":[87]}' comparison_chart.png
```

Or generate from a config:
```bash
python scripts/generate_figures.py config figures_config.json
```

#### Stage C — Create poster LaTeX from a template (temporary artifact)
```bash
cp assets/templates/beamerposter-template.tex poster.tex
# (Edit poster.tex: title/authors/institute + paste structured content)
```

#### Stage D — Insert figures into LaTeX (temporary artifact)
```bash
python scripts/insert_figures.py poster.tex --all
# or:
python scripts/insert_figures.py poster.tex --config figures_config.json
```

#### Stage E — Convert to base HTML and render final HTML (final artifact)
```bash
python scripts/convert_poster.py poster.tex --html-only
# Agent step: read poster.html, validate <img> paths, inject CSS/layout, output poster.rendered.html
```

#### Stage F — Quality control (mandatory)
```bash
python scripts/check_poster_quality.py poster.rendered.html
```

## Implementation Details

### Pipeline and file policy

All generated files must be placed under `output/` (preferably a timestamped subdirectory). Only the following are considered **final deliverables**:

- `poster.rendered.html`
- `figures/` directory (PNG figures)

All other files are **intermediate/temporary** and must not be returned to the user, including (non-exhaustive):
- `poster.tex`, `poster.html`
- `metadata.json`, `poster_content.json`, `figures_config.json`, `figures.json`
- LaTeX auxiliary files (`.aux`, `.log`, `.out`, etc.)

Conceptual flow:
```text
PDF → metadata extraction (temp) → content structuring (temp) → figure generation (final figures/)
→ LaTeX poster (temp) → base HTML (temp) → agent rendering → poster.rendered.html (final)
```

### PDF extraction approach

- Text and tables are extracted via `pdfplumber`.
- For scanned PDFs, OCR is performed by converting pages to images (`pdf2image`) and running `pytesseract`.

Typical extraction targets:
- Title/authors (first-page patterns)
- Abstract (from “Abstract” header to next section)
- Section blocks (Introduction/Methods/Results/Conclusion)
- Tables (converted to summaries for poster bullets)

### Content structuring rules

- Convert paragraphs to bullet points suitable for poster blocks.
- Enforce a poster-friendly word budget (commonly **600–800 words** total).
- Prefer 3–6 key visuals; summarize tables into a small set of metrics.

### Figure requirements (mandatory)

- Every poster must include **at least 2–3 generated figures**.
- Target **40–50%** of poster area as visual content.
- Figures should be **≥300 DPI**, with clear labels and concise captions.

Supported figure categories:
- Schematics (systems, pathways, conceptual diagrams)
- Flowcharts (procedures, pipelines, algorithms)
- Mechanism diagrams (biological/chemical processes)
- Comparison charts (bar charts, benchmarks)

### LaTeX package selection

Use one of the supported poster packages depending on style needs:

- **beamerposter** (traditional academic)
- **tikzposter** (modern, colorful)
- **baposter** (multi-column layouts)

Templates are provided in `assets/templates/`.

### HTML rendering requirements (agent step)

After `pandoc` produces `poster.html`, the renderer must:
1. Verify all `<img>` references exist and paths resolve to `figures/`.
2. Apply poster-grade CSS (grid columns, typography, spacing, captions).
3. Ensure accessibility: **WCAG AA contrast ≥ 4.5:1**.
4. Output `poster.rendered.html` as the only final HTML artifact.

Minimum layout expectations:
- 2–3 column grid on desktop; single column on narrow screens.
- Clear hierarchy: title/authors/institution, section headers, bullet lists, figure blocks with captions.

### Quality control requirements (mandatory)

Run `scripts/check_poster_quality.py` on the final output:
- Validate that images exist and render correctly.
- Confirm layout integrity (columns/blocks).
- Ensure readability (font sizes, spacing).
- Confirm contrast compliance (≥ 4.5:1).
- Confirm figure count (≥ 2–3) and that no placeholder text remains.

References:
- Design principles: `references/design_principles.md`
- Manual checklist: `references/poster_quality_checklist.md`