---
name: pdf-extractor
description: "Extract text, tables, and images from PDFs. Use when: extracting data from reports; converting PDF tables to CSV; pulling images from presentations; processing research papers; batch converting PDFs to text"
license: MIT
metadata:
  author: ClawFu
  version: 1.0.0
  mcp-server: "@clawfu/mcp-skills"
---

# PDF Extractor

> Extract text, tables, and images from PDF files using pdfplumber - turn static PDFs into usable data.

## When to Use This Skill

- **Report processing** - Extract data from PDF reports
- **Table extraction** - Convert PDF tables to CSV
- **Image collection** - Pull images from presentations
- **Text mining** - Bulk convert PDFs to searchable text
- **Research** - Process academic papers and whitepapers


## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |

## Dependencies

```bash
pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow
```

## Commands

### Extract Text
```bash
python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5
```

### Extract Tables
```bash
python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3
```

### Extract Images
```bash
python scripts/main.py images presentation.pdf --output ./images/
```

### Merge PDFs
```bash
python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf
```

### PDF Info
```bash
python scripts/main.py info document.pdf
```

## Examples

### Example 1: Extract Financial Tables
```bash
python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv
```

### Example 2: Batch Convert to Text
```bash
python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files
```

### Example 3: Extract Specific Pages
```bash
python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15
```

## Skill Boundaries

### What This Skill Does Well
- Structuring data analysis
- Identifying patterns and trends
- Creating visualization frameworks
- Calculating statistical measures

### What This Skill Cannot Do
- Access your actual data
- Replace statistical expertise
- Make business decisions
- Guarantee prediction accuracy

## Related Skills

- [web-scraper](../web-scraper/) - Scrape web content
- [content-repurposer](../content-repurposer/) - Repurpose extracted content

## Skill Metadata


- **Mode**: centaur
```yaml
category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week
```