---
name: ebook-analysis
description: Parse ebooks, extract concepts and entities with citation traceability, classify by type/layer, and synthesize across book collections.
license: MIT
metadata:
  author: jwynia
  version: "2.1"
  domain: research
  cluster: media-meta-analysis
  type: orchestrator
  mode: diagnostic+generative
  maturity: working
  maturity_score: 14
---

# Ebook Analysis: Non-Fiction Knowledge Extraction

You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:

1. **Concept Extraction** - Extract ideas classified by abstraction (principle → tactic)
2. **Entity Extraction** - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books

## Core Principle

**Every extraction must be traceable to its exact source.**

Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.

---

## Two Extraction Modes

### Mode 1: Concept Extraction

For extracting IDEAS organized by abstraction level.

**Use when:** Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.

**Output:** JSON files (analysis.json, concepts.json)

**Example:** "Spaced repetition improves retention" is a MECHANISM at Layer 2.

### Mode 2: Entity Extraction

For extracting NAMED THINGS that can be cross-referenced across books.

**Use when:** Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution: recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.

**Output:** Markdown files in knowledge base structure

**Example:** "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.

### Choosing a Mode

| If you want to... | Use Mode |
|-------------------|----------|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |

---

## Entity Extraction Mode (Detailed)

### Entity Types

| Type | What It Captures | Example |
|------|------------------|---------|
| **study** | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| **researcher** | People and their contributions | Anders Ericsson, Robin Hogarth |
| **framework** | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| **anecdote** | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| **concept** | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |

### Extended Entity Type Guidance

Some entities don't fit cleanly into the five types. Guidelines:

| Entity Kind | Use Type | Rationale |
|-------------|----------|-----------|
| **Simulations/Games** (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| **Institutions** (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| **Historical events** (Challenger disaster) | anecdote | Stories that illustrate principles |
| **Hypothetical scenarios** | anecdote | Future scenarios from books like Imaginable |
| **Thought experiments** | framework | If systematic; otherwise concept |

**When uncertain:** Default to `anecdote` for narratives/events, `concept` for ideas, `framework` for systematic methods.
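
The guidance table and fallback rule above can be written down as a small lookup. This is only an illustrative sketch: the `EntityKind` labels, the `isSystematic` flag, and `defaultEntityType` are hypothetical names, not part of the skill's scripts, and the kind itself is assumed to have been assigned by human or LLM judgment first.

```typescript
// Hypothetical sketch of the "Extended Entity Type Guidance" defaults.
// None of these names exist in the skill's scripts.
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

type EntityKind =
  | "simulation"
  | "institution"
  | "historical-event"
  | "hypothetical-scenario"
  | "thought-experiment"
  | "narrative"
  | "idea"
  | "systematic-method";

// Returns the default type for a kind; a reviewer may still override it.
function defaultEntityType(kind: EntityKind, isSystematic = false): EntityType {
  switch (kind) {
    case "simulation":
    case "historical-event":
    case "hypothetical-scenario":
    case "narrative":
      return "anecdote"; // narratives and events
    case "institution":
      return "researcher"; // organizations contribute ideas like individuals
    case "thought-experiment":
      return isSystematic ? "framework" : "concept";
    case "systematic-method":
      return "framework";
    case "idea":
      return "concept";
  }
}

console.log(defaultEntityType("institution"));
console.log(defaultEntityType("thought-experiment", true));
```

The mapping only supplies a starting point; whether something is really a framework or a concept remains a judgment call.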

### Author-as-Subject Pattern

When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):

**Create a researcher entity if:**
- Author has notable prior work or institutional affiliation
- Author appears in Wikipedia or other reference sources
- Author's background/credentials are relevant to understanding the book
- Other books in your collection might reference them

**Skip if:**
- Author is primarily known only for this book
- No external sources to verify/enrich the entity

**Template addition for author-subjects:**
```markdown
## Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
```

### Entity File Template

```markdown
# [Entity Name]

**Type:** study | researcher | framework | anecdote | concept
**Status:** stub | partial | solid | authoritative
**Last Updated:** YYYY-MM-DD
**Aliases:** alias1, alias2, alias3

## Summary
[2-3 sentence synthesized understanding]

## Key Findings / What It Illustrates
1. [Claim or finding with source] — Source: [Book], Ch.[X]
2. [Another claim] — Source: [Book], Ch.[X]

## Key Quotes
> "Quotable text here."

> "Another memorable quote."

## Sources in Collection
| Book | Author | How It's Used | Citation |
|------|--------|---------------|----------|
| Range | Epstein | [Role in book] | Ch.X |

## Sources NOT in Collection
- [Book that would enrich this entity]

## Related Entities
- [Other Entity](../type/other-entity.md) - Relationship description

## Open Questions
- [What we don't yet know]
```

### Knowledge Base Structure

```
/knowledge/
├── _index.md                    # Master registry
├── _entities.json               # Searchable index (generated)
│
├── nonfiction/
│   ├── _index.md                # Domain index
│   ├── _[book]-quotes.md        # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
│
├── cooking/                     # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
│
└── technical/
    ├── patterns/
    └── technologies/
```

### Quotes Extraction

Quotable quotes are a distinct extraction type. For each book, create a quotes file:

**File:** `_[book-slug]-quotes.md`

**Structure:**
```markdown
# Quotable Quotes from [Book Title]

**Author:** [Author]
**Last Updated:** YYYY-MM-DD

## On [Theme 1]
> "Quote text here."

> "Another quote on same theme."

## On [Theme 2]
> "Quote on different theme."
```

**What makes a good quote:**
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference

### Entity Extraction Workflow

1. **Scan book** - Read through identifying named studies, researchers, frameworks, illustrative stories
2. **Check existing entities** - Use `kb-resolve-entity.ts` to see if entity already exists
3. **Create or update** - New entity → create file; existing → add as source
4. **Add quotes** - Extract memorable quotes to quotes file
5. **Cross-link** - Add Related Entities sections
6. **Regenerate index** - Run `kb-generate-index.ts`

### Entity Extraction States (KB0-KB5)

| State | Symptoms | Intervention |
|-------|----------|--------------|
| **KB0** | No knowledge base | Create directory structure |
| **KB1** | Structure exists, no entities | Begin extraction |
| **KB2** | Extracting from book | Create entity files |
| **KB3** | Entities created, not linked | Add Related Entities |
| **KB4** | Linked, no index | Run kb-generate-index.ts |
| **KB5** | Complete for this book | Proceed to next book |

### Cross-Book Synthesis Workflow

**Triggered when:** 2+ books have been extracted to the knowledge base.

**Goals:**
1. Find entities that appear in multiple books
2. Identify conceptual connections between books
3. Surface contradictions or complementary perspectives
4. Update entity files with multi-source synthesis

**Process:**

1. **Entity overlap detection**
   ```bash
   # Find entities with 2+ sources (the ** glob needs bash globstar)
   shopt -s globstar
   for f in knowledge/nonfiction/**/*.md; do
     # Rows under "## Sources in Collection": header + separator + 2 or more sources = 4+
     awk '/^## Sources in Collection/{t=1; next} /^## /{t=0} t && /^[|]/{n++} END{exit !(n >= 4)}' "$f" && echo "$f"
   done | head -20
   ```
   Or manually review entities that were updated with a new source.

2. **Conceptual connection mapping**
   - Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
   - Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
   - Look for complementary themes (prediction failure → preparation despite uncertainty)

3. **Synthesis documentation**
   For entities appearing in 2+ books, update the Summary section:
   ```markdown
   ## Summary
   [Synthesized understanding from BOTH sources, noting agreements and differences]
   ```

4. **Cross-book insights**
   Document thematic connections in `context/insights/cross-book-{theme}.md`:
   ```markdown
   # Cross-Book Insight: [Theme]

   ## Books Contributing
   - Range (Epstein) - [perspective]
   - Imaginable (McGonigal) - [perspective]

   ## Synthesis
   [How the books complement or contradict each other]
   ```

---

## Concept Extraction Mode (Detailed)

### Concept Types (Abstract → Concrete)

| Type | Definition | Example |
|------|------------|---------|
| **Principle** | Foundational truth or axiom | "Communities form around shared identity" |
| **Mechanism** | How something works | "Reciprocity creates social bonds" |
| **Pattern** | Recurring structure or approach | "The community lifecycle pattern" |
| **Strategy** | High-level approach to achieve goals | "Build trust before asking for contribution" |
| **Tactic** | Specific actionable technique | "Send welcome emails within 24 hours" |

### Abstraction Layers

| Layer | Name | Abstraction | Example |
|-------|------|-------------|---------|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |

### Relationship Types

| Relationship | Meaning | When to Use |
|--------------|---------|-------------|
| **INFLUENCES** | A affects B | Causal or correlational connection |
| **SUPPORTS** | A provides evidence for B | Citation, example, validation |
| **CONTRADICTS** | A conflicts with B | Opposing claims |
| **COMPOSED_OF** | A contains B | Part-whole relationships |
| **DERIVES_FROM** | A is derived from B | Logical conclusions |

### Concept Extraction States (EA0-EA7)

| State | Symptoms | Intervention |
|-------|----------|--------------|
| **EA0** | No input file | Guide file preparation |
| **EA1** | Raw file, not parsed | Run ea-parse.ts |
| **EA2** | Parsed, not extracted | LLM extracts concepts |
| **EA3** | Extracted, not classified | Assign types and layers |
| **EA4** | Classified, not annotated | Add themes, relationships |
| **EA5** | Single book complete | Export or proceed to synthesis |
| **EA6** | Multi-book ready | Cross-book synthesis |
| **EA7** | Analysis complete | Generate reports |

### Concept Extraction Workflow

1. **Parse** - Run `ea-parse.ts` to chunk book with position tracking
2. **Extract** - Present chunks to LLM for concept identification with exact quotes
3. **Classify** - Assign type (principle→tactic) and layer (0-4)
4. **Annotate** - Add themes and functional analysis
5. **Link** - Connect related concepts
6. **Export** - Generate analysis.json, concepts.json, report.md

---

## Available Tools

### Parsing Tools

#### ea-parse.ts

Parse ebook files into chunks with metadata and position tracking.

```bash
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
```

**Output:** JSON with metadata, chapters (if detected), and chunks with positions.

### Knowledge Base Tools

#### kb-generate-index.ts

Scan knowledge base and generate searchable entity index.

```bash
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
```

**Output:** Creates `_entities.json` with all entities, aliases, and metadata.

#### kb-resolve-entity.ts

Search for existing entities before creating duplicates.

```bash
deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
```

**Options:**
- `--threshold <0-1>` - Minimum match score (default: 0.3)
- `--limit <n>` - Maximum results (default: 5)
- `--json` - Output as JSON

### Validation Tools

#### ea-validate.ts

Validate analysis output for citation accuracy and schema completeness.

```bash
deno run --allow-read scripts/ea-validate.ts analysis.json --report
```

---

## Anti-Patterns

### The Extraction Flood
**Pattern:** Extracting every potentially interesting phrase.
**Fix:** Ask "Would I cite this?" before extracting. Quality over quantity.

### The Citation Black Hole
**Pattern:** Extracting without preserving exact quotes or positions.
**Fix:** Always capture: exact quote, chapter reference, context.

### The Duplicate Entity
**Pattern:** Creating new entity without checking if it exists.
**Fix:** Always run `kb-resolve-entity.ts` first.

### The Orphan Entity
**Pattern:** Entities without Related Entities links.
**Fix:** Every entity should connect to at least 2 others.

### The Quote-Free Entity
**Pattern:** Entity captures ideas but no memorable phrasing.
**Fix:** Include Key Quotes section with author's exact words.

### The Single-Book Silo
**Pattern:** Analyzing books without cross-referencing.
**Fix:** After 2+ books, run synthesis to find connections.

---

## Example Workflows

### Full Entity Extraction (Range Example)

```
1. Scan book chapter by chapter
2. Identify all named studies, researchers, frameworks, anecdotes
3. Create inventory document listing all potential entities
4. For each entity:
   a. kb-resolve-entity.ts "[entity name]" to check existence
   b. Create markdown file in appropriate type directory
   c. Fill in template with findings and citations
   d. Add Key Quotes section
5. Create _range-quotes.md with all memorable quotes
6. Update _index.md with new entities
7. kb-generate-index.ts to rebuild _entities.json
```

### Quick Concept Scan

```
1. ea-parse.ts book.txt --chunk-size 2000
2. For each chunk, extract top 3-5 concepts
3. Classify by type and layer
4. Generate concepts.json and report.md
```

---

## Output Persistence

### Entity Extraction Output

| File | Location |
|------|----------|
| Entity files | `knowledge/{domain}/{type}/{entity-slug}.md` |
| Quotes file | `knowledge/{domain}/_[book]-quotes.md` |
| Entity index | `knowledge/_entities.json` |
| Domain index | `knowledge/{domain}/_index.md` |

### Concept Extraction Output

| File | Location |
|------|----------|
| Full analysis | `ebook-analysis/{author}-{title}/analysis.json` |
| Concepts only | `ebook-analysis/{author}-{title}/concepts.json` |
| Citations | `ebook-analysis/{author}-{title}/citations.json` |
| Report | `ebook-analysis/{author}-{title}/report.md` |

---

## Verification (Oracle)

### What This Skill Can Verify

- **Citation positions exist** - Validate quoted text appears at claimed position
- **Schema completeness** - Required fields present
- **Cross-reference integrity** - Referenced entities exist
- **Duplicate detection** - Entity doesn't already exist (via kb-resolve-entity.ts)

### What Requires Human Judgment

- **Significance** - Is this worth extracting?
- **Classification** - Is this really a "framework" vs "concept"?
- **Relationship validity** - Does A really influence B?
- **Quote quality** - Is this actually memorable?

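
The "citation positions exist" check listed above comes down to an exact string comparison at a claimed offset. The sketch below is one way it could work and is not taken from `ea-validate.ts`: the `Citation` shape (a `quote` plus a character offset `start`), the `CitationCheck` result, and `checkCitation` are all assumptions made for illustration, and the real analysis.json schema may differ.

```typescript
// Hypothetical citation shape; the actual analysis.json schema is not shown
// in this document and may differ.
interface Citation {
  quote: string; // exact quoted text
  start: number; // claimed character offset into the source text
}

type CitationCheck =
  | { status: "ok" }
  | { status: "moved"; actualStart: number }
  | { status: "missing" };

// Checks that the quote appears at the claimed position. If it does not,
// reports whether the quote exists elsewhere (stale position) or nowhere.
function checkCitation(source: string, c: Citation): CitationCheck {
  if (c.quote.length === 0) return { status: "missing" }; // empty quotes prove nothing
  if (source.startsWith(c.quote, c.start)) return { status: "ok" };
  const found = source.indexOf(c.quote);
  if (found !== -1) return { status: "moved", actualStart: found };
  return { status: "missing" };
}

// Usage with a made-up source string:
const sampleText = "Kind environments give fast feedback. Wicked ones do not.";
const claimed = { quote: "Wicked ones", start: sampleText.indexOf("Wicked ones") };
console.log(checkCitation(sampleText, claimed).status);
```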

---

## Integration Graph

### Inbound (From Other Skills)

| Source | Leads to |
|--------|----------|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |

### Outbound (To Other Skills)

| From State | Leads to |
|------------|----------|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |

### Complementary Skills

| Skill | Relationship |
|-------|--------------|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |

---

## Calibration Data (from Range + Imaginable extractions)

### By Book Density

| Book Type | Expected Entities | Estimated Effort |
|-----------|-------------------|------------------|
| Dense non-fiction (Range; Thinking, Fast and Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |

### By Book Subtype

Different non-fiction subtypes yield different entity profiles:

| Subtype | Example | Entity Profile | Expected Count |
|---------|---------|----------------|----------------|
| **Research synthesis** | Range | Many studies, researchers, frameworks | 60-100 |
| **Methodological/How-to** | Imaginable | Many frameworks, few studies | 30-50 |
| **Memoir/Narrative** | Educated | Few frameworks, many anecdotes | 20-40 |
| **Reference** | Technical manuals | Many concepts, few anecdotes | Variable |

**Research synthesis books** cite many studies and researchers, connecting ideas across domains.

**Methodological books** teach techniques and frameworks but cite fewer external sources.

**Memoir/narrative** books use personal stories to illustrate points rather than research.
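
For planning a multi-book pass, the effort ranges in the By Book Density table can simply be summed. The sketch below copies those ranges verbatim. The `Density` labels, `HourRange`, and `estimateEffort` are hypothetical names introduced here, and the totals are rough calibration figures, not guarantees.

```typescript
// Effort ranges copied from the "By Book Density" table.
// All identifiers here are hypothetical, for illustration only.
type Density = "dense" | "moderate" | "light" | "technical";

interface HourRange {
  min: number;
  max: number;
}

const EFFORT_HOURS: Record<Density, HourRange> = {
  dense: { min: 4, max: 6 },
  moderate: { min: 2, max: 3 },
  light: { min: 1, max: 2 },
  technical: { min: 2, max: 3 },
};

// Sums the per-book ranges into one estimate for the whole collection.
function estimateEffort(books: Density[]): HourRange {
  return books.reduce<HourRange>(
    (total, d) => ({
      min: total.min + EFFORT_HOURS[d].min,
      max: total.max + EFFORT_HOURS[d].max,
    }),
    { min: 0, max: 0 },
  );
}

// One dense book plus one moderate book: 4+2 to 6+3 hours.
const plan = estimateEffort(["dense", "moderate"]);
console.log(`${plan.min}-${plan.max} hours`); // prints "6-9 hours"
```

Cross-book synthesis time is not covered by the calibration table, so it would need to be budgeted separately.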

### Metadata Reliability Warning

Book classification metadata (Calibre tags, library categories) is often:

- **Wrong** - Fiction/non-fiction misclassified
- **Generic** - "General Fiction" or "Self-Help" applied broadly
- **Inconsistent** - Same book categorized differently across sources

Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.

---

## Reasoning Requirements

### Standard Reasoning

- Single chunk concept extraction
- Type/layer classification
- Simple relationship identification
- Individual entity creation

### Extended Reasoning (ultrathink)

Use extended thinking for:

- **Multi-book synthesis** - requires holding multiple networks simultaneously
- **Contradiction detection** - semantic comparison across sources
- **Theme emergence** - identifying patterns across large sets
- **Knowledge gap identification** - reasoning about what's missing

**Trigger phrases:** "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"

---

## What You Do NOT Do

- Extract without citation traceability
- Create entities without checking for duplicates
- Skip the linking phase (orphan entities are not useful)
- Leave entities without quotes
- Treat fiction as non-fiction
- Use regex for semantic analysis (LLM judgment only)