# Hyper-Extract

**Smart Knowledge Extraction CLI**

**Transform documents into structured knowledge with one command.**

[📖 English Version](./README.md) · [中文版 (Chinese Version)](./README_ZH.md)

> **"Stop reading. Start understanding."**
>
> *"Say goodbye to document anxiety and see the information at a glance."*
## 📰 What's New

- **🔌 MCP Server**: Query your knowledge abstracts from Claude Desktop and IDE agents with `he-mcp`. *(PR #40)*
- **🧠 Anthropic Claude Support**: Use `claude-opus-4-8`, `claude-sonnet-4-6`, and `claude-haiku-4-5` directly as your LLM provider. *(PR #38)*
- **📝 Obsidian Export**: Turn any graph into an Obsidian vault with Markdown notes linked by `[[wikilinks]]`. *(PR #37)*
- **🧹 `he clean`**: Remove a knowledge abstract's (KA's) index, or the whole knowledge abstract, in one command. *(PR #39)*
- **🔧 Reliability Fixes**: True mean for multi-chunk embeddings, capped OpenAI-compatible batch sizes, and resolved multi-word `llm_*` merge strategies. *(PRs #35, #36, #41)*

See the full changelog in the [GitHub releases](https://github.com/yifanfeng97/hyper-extract/releases).

Hyper-Extract is an LLM-powered knowledge extraction and evolution framework. It transforms unstructured text into persistent, predictable, strongly typed **Knowledge Abstracts**. Output formats range from simple **Collections** (Lists/Sets) and **Pydantic Models** to **Knowledge Graphs**, **Hypergraphs**, and **Spatio-Temporal Graphs**.

## ✨ Core Features

| | |
|:---|:---|
| 🔷 **8 Knowledge Structures** | From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs |
| 🧠 **10+ Extraction Engines** | GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more, all ready to use |
| 📝 **80+ YAML Templates** | Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains |
| 🔄 **Incremental Evolution** | Feed new documents anytime to expand and refine your knowledge base |
| 📤 **Obsidian Export** | Turn any extracted graph into an Obsidian vault of Markdown notes linked by `[[wikilinks]]` |

## 🎯 What Can You Do With It?
### 📄 Researcher: Turn papers into knowledge graphs

Feed in a 20-page academic paper and get an interactive graph of key concepts, authors, and citations.

```bash
# Extract a graph from the paper with the academic template
he parse paper.pdf -t general/academic_graph -o ./paper_kb/
# Open the interactive visualization
he show ./paper_kb/
```
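To process a folder of papers rather than a single file, the `he parse` call above can be scripted. The sketch below only builds the command lines (a dry run) using the flags shown in this README (`-t`, `-o`). The `papers/` folder, the one-directory-per-paper output layout, and the helper name `build_parse_commands` are assumptions for illustration, not part of Hyper-Extract.

```python
import shlex
from pathlib import Path


def build_parse_commands(paths, template, out_root):
    """Return one `he parse` argv list per input document (hypothetical helper)."""
    commands = []
    for path in sorted(Path(p) for p in paths):
        # One output directory per document, named after the file stem (assumed layout).
        out_dir = Path(out_root) / path.stem
        commands.append(
            ["he", "parse", path.as_posix(), "-t", template, "-o", f"{out_dir.as_posix()}/"]
        )
    return commands


if __name__ == "__main__":
    papers = ["papers/b.pdf", "papers/a.pdf"]  # placeholder file names
    for argv in build_parse_commands(papers, "general/academic_graph", "./paper_kb"):
        # Printed rather than executed; subprocess.run(argv, check=True) would run it.
        print(shlex.join(argv))
```

Building argument lists instead of shell strings avoids quoting problems when file names contain spaces.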
### 🏦 Financial Analyst: Extract entities from earnings reports

Automatically identify companies, executives, financial metrics, and their relationships in unstructured reports.

```bash
# Extract a graph from the report with the earnings template
he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
# Ask a question against the extracted knowledge base
he search ./finance_kb/ "What are the key risk factors?"
```
### 🔒 Local Deployment: Keep data on-premise with vLLM

Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.

```python
from hyperextract import create_client

llm, emb = create_client(
    # Chat model served by a local vLLM instance
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    # Embedding model served by a second local vLLM instance
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    # Placeholder key for the local endpoints
    api_key="dummy",
)
```
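The provider strings above appear to follow a `provider:model@base_url` shape, with the shorter forms `provider:model` and bare `provider` used elsewhere in this README. The following sketch splits such a string into its parts to make the format concrete. `ProviderSpec` and `parse_provider_spec` are hypothetical names; this is not Hyper-Extract's own parser, and the real rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderSpec:
    provider: str
    model: Optional[str] = None
    base_url: Optional[str] = None


def parse_provider_spec(spec: str) -> ProviderSpec:
    """Split 'provider[:model][@base_url]' into its parts (illustrative only)."""
    # Split off the endpoint first, because the URL itself contains ':' characters.
    head, _, base_url = spec.partition("@")
    provider, _, model = head.partition(":")
    if not provider:
        raise ValueError(f"missing provider in {spec!r}")
    return ProviderSpec(provider, model or None, base_url or None)


if __name__ == "__main__":
    print(parse_provider_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1"))
    print(parse_provider_spec("openai:text-embedding-3-small"))
    print(parse_provider_spec("anthropic"))
```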
## 🚀 Supported Platforms & Models

Hyper-Extract relies on the LLM's structured output capability (`json_schema` or function calling).

| Platform | Verified Models |
|----------|-----------------|
| **OpenAI** | gpt-4o, gpt-4o-mini, gpt-5 |
| **Anthropic** | claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 |
| **Alibaba Cloud Bailian (阿里云百炼)** | qwen-plus, qwen-turbo, deepseek-r1 |
| **Local vLLM** | Qwen3.5-9B (GPTQ-Marlin) |

**Embedding models** (semantic search) work with any OpenAI-compatible endpoint: `text-embedding-3-small`, `text-embedding-v4` (Bailian), `bge-m3` (local vLLM).

> **Anthropic note:** Claude is used for the **LLM** (set `ANTHROPIC_API_KEY`). Anthropic has no embeddings API, so pair it with an OpenAI-compatible embedder:
> ```python
> from hyperextract import create_client
> llm, emb = create_client(llm="anthropic", embedder="openai:text-embedding-3-small")
> ```
> Requires the extra: `pip install 'hyperextract[anthropic]'`.

> 📖 Full guide: [Provider System & Local Model Support](https://yifanfeng97.github.io/Hyper-Extract/latest/concepts/provider-system/)

## ⚡ 30-Second Quick Start

```bash
# Install
uv tool install hyperextract

# Configure API key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query it
he search ./output/ "What are Tesla's major achievements?"

# Visualize
he show ./output/

# Export to an Obsidian vault (Markdown notes + [[wikilinks]])
he export obsidian ./output/ -o ./vault/
```
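The final step above exports the graph as Markdown notes connected by `[[wikilinks]]`. As a rough picture of what that means, the sketch below turns a tiny hand-written graph into one note per entity. The note layout, the sample entities, and the function name `graph_to_notes` are assumptions for illustration; the files written by `he export obsidian` may be organized differently.

```python
from collections import defaultdict


def graph_to_notes(entities, relations):
    """Map each entity name to the Markdown text of its note (illustrative only)."""
    outgoing = defaultdict(list)
    for source, rel_type, target in relations:
        outgoing[source].append((rel_type, target))

    notes = {}
    for name, description in entities.items():
        lines = [f"# {name}", "", description]
        if outgoing[name]:
            lines += ["", "## Relations"]
            # A [[wikilink]] points at the note whose title matches the target entity.
            lines += [f"- {rel_type} [[{target}]]" for rel_type, target in outgoing[name]]
        notes[name] = "\n".join(lines) + "\n"
    return notes


if __name__ == "__main__":
    # Sample data written by hand for this example, not real extraction output.
    entities = {
        "Nikola Tesla": "Inventor and electrical engineer.",
        "Alternating current": "Electric current that periodically reverses direction.",
    }
    relations = [("Nikola Tesla", "worked_on", "Alternating current")]
    for title, text in graph_to_notes(entities, relations).items():
        print(f"--- {title}.md ---\n{text}")
```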
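### 🐍 Python API

```bash
uv pip install hyperextract
```

```python
from hyperextract import Template

# Load a preset template by its path-like name
ka = Template.create("general/biography_graph")

# Read the document explicitly as UTF-8 and extract knowledge from it
with open("examples/en/tesla.md", encoding="utf-8") as f:
    result = ka.parse(f.read())

# Open the visualization of the extracted result
result.show()
```

> 🔗 More examples: [examples/en](./examples/en/)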
## 📈 Why Hyper-Extract?

| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | **Hyper-Extract** |
| :------ | :------: | :------: | :----: | :--: | :---------------: |
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| Interactive CLI | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |

## 🧩 Supported Knowledge Structures

From simple to complex, pick the right structure for your data:

[Figure: Knowledge Structures Matrix]

**Example (AutoGraph visualization):**

[Figure: AutoGraph Visualization]
### 📋 What's under the hood? (Architecture & Templates)

Hyper-Extract follows a **three-layer architecture**:

- **Auto-Types**: 8 strongly typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph)
- **Methods**: Extraction algorithms such as KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more
- **Templates**: 80+ presets across 6 domains. Zero-code setup.

[Figure: Architecture]

**Template example (Graph type):**

```yaml
language: en
name: Knowledge Graph
type: graph
tags: [general]
description: 'Extract entities and their relationships.'
output:
  entities:
    fields:
      - name: name
        type: str
      - name: type
        type: str
      - name: description
        type: str
  relations:
    fields:
      - name: source
        type: str
      - name: target
        type: str
      - name: type
        type: str
identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
```

- [Browse all 80+ templates](./hyperextract/templates/presets/)
- [Create custom templates](./hyperextract/templates/DESIGN_GUIDE.md)
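The `identifiers` entries in the template example above name the fields that make an entity or relation unique. One plausible reading is that the `relation_id` pattern is filled in from each relation's fields, and that records producing the same key are treated as the same relation, for example when results from several chunks are merged. The sketch below illustrates that idea with `str.format`; the merge behaviour and the helper name `dedupe_by_key` are assumptions, not a description of Hyper-Extract's internals.

```python
RELATION_ID = "{source}|{type}|{target}"  # pattern copied from the template example


def dedupe_by_key(records, pattern):
    """Keep the first record seen for each key built from `pattern` (illustrative only)."""
    unique = {}
    for record in records:
        key = pattern.format(**record)  # extra fields in the record are ignored
        unique.setdefault(key, record)
    return list(unique.values())


if __name__ == "__main__":
    relations = [
        {"source": "A", "target": "B", "type": "cites"},
        {"source": "A", "target": "B", "type": "cites"},    # duplicate, e.g. from another chunk
        {"source": "A", "target": "B", "type": "extends"},  # same pair, different relation type
    ]
    for relation in dedupe_by_key(relations, RELATION_ID):
        print(RELATION_ID.format(**relation))
```

Including the relation type in the key means two entities can be linked by several distinct relations without those relations collapsing into one.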
## 📚 Documentation & Resources

| Resource | Link |
| :------- | :--- |
| Full Documentation | [yifanfeng97.github.io/Hyper-Extract](https://yifanfeng97.github.io/Hyper-Extract/latest/) |
| CLI Guide | [Command-line interface](https://yifanfeng97.github.io/Hyper-Extract/latest/cli/) |
| Provider System | [Model compatibility & local deployment](https://yifanfeng97.github.io/Hyper-Extract/latest/concepts/provider-system/) |
| Template Gallery | [80+ presets](./hyperextract/templates/presets/) |
| Examples | [Working code](./examples/) |

## 🔌 MCP Server

Expose your knowledge abstracts to MCP-capable assistants (Claude Desktop, IDE agents) via the [Model Context Protocol](https://modelcontextprotocol.io). Access is read and export only.

```bash
pip install 'hyperextract[mcp]'
he-mcp  # stdio MCP server
```

Tools: `list_templates`, `info`, `search`, `ask` (RAG), `export_obsidian`.

Full guide: [MCP Server docs](https://yifanfeng97.github.io/Hyper-Extract/latest/mcp/).

## 🤝 Contributing & License

Contributions are welcome! Please submit [Issues](https://github.com/yifanfeng97/hyper-extract/issues) and [PRs](https://github.com/yifanfeng97/hyper-extract/pulls).

Licensed under **Apache-2.0**.

## 🔒 Security

This project has been security assessed by [MseeP.ai](https://mseep.ai/app/yifanfeng97-hyper-extract).

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=yifanfeng97/hyper-Extract&type=Date)](https://star-history.com/#yifanfeng97/hyper-Extract&Date)