--- name: text-summarizer description: Generate extractive summaries from long text documents. Control summary length, extract key sentences, and process multiple documents. --- # Text Summarizer Create concise summaries from long text documents using extractive summarization. Identifies and extracts the most important sentences while preserving meaning. ## Quick Start ```python from scripts.text_summarizer import TextSummarizer # Summarize text summarizer = TextSummarizer() summary = summarizer.summarize(long_text, ratio=0.2) # 20% of original print(summary) # Summarize file summary = summarizer.summarize_file("article.txt", num_sentences=5) ``` ## Features - **Extractive Summarization**: Selects key sentences from original text - **Length Control**: By ratio, sentence count, or word count - **Multiple Algorithms**: TextRank, LSA, frequency-based - **Key Points**: Extract bullet-point summaries - **Batch Processing**: Summarize multiple documents - **Preserve Structure**: Maintains sentence order option ## API Reference ### Initialization ```python summarizer = TextSummarizer( method="textrank", # textrank, lsa, frequency language="english" ) ``` ### Summarization ```python # By ratio (20% of original length) summary = summarizer.summarize(text, ratio=0.2) # By sentence count summary = summarizer.summarize(text, num_sentences=5) # By word count summary = summarizer.summarize(text, max_words=100) ``` ### Key Points Extraction ```python # Get bullet points points = summarizer.extract_key_points(text, num_points=5) for point in points: print(f"• {point}") ``` ### Batch Processing ```python # Summarize multiple texts texts = [text1, text2, text3] summaries = summarizer.summarize_batch(texts, ratio=0.2) # Summarize files in directory summaries = summarizer.summarize_directory("./articles/", ratio=0.3) ``` ### Options ```python # Preserve original sentence order summary = summarizer.summarize(text, preserve_order=True) # Include title/first sentence summary = summarizer.summarize(text, include_first=True) # Minimum sentence length filter summarizer.min_sentence_length = 10 ``` ## CLI Usage ```bash # Summarize text file python text_summarizer.py --input article.txt --ratio 0.2 # Specific sentence count python text_summarizer.py --input article.txt --sentences 5 # Extract key points python text_summarizer.py --input article.txt --points 5 # Batch process python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3 # Output to file python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2 ``` ### CLI Arguments | Argument | Description | Default | |----------|-------------|---------| | `--input` | Input file path | Required | | `--output` | Output file path | stdout | | `--input-dir` | Directory of files | - | | `--output-dir` | Output directory | - | | `--ratio` | Summary ratio (0.0-1.0) | 0.2 | | `--sentences` | Number of sentences | - | | `--words` | Maximum words | - | | `--points` | Extract N key points | - | | `--method` | Algorithm to use | textrank | | `--preserve-order` | Keep sentence order | False | ## Examples ### News Article Summary ```python summarizer = TextSummarizer() article = """ [Long news article text...] """ # Get a 3-sentence summary summary = summarizer.summarize(article, num_sentences=3) print("Summary:") print(summary) # Get key points points = summarizer.extract_key_points(article, num_points=5) print("\nKey Points:") for i, point in enumerate(points, 1): print(f"{i}. {point}") ``` ### Research Paper Abstract ```python summarizer = TextSummarizer(method="lsa") paper = open("research_paper.txt").read() # Create abstract-length summary abstract = summarizer.summarize(paper, max_words=250) print(abstract) ``` ### Meeting Notes Summary ```python summarizer = TextSummarizer() notes = """ Meeting started at 2pm. John presented Q3 results showing 15% growth. Sarah raised concerns about supply chain delays affecting Q4 projections. The team discussed mitigation strategies including dual-sourcing. Budget allocation for marketing was approved at $50k. Next steps include vendor outreach by Friday. Follow-up meeting scheduled for next Tuesday. """ summary = summarizer.summarize(notes, num_sentences=3) points = summarizer.extract_key_points(notes, num_points=4) print("Summary:", summary) print("\nAction Items:") for point in points: print(f"• {point}") ``` ### Batch Document Summarization ```python summarizer = TextSummarizer() import os for filename in os.listdir("./documents"): if filename.endswith(".txt"): text = open(f"./documents/{filename}").read() summary = summarizer.summarize(text, ratio=0.2) with open(f"./summaries/{filename}", "w") as f: f.write(summary) print(f"Summarized: {filename}") ``` ## Algorithm Comparison | Algorithm | Speed | Quality | Best For | |-----------|-------|---------|----------| | **TextRank** | Medium | High | General text | | **LSA** | Fast | Good | Technical docs | | **Frequency** | Fast | Medium | Quick summaries | ## Dependencies ``` nltk>=3.8.0 numpy>=1.24.0 scikit-learn>=1.2.0 ``` ## Limitations - Extractive only (doesn't paraphrase or generate new text) - Works best with well-structured text (paragraphs, clear sentences) - Very short texts may not summarize well - Doesn't understand context deeply (may miss nuance)