# Breaking News Monitor [![收录于 JerryKing's Trove](https://img.shields.io/badge/收录于-JerryKing's%20Trove-blue)](https://github.com/realJerryKing/JerryKing-s-Trove) A lightweight, zero-cost breaking news detection skill for AI agents. Designed for high-frequency polling (every 5 minutes) with minimal token consumption. ## How It Works ``` Agent triggers skill (every ~5 min) ↓ Fetch 6 sources in parallel (~1.3s) ├── Google News EN (aggregates global English media) ├── Google News CN (aggregates Chinese media) ├── 中新网 (China News Service) ├── NPR (US public radio) ├── 财联社快讯 (CLS Telegraph, web scrape) └── 澎湃新闻 (The Paper, web scrape) ↓ 3-layer detection engine ├── Layer 1: Word-boundary regex matching (EN) + context patterns (CN) ├── Layer 2: Negative pattern filtering (removes metaphorical uses) └── Layer 3: Multi-source corroboration (2+ sources = higher confidence) ↓ Output ├── No breaking → "NO_BREAKING" (~0 tokens) └── Breaking → "BREAKING|S1|source,source|10:25 UTC|headline|url" ``` ## Quick Start ```bash # Run the monitor python3 scripts/check.py # Output when nothing detected NO_BREAKING # Output when breaking news found BREAKING|S1|中新网|08:39 UTC|台湾台东县海域发生5.1级地震 福建多地震感明显|https://www.chinanews.com.cn/... ``` ## Architecture ### Why This Approach? | Requirement | Solution | |---|---| | **Low token cost** | Pure script — zero LLM calls, zero API fees | | **Fast response** | Parallel HTTP fetch + lightweight XML parsing (~1.3s) | | **Source authority** | Google News (aggregates Reuters, AP, BBC, etc.), 中新网, NPR, 财联社 | | **No rate limiting** | RSS feeds are designed for polling; web scrape targets have generous limits | | **China accessible** | All 6 sources verified accessible from mainland China | ### 3-Layer Detection Engine **Layer 1 — Keyword Matching** English keywords use regex word boundaries (`\b`) to prevent substring false positives: ```python # ❌ v1: "Tech stocks crash amid..." matches "crash" (substring) if "crash" in title # ✅ v2: Only matches "crash" as a whole word if re.search(r'\bcrash\b', title.lower()) ``` Chinese keywords use context patterns — the keyword must appear in an event-describing context: ```python # ❌ "地震技术获突破" — "地震" in a technology context → not breaking # ✅ "台湾发生5.1级地震" — "地震" + "级地震" pattern → breaking ``` **Layer 2 — Negative Pattern Filtering** Metaphorical and non-event uses are filtered out: | False Positive | Pattern | Result | |---|---|---| | "Tech stocks crash amid..." | `crash` not preceded by event context | Filtered | | "Market crash course for beginners" | `crash\s+course` | Filtered | | "Coach resigns after losing season" | `resigns\s+after` | Filtered | | "Film festival bombing at the box office" | `bombing\s+(at\s+the\s+)?box\s+office` | Filtered | | "Invasion of privacy lawsuit" | `invasion\s+of\s+privacy` | Filtered | | "Earthquake-proof building technology" | `earthquake[-\s]proof` | Filtered | **Layer 3 — Multi-Source Corroboration** When the same event is reported by 2+ independent sources within the time window, confidence increases. Items from corroborated sources get a severity boost (S2 → S1). ### Severity Levels | Level | Meaning | Examples | |---|---|---| | **S1** | Highest priority — immediate attention | Earthquake 7+, terror attack, market crash, nuclear test | | **S2** | Significant event | Wildfire, hostage situation, cyberattack, epidemic | | **S3** | Notable but lower urgency | Political resignation, sanctions, trade war | ## Integration ### As an OpenCode Skill The skill auto-triggers when the agent detects keywords like "breaking news", "news check", "突发新闻", etc. The `SKILL.md` file contains the triggering description. ### As a Standalone Script ```bash # One-shot check python3 scripts/check.py # Cron job (every 5 minutes) */5 * * * * python3 /path/to/scripts/check.py >> /var/log/news-monitor.log 2>&1 # Shell loop while true; do python3 scripts/check.py; sleep 300; done ``` ### Agent Integration Pattern ```python import subprocess result = subprocess.run( ["python3", "scripts/check.py"], capture_output=True, text=True, timeout=30 ) for line in result.stdout.strip().split('\n'): if line.startswith('BREAKING|'): _, severity, sources, time, headline, url = line.split('|', 5) alert_user(f"🚨 [{severity}] {headline}") elif line.startswith('SOURCE_WARN|'): _, source, error = line.split('|', 2) log_warning(f"Source {source} is down: {error}") ``` ## Configuration ### Adding Sources Edit `FEEDS` or `WEB_SCRAPE_TARGETS` in `scripts/check.py`: ```python FEEDS = [ ("My Source", "https://example.com/rss.xml"), ] WEB_SCRAPE_TARGETS = [ ("My Scraper", "https://example.com/news", r'class="headline"[^>]*>([^<]+)'), ] ``` ### Adding Keywords Add to `EN_KEYWORDS` (severity 1-3) or `CN_KEYWORDS` (with context patterns): ```python EN_KEYWORDS["new_keyword"] = 1 # 1=highest, 3=lowest CN_KEYWORDS["新关键词"] = (1, ["发生新关键词", "新关键词致"]) ``` ### Tuning Sensitivity - `TIME_WINDOW_MINUTES` (default: 20) — how far back to look for news - `MAX_KNOWN_HASHES` (default: 300) — dedup memory size - Severity thresholds in the corroboration logic ## State Management Runtime state is stored in `state.json`: ```json { "known_hashes": ["a33bcfd63bf9", "..."], "last_check": "2026-05-12T08:57:25Z", "source_health": { "中新网": {"status": "ok", "last_ok": "...", "items": 25}, "NPR": {"status": "error", "error": "timeout", "last_ok": "..."} } } ``` - Atomic writes (temp file + rename) prevent corruption - Safe to delete to reset all state - Source health tracked for monitoring ## Limitations - **RSS latency**: 5-15 minute delay vs real-time wire services - **Keyword coverage**: Novel breaking events may not match any keyword - **Google News dependency**: Uses undocumented RSS endpoint (could change) - **No verification**: Cannot distinguish real news from rumors/headline errors - **Chinese NLP**: Context patterns are regex-based, not true NLP understanding ## License MIT