---
version: "1.0.0"
evaluation: programmatic
agent: claude-code
model: claude-sonnet-4-6
snapshot: python312-uv
origin:
  url: "https://github.com/gooseworks-ai/gooseworks-skills/blob/main/skills/playbooks/signal-detection-pipeline/SKILL.md"
  source_host: "github.com"
  source_title: "Signal Detection Pipeline"
  imported_at: "2026-05-01T03:20:31Z"
  imported_by: "skill-to-runbook-converter@1.0.0"
  attribution:
    collection_or_org: "gooseworks-ai"
    skill_name: "signal-detection-pipeline"
    confidence: "high"
secrets: {}
---

# Signal Detection Pipeline: Agent Runbook

## Objective

Detect buying signals from multiple sources, qualify leads, and generate outreach context. This runbook orchestrates multiple independent signal-detection sub-skills (job postings, funding events, conference attendance, Reddit discussions, and LinkedIn content) to surface companies that are actively in-market. Signals are combined, deduplicated, and scored to produce a prioritized list of qualified leads with outreach angles ready for downstream enrichment and campaign setup.

## REQUIRED OUTPUT FILES (MANDATORY)

**You MUST write all of the following files to `/app/results`. The task is NOT complete until every file exists and is non-empty.
No exceptions.**

| File | Description |
|------|-------------|
| `/app/results/qualified_leads.csv` | Deduplicated, scored lead list: Company, Signal Sources, Signal Strength, Context, Outreach Angle |
| `/app/results/job_posting_signals.json` | Raw output from the job-posting-intent skill |
| `/app/results/funding_signals.json` | Raw output from the funding-signal-monitor skill |
| `/app/results/conference_signals.json` | Raw output from the luma-event-attendees skill |
| `/app/results/reddit_signals.json` | Raw output from the reddit-scraper skill |
| `/app/results/linkedin_signals.json` | Raw output from linkedin-post-research + linkedin-commenter-extractor |
| `/app/results/summary.md` | Executive summary: sources run, lead counts, top opportunities, issues |
| `/app/results/validation_report.json` | Structured validation results with stages, results, and `overall_passed` |

## Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| Results directory | `/app/results` | Output directory for all results |
| `target_keywords` | *(required)* | Keywords describing the problem/solution space (e.g. "sales automation", "data observability") |
| `icp_criteria` | *(required)* | Ideal Customer Profile criteria (industry, company size, geography, tech stack) |
| `signal_sources` | `all` | Comma-separated list of sources to run: `job_posting`, `funding`, `conference`, `reddit`, `linkedin`. Default: `all` |
| `funding_stage_filter` | `seed,series-a,series-b` | Funding stages to include in funding signals |
| `event_urls` | *(optional)* | Specific Luma event URLs for conference signals |
| `subreddits` | *(optional)* | Specific subreddits to search for Reddit signals |
| `linkedin_time_frame` | `30d` | Lookback window for LinkedIn post research |
| `max_leads_per_source` | `50` | Maximum raw leads to retrieve per signal source |

## Dependencies

| Dependency | Type | Required | Description |
|------------|------|----------|-------------|
| `requests` | Python package | Yes | HTTP calls to signal-source APIs |
| `pyyaml` | Python package | Yes | Parse YAML skill frontmatter |
| `pandas` | Python package | Yes | Deduplicate and score lead lists |
| `job-posting-intent` skill | Sub-skill | Yes | Detect companies hiring in the problem area |
| `funding-signal-monitor` skill | Sub-skill | Yes | Detect recently funded companies |
| `luma-event-attendees` skill | Sub-skill | Conditional | Detect conference attendance signals |
| `reddit-scraper` skill | Sub-skill | Conditional | Detect pain-signal posts on Reddit |
| `linkedin-post-research` skill | Sub-skill | Conditional | Detect LinkedIn content signals |
| `linkedin-commenter-extractor` skill | Sub-skill | Conditional | Extract commenters from LinkedIn posts |
| `lead-qualification` skill | Sub-skill | Yes | Score and qualify leads from combined signals |
| `contact-cache` skill | Sub-skill | Yes | Deduplicate against previously contacted companies |

## Step 1: Environment Setup

```bash
pip install requests pyyaml pandas
mkdir -p /app/results

# Validate required inputs
if [ -z "$TARGET_KEYWORDS" ]; then
  echo "ERROR: TARGET_KEYWORDS is not set"; exit 1
fi
if [ -z "$ICP_CRITERIA" ]; then
  echo "ERROR: ICP_CRITERIA is not set"; exit 1
fi
echo "Environment ready. Keywords: $TARGET_KEYWORDS | ICP: $ICP_CRITERIA"
```

## Step 2: Run Signal Sources in Parallel

Run the sources relevant to the client's ICP. Each is independent, so invoke them in parallel where possible.

### 2a: Job Posting Signals (Strongest)

**Skill:** `job-posting-intent`

Companies hiring for roles in the problem area = budget allocated and pain acknowledged.

- Input: `target_keywords`, `icp_criteria`
- Output: → `/app/results/job_posting_signals.json`

```python
import os
import subprocess

# subprocess.run() with an argument list bypasses the shell, so "$VAR"
# would be passed literally; read the inputs from the environment instead.
result = subprocess.run(
    [
        "python3", "-m", "skills.job_posting_intent",
        "--keywords", os.environ["TARGET_KEYWORDS"],
        "--icp", os.environ["ICP_CRITERIA"],
        "--max", "50",
        "--output", "/app/results/job_posting_signals.json",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)
```

### 2b: Funding Signals

**Skill:** `funding-signal-monitor`

Recently funded companies = budget available, growth mandate.

- Input: `icp_criteria`, `funding_stage_filter`
- Output: → `/app/results/funding_signals.json`

```bash
# Falls back to the documented default stages when the variable is unset
python3 -m skills.funding_signal_monitor \
  --icp "$ICP_CRITERIA" \
  --stages "${FUNDING_STAGE_FILTER:-seed,series-a,series-b}" \
  --output /app/results/funding_signals.json
```

### 2c: Conference Attendance Signals

**Skill:** `luma-event-attendees`

People attending events in the problem space = actively engaged.

- Input: event URLs or topic search
- Output: → `/app/results/conference_signals.json`

```bash
python3 -m skills.luma_event_attendees \
  --keywords "$TARGET_KEYWORDS" \
  --event-urls "$EVENT_URLS" \
  --output /app/results/conference_signals.json
```

### 2d: Reddit Pain Signals

**Skill:** `reddit-scraper`

People complaining about or discussing the problem = experiencing the pain.

- Input: `target_keywords`, optional `subreddits`
- Output: → `/app/results/reddit_signals.json`

```bash
python3 -m skills.reddit_scraper \
  --keywords "$TARGET_KEYWORDS" \
  --subreddits "$SUBREDDITS" \
  --output /app/results/reddit_signals.json
```

### 2e: LinkedIn Content Signals

**Skills:** `linkedin-post-research` + `linkedin-commenter-extractor`

People posting about or engaging with the problem = thought leaders or practitioners.

- Input: `target_keywords`, `linkedin_time_frame`
- Output: → `/app/results/linkedin_signals.json`

```bash
# Falls back to the documented default lookback when the variable is unset
python3 -m skills.linkedin_post_research \
  --keywords "$TARGET_KEYWORDS" \
  --time-frame "${LINKEDIN_TIME_FRAME:-30d}" \
  --extract-commenters true \
  --output /app/results/linkedin_signals.json
```

## Step 3: Combine and Deduplicate Signals

After all sources complete, merge and deduplicate:

```python
import json
import pathlib

import pandas as pd

signal_files = {
    "job_posting": "/app/results/job_posting_signals.json",
    "funding": "/app/results/funding_signals.json",
    "conference": "/app/results/conference_signals.json",
    "reddit": "/app/results/reddit_signals.json",
    "linkedin": "/app/results/linkedin_signals.json",
}

all_leads = []
for source_name, path in signal_files.items():
    p = pathlib.Path(path)
    if p.exists() and p.stat().st_size > 0:
        data = json.loads(p.read_text())
        for lead in data.get("leads", []):
            lead["_source"] = source_name
            all_leads.append(lead)

# Keep only the columns used below, creating any that a source omitted
df = pd.DataFrame(all_leads).reindex(columns=["company", "name", "context", "_source"])

# Fall back to "name" when "company" is missing, then deduplicate on a
# lowercase, whitespace-trimmed key
df["company"] = df["company"].fillna(df["name"]).astype("string").str.strip()
df = df.dropna(subset=["company"])
df["_company_key"] = df["company"].str.lower()

grouped = df.groupby("_company_key").agg(
    company=("company", "first"),
    signal_sources=("_source", lambda x: ", ".join(sorted(set(x)))),
    # Count distinct sources, so repeated hits from one source are not "multi-signal"
    signal_count=("_source", "nunique"),
    context=("context", lambda x: " | ".join(x.dropna().astype(str).head(2))),
).reset_index(drop=True)

# Score: job_posting and funding = 3 each, linkedin and reddit = 2 each, conference = 1
WEIGHTS = {"job_posting": 3, "funding": 3, "linkedin": 2, "reddit": 2, "conference": 1}


def score(signal_sources: str) -> int:
    return sum(WEIGHTS.get(s.strip(), 1) for s in signal_sources.split(","))


grouped["signal_strength"] = grouped["signal_sources"].map(score)
grouped = grouped.sort_values("signal_strength", ascending=False)
grouped["outreach_angle"] = ""  # to be filled by lead-qualification skill
grouped.to_csv("/app/results/qualified_leads.csv", index=False)
print(f"Deduplicated leads: {len(grouped)}")
```

## Step 4: Score and Qualify Leads

Apply the `lead-qualification` skill to the deduplicated list:

```bash
python3 -m skills.lead_qualification \
  --input /app/results/qualified_leads.csv \
  --icp "$ICP_CRITERIA" \
  --output /app/results/qualified_leads.csv
```

Signal scoring tiers:

- **Highest intent**: Job posting + funding signal → priority outreach
- **Validated pain**: LinkedIn post + Reddit complaint → value-led outreach
- **Awareness only**: Single conference attendance → nurture sequence

## Step 5: Human Checkpoint: Review Before Proceeding

**STOP HERE** and review the consolidated lead list before initiating outreach.

```bash
echo "=== TOP LEADS ===" && head -20 /app/results/qualified_leads.csv
```

Verify:

- [ ] Lead companies are plausible fits for the ICP
- [ ] Signal context accurately reflects the source (no hallucinated entries)
- [ ] Outreach angles are specific and non-generic
- [ ] No contacts from the contact-cache (already reached)

If the review passes, continue to Step 6. If issues are found, adjust `icp_criteria` or `target_keywords` and re-run from Step 2.

## Step 6: Iterate on Errors (max 3 rounds)

If any signal source returned zero leads or an error:

1. Read the specific failure from its output JSON (`"error"` key)
2. Apply fix from the table below
3. Re-run only the failing source
4. Merge new results back via Step 3

Repeat up to **max 3 rounds** per source.
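
The retry procedure above can be sketched as a small loop. This is a minimal illustration, not part of the original skill: `source_failed`, `retry_source`, and the `rerun` callback are hypothetical names, and the sketch assumes each source file is a JSON object with a `leads` list and an optional `"error"` key, as the steps above imply.

```python
import json
import pathlib
import tempfile
from typing import Callable

MAX_ROUNDS = 3  # matches the "max 3 rounds" rule in this step


def source_failed(path: pathlib.Path) -> bool:
    """True if the file is missing or empty, reports an error, or has no leads."""
    if not path.exists() or path.stat().st_size == 0:
        return True
    data = json.loads(path.read_text())
    return "error" in data or not data.get("leads")


def retry_source(path: pathlib.Path, rerun: Callable[[int], None]) -> str:
    """Re-run one failing source up to MAX_ROUNDS times.

    `rerun(round_no)` is a hypothetical callback that applies a fix and
    re-invokes the sub-skill so that it rewrites `path`.
    Returns "ok" once the source succeeds, otherwise "skipped".
    """
    for round_no in range(1, MAX_ROUNDS + 1):
        if not source_failed(path):
            return "ok"
        rerun(round_no)
    return "skipped" if source_failed(path) else "ok"


# Demo with a fake source that starts failing and recovers on round 2
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "reddit_signals.json"
    path.write_text(json.dumps({"error": "no posts found"}))
    calls = []

    def fake_rerun(round_no: int) -> None:
        calls.append(round_no)
        if round_no == 2:  # pretend the second fix works
            path.write_text(json.dumps({"leads": [{"company": "Acme"}]}))

    status = retry_source(path, fake_rerun)

print(status, calls)  # → ok [1, 2]
```

A source that still fails after the third re-run comes back as `"skipped"`, which is the status to record in `summary.md`.
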

| Issue | Fix |
|-------|-----|
| Job posting API rate-limited | Add `--delay 5` flag; retry after 5 minutes |
| No Reddit posts found | Broaden `target_keywords`; add related subreddits |
| LinkedIn returns 0 posts | Extend `linkedin_time_frame` to `90d` |
| Funding source empty | Try adjacent funding stages (`pre-seed`, `series-c`) |
| Conference events not found | Provide explicit `event_urls` parameter |

After 3 rounds with persistent failure for a source, mark it as `skipped` in `summary.md` and continue with available signals.

## Step 7: Check Contact Cache

Deduplicate final leads against previously contacted companies:

```bash
python3 -m skills.contact_cache \
  --check /app/results/qualified_leads.csv \
  --remove-already-contacted true \
  --output /app/results/qualified_leads.csv
```

## Step 8: Write Executive Summary

```python
import json
import os
import pathlib
from datetime import datetime, timezone

import pandas as pd

RESULTS = pathlib.Path("/app/results")
df = pd.read_csv(RESULTS / "qualified_leads.csv")


def lead_count(source: str) -> int:
    """Number of raw leads in a source file; 0 if it is missing or empty."""
    p = RESULTS / f"{source}_signals.json"
    if not p.exists() or p.stat().st_size == 0:
        return 0
    return len(json.loads(p.read_text()).get("leads", []))


# Shell syntax such as $(date) or $TARGET_KEYWORDS is not expanded inside a
# Python string, so compute these values explicitly.
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
keywords = os.environ.get("TARGET_KEYWORDS", "")
icp = os.environ.get("ICP_CRITERIA", "")
multi_signal = int((df["signal_count"] > 1).sum())
# DataFrame.to_markdown() needs the optional `tabulate` package.
top5 = df.head(5)[
    ["company", "signal_sources", "signal_strength", "outreach_angle"]
].to_markdown(index=False)

summary = f"""# Signal Detection Pipeline: Results

## Overview
- **Date**: {now}
- **Target keywords**: {keywords}
- **ICP criteria**: {icp}
- **Total leads found**: {len(df)}
- **Multi-signal leads**: {multi_signal} (highest priority)

## Signal Source Results
| Source | Leads Found |
|--------|------------|
| Job Posting | {lead_count("job_posting")} |
| Funding | {lead_count("funding")} |
| Conference | {lead_count("conference")} |
| Reddit | {lead_count("reddit")} |
| LinkedIn | {lead_count("linkedin")} |

## Top 5 Leads by Signal Strength
{top5}

## Issues / Manual Follow-up
- Review leads with `signal_count == 1`: lower confidence, may need manual validation
- Outreach angles marked empty require human review before campaign launch
"""
(RESULTS / "summary.md").write_text(summary)
print("summary.md written")
```

## Step 9: Final Checklist (MANDATORY, do not skip)

### Verification Script

```bash
echo "=== FINAL OUTPUT VERIFICATION ==="
RESULTS_DIR="/app/results"
for f in \
  "$RESULTS_DIR/qualified_leads.csv" \
  "$RESULTS_DIR/job_posting_signals.json" \
  "$RESULTS_DIR/funding_signals.json" \
  "$RESULTS_DIR/conference_signals.json" \
  "$RESULTS_DIR/reddit_signals.json" \
  "$RESULTS_DIR/linkedin_signals.json" \
  "$RESULTS_DIR/summary.md" \
  "$RESULTS_DIR/validation_report.json"; do
  if [ ! -s "$f" ]; then
    echo "FAIL: $f is missing or empty"
  else
    echo "PASS: $f ($(wc -c < "$f") bytes)"
  fi
done

# The first line of the CSV is the header, so more than 1 line means at least one lead
LEAD_COUNT=$(wc -l < "$RESULTS_DIR/qualified_leads.csv")
if [ "$LEAD_COUNT" -gt 1 ]; then
  echo "PASS: qualified_leads.csv has $((LEAD_COUNT-1)) leads"
else
  echo "FAIL: qualified_leads.csv has no leads; check signal sources"
fi
```

### Checklist

- [ ] `qualified_leads.csv` exists and contains at least 1 lead row
- [ ] All signal source JSON files exist (may contain zero leads if the source was skipped, but the file itself must be non-empty)
- [ ] `summary.md` reports lead counts and top opportunities
- [ ] `validation_report.json` exists with `overall_passed` field
- [ ] Human checkpoint (Step 5) was completed before proceeding to outreach
- [ ] Contact cache deduplication was applied (Step 7)
- [ ] No individual source failed more than 3 times without being marked `skipped`

**If ANY item fails, go back and fix it. Do NOT finish until all items pass.**

## Tips

- **Run signal sources in parallel** when your execution environment supports it: each source is fully independent and the combined runtime drops from ~15 min to ~4 min.
- **Job posting signals are the highest-quality signal**: a company hiring a role that requires solving your client's problem almost always has budget allocated and explicit pain acknowledged.
- **Multi-signal leads are gold**: a company that appears in job postings AND received recent funding AND has a LinkedIn post discussing the problem has no weak spots in the signal chain; prioritize these above all others.
- **Reddit signals require caution**: authors may be individual contributors, not decision-makers. Always enrich with `company-contact-finder` before outreach.
- **Slugify company names consistently** before deduplication: "Acme Corp.", "ACME Corp", and "acme" are the same company. Use `str.lower().strip().replace(".", "")` at minimum; matching the bare "acme" additionally requires stripping legal suffixes such as "Corp".
- **Contact cache integration prevents burnout**: always run Step 7 before exporting to any outreach campaign. Re-contacting a company that already declined is a hard negative signal.
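
The slugify tip above can be sketched as a small helper. This is one possible normalization, not the pipeline's own: `company_key` and the `LEGAL_SUFFIXES` list are hypothetical names chosen for illustration, and the suffix list is deliberately short.

```python
import re

# Hypothetical suffix list for illustration; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co", "gmbh"}


def company_key(name: str) -> str:
    """Normalize a company name into a deduplication key."""
    # Lowercase, then turn every run of non-alphanumerics into a separator.
    # Note: this also drops non-ASCII letters, which may be too aggressive.
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop trailing legal suffixes, but never the last remaining token
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return "-".join(tokens)


names = ["Acme Corp.", "ACME Corp", "acme", "Acme, Inc."]
keys = {company_key(n) for n in names}
print(keys)  # → {'acme'}
```

Only trailing suffixes are removed, so a name like "Data Co Labs" keeps its middle token and becomes `data-co-labs`.
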