# LinkedIn Lead Scraper

- **Skill:** `/scrape-leads`
- **Type:** Python CLI tool
- **API:** RapidAPI `fresh-linkedin-profile-data`

---

## What It Does

Finds qualified leads from LinkedIn post interactions. Given a keyword, it searches for recent LinkedIn posts, extracts everyone who commented or reacted, then runs each person through a triple-gate filtering system to identify decision-makers worth reaching out to.

The output is a CSV of qualified leads with full profile data, ready to import into a CRM or outbound tool.

---

## How It Works

```
Keyword (e.g. "revenue operations")
           │
           ▼
┌─────────────────────┐
│  Search LinkedIn    │
│  Posts (last N days)│
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Extract Commenters │
│  & Reactors         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────────────────────────┐
│              TRIPLE-GATE FILTERING              │
│                                                 │
│  Gate 1: Negative Title Filter                  │
│  ├── Exclude: SDR, Intern, Student, Recruiter,  │
│  │   Designer, Developer, Support, Freelance    │
│  └── If title contains any → REJECT             │
│                                                 │
│  Gate 2: Positive Title Filter                  │
│  ├── Include: CEO, CRO, VP Sales, Director,     │
│  │   Head of Growth, Founder, RevOps            │
│  └── If title matches none → REJECT             │
│                                                 │
│  Gate 3: Firmographic Filter                    │
│  ├── Company size >= 10 employees               │
│  ├── Exclude: staffing, recruiting, non-profit  │
│  └── If fails either check → REJECT             │
│                                                 │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
              Qualified Leads CSV
```

---

## Usage

```bash
# Basic: search for keyword, process last 7 days of posts
python main.py "revenue operations"

# Look back 30 days, process up to 100 posts
python main.py "sales enablement" --days 30 --max-posts 100

# Skip reactions (faster, comments only)
python main.py "outbound sales" --no-reactions

# Process a specific post URL
python main.py "keyword" --post-url "https://www.linkedin.com/feed/update/urn:li:activity:123456"

# Verbose logging
python main.py "GTM strategy" -v
```

---

## Triple-Gate Filtering Logic

### Gate 1: Negative Title Filter (Exclusion)

Catches common non-decision-maker roles before any further processing. If a person's title contains any of these terms, they are immediately rejected.

| Category | Terms |
|----------|-------|
| Junior roles | intern, student, trainee, junior, assistant |
| Outbound roles | sdr, bdr, sales development |
| Support roles | recruiter, talent acquisition, hr specialist, support, customer service, helpdesk |
| Technical roles | developer, engineer, software, backend, frontend |
| Creative roles | designer, graphic, ux, ui |
| Independent | freelance, consultant |

### Gate 2: Positive Title Filter (Inclusion)

Only people with decision-maker titles pass through. The title must contain at least one of these terms.

| Level | Terms |
|-------|-------|
| C-Suite | ceo, coo, cro, cmo, cto, cfo, chief executive, chief operating, chief revenue, chief marketing, chief technology |
| VP Level | vp, vice president, vp sales, vp marketing, vp growth, vp revenue |
| Director Level | director of sales, director of marketing, director of growth, director of revenue |
| Head Level | head of sales, head of marketing, head of growth, head of revenue, head of demand gen |
| Founders | founder, co-founder, owner, partner |
| Revenue Ops | revops, revenue operations, sales operations, marketing operations, growth operations |

### Gate 3: Firmographic Filter

Company-level checks to ensure the lead works at a viable target account.

- **Minimum company size:** 10 employees
- **Excluded industries:** staffing, recruiting, human resources, education, non-profit, government

---

## Output Format

The CSV contains the following columns:

| Column | Example |
|--------|---------|
| `name` | Jane Smith |
| `title` | VP of Revenue Operations |
| `company` | Acme Corp |
| `linkedin_url` | https://linkedin.com/in/janesmith |
| `employee_count` | 250 |
| `industry` | SaaS |
| `location` | San Francisco, CA |
| `interaction_type` | comment |
| `post_url` | https://linkedin.com/feed/update/... |
| `qualified` | True |
| `gate_passed` | all |

---

## Sample Run Output

```
2026-02-20 14:32:01 - INFO - Searching for 'revenue operations' posts from last 7 days
2026-02-20 14:32:05 - INFO - Found 34 posts matching keyword
2026-02-20 14:32:45 - INFO - Extracted 412 unique profiles from comments and reactions
2026-02-20 14:33:12 - INFO - Filtering: 412 total → 47 qualified (G1 rejected: 189, G2 rejected: 152, G3 rejected: 24)
2026-02-20 14:33:12 - INFO - Saved to data/runs/revenue_operations_20260220_143312.csv
```

---

## Configuration

Set these in `.env` or `config.yaml`:

```
RAPIDAPI_KEY=your_key_here
RAPIDAPI_HOST=fresh-linkedin-profile-data.p.rapidapi.com
```

---

## Key Files

| File | Purpose |
|------|---------|
| `main.py` | CLI entry point: keyword search, post processing, CSV export |
| `filters.py` | Triple-gate filtering system with configurable title lists |
| `linkedin_scraper/scraper.py` | Core scraper logic (API calls, deduplication) |
| `linkedin_scraper/config.py` | Configuration loader (API keys, filter lists) |
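
---

## Example: Triple-Gate Sketch

The triple-gate logic described above can be sketched in a few lines of Python. This is an illustration only, not the contents of `filters.py`: the `Profile` class and the `first_failed_gate` and `_contains_term` helpers are hypothetical names, and the term lists are abbreviated from the gate tables. The sketch also assumes whole-word matching, which this README does not confirm. The reason for that choice is that plain substring matching would let `coo` match "Coordinator" in Gate 2 and `intern` match "International" in Gate 1.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Abbreviated term lists taken from the gate tables (illustrative subset).
NEGATIVE_TERMS = ["intern", "student", "sdr", "bdr", "recruiter",
                  "developer", "designer", "freelance"]
POSITIVE_TERMS = ["ceo", "coo", "cro", "cto", "vp", "vice president",
                  "head of growth", "founder", "revops", "revenue operations"]
EXCLUDED_INDUSTRIES = ["staffing", "recruiting", "human resources",
                       "education", "non-profit", "government"]
MIN_EMPLOYEES = 10


def _contains_term(text: str, terms: Iterable[str]) -> bool:
    """True if any term occurs in text as a whole word or phrase (assumed rule)."""
    lowered = text.lower()
    return any(
        re.search(rf"(?<!\w){re.escape(term)}(?!\w)", lowered) for term in terms
    )


@dataclass
class Profile:
    """Hypothetical container for the three fields the gates inspect."""
    title: str
    employee_count: int
    industry: str


def first_failed_gate(profile: Profile) -> Optional[str]:
    """Return 'G1', 'G2', or 'G3' for the first rejecting gate, or None if qualified."""
    # Gate 1: reject on any negative title term.
    if _contains_term(profile.title, NEGATIVE_TERMS):
        return "G1"
    # Gate 2: reject unless at least one positive title term matches.
    if not _contains_term(profile.title, POSITIVE_TERMS):
        return "G2"
    # Gate 3: reject small companies and excluded industries.
    too_small = profile.employee_count < MIN_EMPLOYEES
    bad_industry = _contains_term(profile.industry, EXCLUDED_INDUSTRIES)
    if too_small or bad_industry:
        return "G3"
    return None


if __name__ == "__main__":
    lead = Profile("VP of Revenue Operations", 250, "SaaS")
    print(first_failed_gate(lead))
```

Returning the label of the first failed gate, rather than a bare boolean, makes it straightforward to tally per-gate rejections like those in the sample log line (`G1 rejected`, `G2 rejected`, `G3 rejected`).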