---
name: vc-finder
description: 'Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit.'
compatibility: [claude-code, gemini-cli, github-copilot]
---

# VC Finder

Take a product URL or description. Detect industry and stage. Find 5 comparable funded companies. Run two research tracks: who invested in those comparables (Track A), and which VCs publish theses about this space (Track B). Return a sourced, ranked investor list with outreach hooks.

---

**Zero-hallucination policy:** Every fact in the output must be traceable to a specific Tavily search result, the fetched product page, or the curated fund dataset embedded in Step 5b. This applies to:
- Comparable company names: must appear in Tavily search results or the Step 5b curated portfolio data, not AI training knowledge
- VC fund names: must appear verbatim in Tavily search results or the Step 5b curated dataset
- Check sizes, stage focus, portfolio companies: must come from search snippets or the curated dataset, not AI knowledge
- Fund overviews and thesis summaries: extracted from search snippets (or the curated `thesis` field) only. If a detail is not in the search data, write "not found in search data" -- do not fill from training knowledge.

---

## Common Mistakes

| The agent will want to... | Why that's wrong |
|---|---|
| Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product or that score as a match in the Step 5b curated dataset. Name-dropping wastes the founder's time. |
| Generate comparable companies from training knowledge | Comparables must come from the curated portfolio data (Step 5b) or Tavily search results (Step 6). AI knowledge of companies is not evidence -- a company suggested from memory may have wrong funding status or may not be a true comparable. |
| Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, re-run Step 6 with broader search queries, and retry. |
| Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
| Fill in fund overview from training knowledge | Fund overviews must come from Tavily snippet text (or the curated `thesis` field for Step 5b funds) only. If the snippets don't describe the fund, write "not found in search data". |
| Detect stage from website aesthetics | Stage must come from the specific CTA signals detected in Step 4. |
| Write generic outreach hooks | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal or thesis quote from the search data. |
| Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals that the user's description omits. |

---

## Step 1: Setup Check

```bash
# Report presence only; never echo the key values themselves.
if [ -n "$TAVILY_API_KEY" ]; then
  echo "TAVILY_API_KEY: set"
else
  echo "TAVILY_API_KEY: MISSING"
fi
if [ -n "$FIRECRAWL_API_KEY" ]; then
  echo "FIRECRAWL_API_KEY: set"
else
  echo "FIRECRAWL_API_KEY: not set, Tavily extract will be used as fallback"
fi
```

**If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com -- free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."

**If only FIRECRAWL_API_KEY is missing:** Continue silently. Tavily extract will be used for the URL fetch.
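
The Step 1 branching can also be expressed as a small function, which is handy when the check runs inside a script rather than a shell. This is a minimal sketch, not part of the skill itself: the function name `choose_fetch_mode` and the return labels are hypothetical, and it only inspects whether the two environment variables are non-empty.

```python
import os


def choose_fetch_mode(env):
    """Map the Step 1 rules onto one of three outcomes.

    'stop'           -> TAVILY_API_KEY missing, no fallback exists
    'firecrawl'      -> both keys present, Firecrawl is the primary fetcher
    'tavily_extract' -> only Tavily present, use Tavily extract for the URL
    (The labels are illustrative, not names used elsewhere in the skill.)
    """
    if not env.get("TAVILY_API_KEY"):
        return "stop"
    if env.get("FIRECRAWL_API_KEY"):
        return "firecrawl"
    return "tavily_extract"


if __name__ == "__main__":
    # Prints only the chosen mode, never the key values.
    print(choose_fetch_mode(os.environ))
```

Passing a plain dict instead of `os.environ` makes the three branches easy to exercise without touching the real environment.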
---

## Step 2: Gather Input

You need:
- Product URL (required, unless user pastes a product description directly)
- Optional: target stage hint (pre-seed, seed, series-a, series-b) -- if provided, use it and skip stage detection
- Optional: geography preference (US, Europe, global) -- defaults to US if not specified

**If the user provides only a pasted description (no URL):** Skip Steps 3-4. Go directly to Step 5 with the pasted text as `product_content`. Set `stage_source` to `user_description`.

**If neither URL nor description is provided:** Ask: "What is the URL of your product or startup? Or paste a short description: what it does, who it is for, and what stage you are at (pre-seed, seed, Series A)."

Derive product slug from URL for the output filename:

```bash
PRODUCT_SLUG=$(python3 -c "
from urllib.parse import urlparse
url = 'URL_HERE'
host = urlparse(url).netloc.replace('www.', '')
print(host.split('.')[0])
")
```

---

## Step 3: Fetch Product Page

**Primary: Firecrawl (if FIRECRAWL_API_KEY is set)**

```bash
curl -s -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "URL_HERE", "formats": ["markdown"], "onlyMainContent": true}' \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('data', {}).get('markdown', '') or d.get('markdown', '')
print(f'Fetched: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
```

**Fallback: Tavily extract (if FIRECRAWL_API_KEY is not set)**

```bash
curl -s -X POST https://api.tavily.com/extract \
  -H "Content-Type: application/json" \
  -d "{\"api_key\": \"$TAVILY_API_KEY\", \"urls\": [\"URL_HERE\"]}" \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
# Guard against an empty results list so a failed extract does not raise IndexError
results = d.get('results') or [{}]
content = results[0].get('raw_content', '') or ''
print(f'Fetched via Tavily extract: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
```

**Step-level checkpoint:**

```bash
python3 -c "
content = open('/tmp/vc-product-raw.md').read()
if len(content) < 200:
    print('ERROR: Page returned fewer than 200 characters.')
else:
    print(f'Content OK: {len(content)} characters')
"
```

**If content < 200 characters:** Stop fetching. Tell the user: "The product page returned no readable content. This usually means the site is JavaScript-rendered and requires a browser. Please paste your product description directly: what it does, who it is for, and what stage you are at." Proceed to Step 5 using the pasted description as `product_content`.

---

## Step 4: Detect Stage Signals Locally (No API)

Parse the fetched markdown with regex before the analysis step.

```bash
python3 << 'PYEOF'
import re, json

content = open('/tmp/vc-product-raw.md').read().lower()
stage_signals = []

# Each pattern group maps a family of CTAs to a stage hint
if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
    stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})

# The optional article is grouped with its whitespace so "book demo" also matches
if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+(a\s+)?demo|book\s+(a\s+)?demo|schedule\s+(a\s+)?demo', content):
    stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})

if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
    stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})

if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
    stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})

if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
    stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})

funding_match = re.search(
    r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
    content
)
if funding_match:
    stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})

# Pick the latest stage hinted at; explicit funding text wins over CTA signals
if not stage_signals:
    dominant = 'unknown'
elif any(s['stage_hint'] == 'announced' for s in stage_signals):
    dominant = 'announced'
elif any(s['stage_hint'] == 'series-a-or-b' for s in stage_signals):
    dominant = 'series-a'
elif any(s['stage_hint'] == 'series-a' for s in stage_signals):
    dominant = 'series-a'
elif any(s['stage_hint'] == 'seed' for s in stage_signals):
    dominant = 'seed'
else:
    dominant = 'pre-seed'

confidence = 'high' if len(stage_signals) >= 2 else ('medium' if len(stage_signals) == 1 else 'low')

result = {'signals': stage_signals, 'dominant_stage': dominant, 'confidence': confidence}
json.dump(result, open('/tmp/vc-stage-signals.json', 'w'), indent=2)

print(f'Stage: {dominant} ({confidence} confidence) from {len(stage_signals)} signal(s)')
for s in stage_signals:
    print(f'  - {s["signal"]} -> {s["stage_hint"]}')
PYEOF
```

---

## Step 5: Product Analysis (Taxonomy, Stage, ICP)

Print the product content and stage signals:

```bash
python3 -c "
import json, os
raw_path = '/tmp/vc-product-raw.md'
sig_path = '/tmp/vc-stage-signals.json'
# Both files are absent when Steps 3-4 were skipped (description-only input);
# in that case analyze the pasted description as product_content instead.
content = open(raw_path).read()[:6000] if os.path.exists(raw_path) else ''
signals = json.load(open(sig_path)) if os.path.exists(sig_path) else {'signals': [], 'dominant_stage': 'unknown', 'confidence': 'low'}
print('=== PRODUCT PAGE (first 6000 chars) ===')
print(content)
print()
print('=== DETECTED STAGE SIGNALS ===')
print(json.dumps(signals, indent=2))
"
```

**AI instructions:** Analyze the product page content above. Generate the taxonomy, ICP, and stage classification only -- do NOT generate comparable companies yet (that is done via live search in Step 6).

Write to `/tmp/vc-product-analysis.json`:
- `product_name`: from the page
- `one_line_description`: what it does, for whom, core value prop. Under 20 words. No marketing language.
- `industry_taxonomy`: `l1` (top-level: fintech / healthtech / developer tools / consumer / etc.), `l2` (sector: sales technology / logistics software / etc.), `l3` (specific niche: outbound prospecting / last-mile routing / etc.). Vague labels like "technology" or "software" alone are not acceptable.
- `icp`: `buyer_persona` (job title), `company_type`, `company_size`
- `detected_stage`: pre-seed / seed / series-a / series-b / unknown
- `stage_confidence`: high / medium / low
- `stage_evidence`: one sentence citing exactly which CTA or text on the page drove this. Write "no clear signals found" if unknown.
- `geography_bias`: US / Europe / global / unclear
- `comparable_companies`: leave as empty array `[]` -- will be filled in Step 6

```bash
python3 << 'PYEOF'
import json

analysis = {
    # FILL from your analysis above
    "comparable_companies": []
}
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Product analysis written.')
PYEOF
```

Verify:

```bash
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Product:', a['product_name'])
print('Industry:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'], '(' + a['stage_confidence'] + ' confidence)')
"
```

---

## Step 5b: Curated Pre-Match Against Verified Fund Dataset

Run the product taxonomy against a curated dataset of 25 verified VC funds (sourced from fund websites). Produces zero-hallucination fund matches and seed comparables for Track A -- no Tavily credits consumed.

Print product analysis for tag mapping:

```bash
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Taxonomy:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'])
print('Geography:', a['geography_bias'])
"
```

**AI instructions:** Map the product taxonomy to the standard tags used in the fund dataset.
Available tags: `DevTools`, `Infrastructure`, `Open Source`, `B2B SaaS`, `AI`, `Data`, `FinTech`, `HealthTech`, `Enterprise`, `Consumer`, `Marketplaces`, `E-commerce`, `Crypto`, `DeepTech`, `Cybersecurity`, `Generalist`

Pick 2-4 tags that describe this product. Map `detected_stage` to: `Pre-seed`, `Seed`, `Series A`, or `Growth`. Map `geography_bias` to: `US`, `Europe`, `India`, or `Global`.

Write product context:

```bash
python3 << 'PYEOF'
import json

# FILL based on taxonomy analysis above
context = {
    "extracted_tags": ["TagA", "TagB"],  # 2-4 tags from the list above
    "stage_hint": "Seed",                # Pre-seed / Seed / Series A / Growth
    "geography_hint": "US"               # US / Europe / India / Global
}
json.dump(context, open('/tmp/vc-product-context.json', 'w'), indent=2)
print('Product context:', context)
PYEOF
```

Run scoring against the embedded curated dataset:

```bash
python3 << 'PYEOF'
import json

context = json.load(open('/tmp/vc-product-context.json'))

VC_FUNDS = [
    {"fund_name":"Y Combinator","thesis":"We provide seed funding for startups. We invest in deeply technical teams building massive companies across all domains.","check_size":"$500k","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","DevTools","AI"],"geography_focus":["Global"],"notable_portfolio":["Stripe","Airbnb","GitLab"],"website":"https://www.ycombinator.com"},
    {"fund_name":"boldstart ventures","thesis":"Day one partner for developer first, crypto, and SaaS founders. We love deeply technical founders solving hard infrastructure problems.","check_size":"$1M - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["DevTools","Infrastructure","Crypto"],"geography_focus":["Global","US"],"notable_portfolio":["Snyk","Blockdaemon","Superhuman"],"website":"https://boldstart.vc"},
    {"fund_name":"Heavybit","thesis":"The leading investor in developer-first startups. We help technical founders launch, gain traction, and build enterprise-ready companies.","check_size":"$1M - $5M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","Open Source"],"geography_focus":["Global","US"],"notable_portfolio":["PagerDuty","Sanity","Netlify"],"website":"https://www.heavybit.com"},
    {"fund_name":"Amplify Partners","thesis":"We invest in technical founders building the next generation of IT infrastructure, developer tools, and data platforms.","check_size":"$2M - $8M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","AI","Data"],"geography_focus":["US"],"notable_portfolio":["Datadog","OCTO","dbt Labs"],"website":"https://www.amplifypartners.com"},
    {"fund_name":"OSS Capital","thesis":"We exclusively back early-stage founders building Commercial Open Source Software (COSS) companies.","check_size":"$500k - $2M","stage_focus":["Pre-seed","Seed","Series A"],"industry_tags":["Open Source","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Cal.com","Appsmith","Hoppscotch"],"website":"https://oss.capital"},
    {"fund_name":"Sequoia Capital","thesis":"We help the daring build legendary companies, from idea to IPO and beyond. Sequoia is an early-stage and growth-stage investor.","check_size":"$1M - $10M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","AI"],"geography_focus":["Global"],"notable_portfolio":["Apple","Google","WhatsApp"],"website":"https://www.sequoiacap.com"},
    {"fund_name":"Andreessen Horowitz (a16z)","thesis":"We invest in software eating the world. We back bold entrepreneurs building the future through technology.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Crypto","Enterprise","Consumer","AI"],"geography_focus":["Global","US"],"notable_portfolio":["Facebook","Coinbase","Figma"],"website":"https://a16z.com"},
    {"fund_name":"Point Nine Capital","thesis":"We are a seed-stage venture capital firm focused on B2B SaaS and B2B marketplaces globally.","check_size":"$1M - $3M","stage_focus":["Seed"],"industry_tags":["B2B SaaS","Marketplaces"],"geography_focus":["Europe","Global"],"notable_portfolio":["Zendesk","Typeform","Docplanner"],"website":"https://www.pointnine.com"},
    {"fund_name":"Cherry Ventures","thesis":"We champion founders in Europe from their earliest days. We are generalist seed investors.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["Europe"],"notable_portfolio":["FlixBus","Auto1 Group","Forto"],"website":"https://www.cherry.vc"},
    {"fund_name":"First Round Capital","thesis":"We are the seed-stage firm that builds the most supportive community for founders.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer"],"geography_focus":["US"],"notable_portfolio":["Uber","Notion","Roblox"],"website":"https://firstround.com"},
    {"fund_name":"Bessemer Venture Partners","thesis":"BVP helps entrepreneurs lay strong foundations to build and forge long-standing companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["LinkedIn","Twilio","Shopify"],"website":"https://www.bvp.com"},
    {"fund_name":"Index Ventures","thesis":"We back the best and most ambitious entrepreneurs across all stages to build category-defining businesses.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","FinTech","Consumer","B2B SaaS"],"geography_focus":["Europe","US","Global"],"notable_portfolio":["Dropbox","Slack","Figma"],"website":"https://www.indexventures.com"},
    {"fund_name":"Lightspeed Venture Partners","thesis":"We invest globally in enterprise, consumer, and health founders who are shaping the future.","check_size":"$1M - $25M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["Snap","Rippling","MuleSoft"],"website":"https://lsvp.com"},
    {"fund_name":"Accel","thesis":"We partner with exceptional founders from inception through all phases of private company growth.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Facebook","Atlassian","Spotify"],"website":"https://www.accel.com"},
    {"fund_name":"Bain Capital Ventures","thesis":"From seed to growth, we back founders building legendary infrastructure, fintech, application, and commerce companies.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Infrastructure","FinTech","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["DocuSign","SendGrid","Redis"],"website":"https://www.baincapitalventures.com"},
    {"fund_name":"Greylock Partners","thesis":"We partner with early-stage founders to build enterprise and consumer software companies that define new categories.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Enterprise","Consumer","Cybersecurity","AI"],"geography_focus":["US"],"notable_portfolio":["Workday","Palo Alto Networks","LinkedIn"],"website":"https://greylock.com"},
    {"fund_name":"Unusual Ventures","thesis":"We provide a breakthrough level of support for early-stage founders building enterprise tech.","check_size":"$1M - $5M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Enterprise","DevTools","B2B SaaS"],"geography_focus":["US"],"notable_portfolio":["Arctic Wolf","Harness","Vivun"],"website":"https://www.unusual.vc"},
    {"fund_name":"Crane Venture Partners","thesis":"We back deep tech and enterprise founders in Europe solving hard problems with data and code.","check_size":"$1M - $4M","stage_focus":["Seed"],"industry_tags":["Enterprise","DeepTech","Data","AI"],"geography_focus":["Europe"],"notable_portfolio":["Onfido","Tessian","Forto"],"website":"https://crane.vc"},
    {"fund_name":"Founder Collective","thesis":"We are a seed-stage venture capital fund, built by founders, for founders. We back weird, wonderful, and wild startups.","check_size":"$500k - $2M","stage_focus":["Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Airtable","BuzzFeed"],"website":"https://www.foundercollective.com"},
    {"fund_name":"Benchmark","thesis":"We are a partnership of equal partners. We back mission-driven founders at the earliest stages and walk beside them for the long haul.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Marketplaces","Enterprise","Consumer"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Twitter","eBay","Snapchat"],"website":"https://www.benchmark.com"},
    {"fund_name":"Accel India","thesis":"We partner with exceptional founders from inception through all phases of private company growth in the Indian ecosystem.","check_size":"$1M - $15M","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","FinTech","E-commerce"],"geography_focus":["India"],"notable_portfolio":["Flipkart","Swiggy","Freshworks"],"website":"https://www.accel.com/india"},
    {"fund_name":"Blume Ventures","thesis":"We are a seed and pre-seed venture fund that backs startups with both funding and active mentoring.","check_size":"$500k - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer","DeepTech","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Unacademy","Purplle","GreyOrange"],"website":"https://blume.vc"},
    {"fund_name":"Elevation Capital","thesis":"We partner with visionary founders in India across early stages to help them build category-defining businesses.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Paytm","Swiggy","Meesho"],"website":"https://elevationcapital.com"},
    {"fund_name":"Peak XV Partners","thesis":"Formerly Sequoia India & SEA, we partner with founders across early, growth, and public stages to build enduring companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","DevTools","AI"],"geography_focus":["India","South Asia"],"notable_portfolio":["Zomato","Pine Labs","Cred"],"website":"https://www.peakxv.com"},
    {"fund_name":"Nexus Venture Partners","thesis":"We are a US-India venture capital firm backing extraordinary founders building product-first companies.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["B2B SaaS","Enterprise","DevTools","Consumer"],"geography_focus":["India","US"],"notable_portfolio":["Postman","Hasura","Zepto"],"website":"https://nexusvp.com"}
]

STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}

def score_fund(fund, ctx):
    score = 0
    fund_tags = fund.get("industry_tags", [])
    extracted_tags = ctx.get("extracted_tags", ["Generalist"])

    # Tag overlap: 20 points per specific tag, 5 for Generalist, capped at 60
    tag_points = 0
    matched_tags = []
    for tag in extracted_tags:
        if tag in fund_tags:
            tag_points += 5 if tag == "Generalist" else 20
            matched_tags.append(tag)
    tag_points = min(tag_points, 60)
    score += tag_points

    # Stage fit: 20 for an exact match, 10 for an adjacent stage
    stage_hint = ctx.get("stage_hint")
    fund_stages = fund.get("stage_focus", [])
    if not stage_hint:
        score += 10
    elif fund_stages:
        if stage_hint in fund_stages:
            score += 20
        elif stage_hint in STAGE_ORDER:
            hint_idx = STAGE_ORDER[stage_hint]
            if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
                score += 10

    # Geography fit
    geo_hint = ctx.get("geography_hint")
    fund_geo = fund.get("geography_focus", ["Global"])
    if not geo_hint or geo_hint == "Global":
        score += 10
    elif fund_geo == ["India"] and geo_hint == "US":
        pass
    elif geo_hint in fund_geo:
        score += 20
    elif "Global" in fund_geo:
        score += 15

    # Penalty: India-only fund for a US product
    if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
        score = max(0, score - 30)

    # Penalty: the fund's primary tag is not among the product's tags and overlap is weak
    if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
        score = max(0, score - 15)

    return score, matched_tags

scored = []
for fund in VC_FUNDS:
    score, matched_tags = score_fund(fund, context)
    tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
    scored.append({
        "fund_name": fund["fund_name"],
        "thesis": fund["thesis"],
        "check_size": fund["check_size"],
        "stage_focus": fund["stage_focus"],
        "industry_tags": fund["industry_tags"],
        "geography_focus": fund["geography_focus"],
        "notable_portfolio": fund["notable_portfolio"],
        "website": fund["website"],
        "source": "verified (fund website)",
        "score": score,
        "confidence": tier,
        "matched_tags": matched_tags
    })

scored.sort(key=lambda x: (-x["score"], x["fund_name"]))
relevant = [m for m in scored if m["confidence"] in ("High", "Medium")]

# Collect portfolio companies of matched funds, in rank order, without duplicates
curated_comparables = []
for m in relevant:
    for company in m.get("notable_portfolio", []):
        if company not in curated_comparables:
            curated_comparables.append(company)

output = {
    "high_medium_matches": relevant,
    "curated_comparables": curated_comparables[:6]
}
json.dump(output, open('/tmp/vc-curated-matches.json', 'w'), indent=2)

print(f'Curated matches: {len(relevant)} High/Medium confidence funds')
for m in relevant[:8]:
    print(f'  {m["confidence"]:6} ({m["score"]:3}) {m["fund_name"]}')
print(f'Seed comparables from portfolio: {curated_comparables[:6]}')
PYEOF
```

---

## Step 6: Discover Comparable Companies via Tavily

Load curated portfolio companies from Step 5b as seed comparables:

```bash
python3 -c "
import json
matches = json.load(open('/tmp/vc-curated-matches.json'))
curated = matches.get('curated_comparables', [])
print(f'Curated portfolio comparables ({len(curated)}): {curated}')
need = max(0, 5 - len(curated))
print(f'Tavily will supplement with up to {need} more')
"
```

**Do not use AI training knowledge to generate comparable companies.** Curated portfolio companies (above) are already zero-hallucination comparables from verified fund data. Tavily supplements with L3-niche-specific companies.

```bash
python3 << 'PYEOF'
import json, os, urllib.request

analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
tavily_key = os.environ.get('TAVILY_API_KEY', '')

queries = [
    f'"{l3}" startup raised funding venture capital seed series',
    f'"{l2}" companies venture backed funded startup'
]

all_results = []
for query in queries:
    payload = json.dumps({
        "api_key": tavily_key,
        "query": query,
        "search_depth": "advanced",
        "max_results": 8,
        "include_answer": True
    }).encode()
    req = urllib.request.Request(
        'https://api.tavily.com/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
        all_results.append({
            'query': query,
            'answer': result.get('answer', ''),
            'results': [
                {'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:500]}
                for r in result.get('results', [])
            ]
        })
        print(f'Comparable search: {len(result.get("results", []))} results for "{query[:60]}"')
    except Exception as e:
        print(f'Comparable search FAILED: {e}')
        all_results.append({'query': query, 'answer': '', 'results': [], 'error': str(e)})

json.dump(all_results, open('/tmp/vc-comparable-search.json', 'w'), indent=2)
PYEOF
```

Print results for AI selection:

```bash
python3 -c "
import json
results = json.load(open('/tmp/vc-comparable-search.json'))
for r in results:
    print(f'Query: {r[\"query\"]}')
    print(f'Answer: {r.get(\"answer\",\"\")[:400]}')
    for item in r.get('results', []):
        print(f'  - {item[\"title\"]} | {item[\"url\"]}')
        print(f'    {item[\"content\"][:200]}')
    print()
"
```

**AI instructions:** Combine the curated portfolio companies from `/tmp/vc-curated-matches.json` with the Tavily search results above. Pick exactly 5 comparable companies. Prioritize curated portfolio companies (already verified -- they are real portfolio companies of matched VC funds). Supplement with Tavily-discovered companies to reach 5 if needed.

For each comparable write:
- `name`: company name
- `similarity_reason`: one sentence explaining the fit (for curated: reference the fund that backed them; for Tavily: cite the snippet)
- `source_url`: portfolio fund website for curated companies, Tavily result URL for discovered ones
- `estimated_stage`: from curated data or snippet text -- write "not in search data" if unknown
- `source_type`: `"curated_portfolio"` or `"tavily_discovered"`

Update `/tmp/vc-product-analysis.json` with the `comparable_companies` array:

```bash
python3 << 'PYEOF'
import json

analysis = json.load(open('/tmp/vc-product-analysis.json'))
analysis['comparable_companies'] = [
    # FILL 5 companies -- curated_portfolio first, then tavily_discovered
    # Each: {"name": str, "similarity_reason": str, "source_url": str, "estimated_stage": str, "source_type": str}
]
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Comparables written:', ', '.join(c['name'] for c in analysis['comparable_companies']))
PYEOF
```

**If fewer than 3 comparable companies appear in the search results:** Broaden the queries. Run a third search: `"[l1] startup" funding round venture capital`. If still thin, proceed with what is available and flag in `data_quality_flags`.

---

## Step 7: Track A -- Who Invested in Comparable Companies

Run 5 Tavily searches, one per comparable.

```bash
python3 << 'PYEOF'
import json, os, urllib.request

analysis = json.load(open('/tmp/vc-product-analysis.json'))
comparables = analysis['comparable_companies']
tavily_key = os.environ.get('TAVILY_API_KEY', '')

all_track_a = []
for comp in comparables:
    company = comp['name']
    query = f'"{company}" investors funding venture capital backed seed series'
    payload = json.dumps({
        "api_key": tavily_key,
        "query": query,
        "search_depth": "advanced",
        "max_results": 5,
        "include_answer": True
    }).encode()
    req = urllib.request.Request(
        'https://api.tavily.com/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
        all_track_a.append({
            'comparable_company': company,
            'similarity_reason': comp['similarity_reason'],
            'query': query,
            'answer': result.get('answer', ''),
            'results': result.get('results', [])
        })
        print(f'Track A - {company}: {len(result.get("results", []))} results')
    except Exception as e:
        print(f'Track A - {company}: FAILED ({e})')
        all_track_a.append({
            'comparable_company': company,
            'similarity_reason': comp['similarity_reason'],
            'query': query,
            'answer': '',
            'results': [],
            'error': str(e)
        })

json.dump(all_track_a, open('/tmp/vc-tracka-results.json', 'w'), indent=2)
print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a if r.get("results"))}')
PYEOF
```

**If all 5 Track A searches return 0 results:** Re-run Step 6 with broader queries. Retry with well-covered companies (those with significant press coverage). If still 0: proceed to Track B only and flag in `data_quality_flags`.

---

## Step 8: Track B -- VCs With Investment Theses About This Space

Run 3 Tavily searches using L2 and L3 taxonomy from Step 5.

```bash
python3 << 'PYEOF'
import json, os, urllib.request

analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
stage = analysis['detected_stage']
tavily_key = os.environ.get('TAVILY_API_KEY', '')

queries = [
    {'name': 'thesis_l3', 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'},
    {'name': 'thesis_l2', 'query': f'VC fund "{l2}" investment thesis portfolio companies'},
    {'name': 'stage_space', 'query': f'{stage} investors "{l3}" startup venture capital fund'}
]

all_track_b = []
for q in queries:
    payload = json.dumps({
        "api_key": tavily_key,
        "query": q['query'],
        "search_depth": "advanced",
        "max_results": 7,
        "include_answer": True
    }).encode()
    req = urllib.request.Request(
        'https://api.tavily.com/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
        all_track_b.append({
            'query_name': q['name'],
            'query': q['query'],
            'answer': result.get('answer', ''),
            'results': result.get('results', [])
        })
        print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
    except Exception as e:
        print(f"Track B - {q['name']}: FAILED ({e})")
        all_track_b.append({'query_name': q['name'], 'query': q['query'], 'answer': '', 'results': [], 'error': str(e)})

json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
PYEOF
```

**If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search."
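
The fallback rules for empty tracks can be checked mechanically once both result files exist. The following is a minimal sketch under stated assumptions: the helper name `build_data_quality_flags` is hypothetical, the Track B message is the one quoted above, and the Track A and failed-request messages are illustrative wording that the skill does not prescribe.

```python
def build_data_quality_flags(track_a, track_b):
    """Return flag strings for research tracks that came back empty or failed.

    track_a / track_b are the lists written to /tmp/vc-tracka-results.json and
    /tmp/vc-trackb-results.json: each entry has a 'results' list and, on a
    failed request, an 'error' string.
    """
    flags = []
    a_hits = sum(1 for r in track_a if r.get("results"))
    b_hits = sum(1 for r in track_b if r.get("results"))
    failed = [r for r in list(track_a) + list(track_b) if r.get("error")]

    if a_hits == 0:
        # Assumed wording; the skill only says to flag this case.
        flags.append("No investors found for comparable companies via public search.")
    if b_hits == 0:
        # Wording taken from the Step 8 fallback rule.
        flags.append("No thesis-led investors found via public search.")
    if failed:
        # Assumed extra flag so request errors are not mistaken for empty results.
        flags.append(f"{len(failed)} search request(s) failed.")
    return flags
```

A typical call would load both JSON files with `json.load` and pass the two lists in; the returned list can then be stored under `data_quality_flags` in the final output.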
---

## Step 9: Synthesize -- Rank and Score All VCs

Print the research data:

```bash
python3 -c "
import json

analysis = json.load(open('/tmp/vc-product-analysis.json'))
track_a = json.load(open('/tmp/vc-tracka-results.json'))
track_b = json.load(open('/tmp/vc-trackb-results.json'))
curated = json.load(open('/tmp/vc-curated-matches.json'))

# Keep the top 3 snippets per comparable, truncated to bound the printed size
track_a_summary = []
for item in track_a:
    snippets = [
        {'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
        for r in item.get('results', [])[:3]
    ]
    track_a_summary.append({
        'comparable_company': item['comparable_company'],
        'similarity_reason': item['similarity_reason'],
        # 'or' guards against a null answer stored by Step 7
        'answer': (item.get('answer') or '')[:500],
        'top_results': snippets
    })

# Keep the top 4 snippets per thesis query
track_b_summary = []
for item in track_b:
    snippets = [
        {'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
        for r in item.get('results', [])[:4]
    ]
    track_b_summary.append({
        'query_name': item['query_name'],
        'answer': (item.get('answer') or '')[:500],
        'top_results': snippets
    })

curated_summary = []
for m in curated.get('high_medium_matches', []):
    curated_summary.append({
        'fund_name': m['fund_name'],
        'confidence': m['confidence'],
        'score': m['score'],
        'matched_tags': m['matched_tags'],
        'thesis': m['thesis'],
        'check_size': m['check_size'],
        'stage_focus': m['stage_focus'],
        'notable_portfolio': m['notable_portfolio'],
        'website': m['website'],
        'source': 'verified (fund website)'
    })

print(json.dumps({
    'product': {
        'name': analysis['product_name'],
        'description': analysis['one_line_description'],
        'industry': analysis['industry_taxonomy'],
        'icp': analysis['icp'],
        'stage': analysis['detected_stage'],
        'stage_confidence': analysis['stage_confidence'],
        'geography': analysis['geography_bias']
    },
    'curated_matches': curated_summary,
    'track_a_research': track_a_summary,
    'track_b_research': track_b_summary
}, indent=2))
"
```

**AI instructions -- zero-hallucination rules:** Every field in the output must be traceable to the printed data above.
Rules:

1. **curated_vcs:** Use the `curated_matches` data directly. These are pre-verified -- no Tavily evidence required. `fund_overview` comes from the `thesis` field in the curated data. `check_size` and `stage_focus` come from the curated data fields. Do NOT fill from training knowledge even for these funds.
2. **VC names (Track A / B):** Only include a fund if its name appears verbatim in the snippet text or title. No exceptions.
3. **evidence_company (Track A):** The comparable company they backed -- must be stated in the snippet text, not inferred.
4. **thesis_source_title (Track B):** The exact title of the article or post as it appears in the search results.
5. **fund_overview (Track A / B):** Extract from snippet text only. Max 2 sentences. If the snippets do not describe the fund, write "not found in search data".
6. **thesis_summary:** Close paraphrase of the snippet text. Do not add context from training knowledge.
7. **check_size (Track A / B):** From snippet data only. Write "not in search data" if not mentioned.
8. **portfolio_in_space:** Only companies that appear in the search snippets. Write "not found in search data" if none.
9. **stage_fit_score 1-10:** Penalize 3 points if the VC's stated stage does not match the product's detected stage.
10. **space_fit_score 1-10:** 9-10 only if the VC backed 2+ companies in the L3 niche per the snippets or curated data.
11. **approach_method:** one of -- cold email / warm intro required / AngelList / application form / Twitter/X DM. Infer from snippets or fund website.
12. **outreach_hook:** Must name a specific portfolio signal or thesis quote. Generic hooks like "highlight your traction" are not acceptable.
13. No em dashes. No marketing language.
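Rules 2 and 9 are mechanical enough to check in code. The sketch below illustrates one possible reading, not a prescribed implementation. The function names `name_in_snippets` and `stage_fit_score` are hypothetical. It assumes case-insensitive matching for "verbatim", a list of stated stages per VC, no penalty when no stage is stated, and clamping the result to the 1-10 range.

```python
def name_in_snippets(fund_name: str, results: list) -> bool:
    """Rule 2: True if fund_name appears in any result's title or content.

    Assumption: the comparison ignores case; the rule only says 'verbatim'.
    """
    needle = fund_name.strip().casefold()
    if not needle:
        return False
    for r in results:
        haystack = f"{r.get('title') or ''} {r.get('content') or ''}".casefold()
        if needle in haystack:
            return True
    return False


def stage_fit_score(base_score: int, vc_stages: list, product_stage: str) -> int:
    """Rule 9: subtract 3 points when the VC's stated stages exclude the product stage.

    Assumptions: an empty vc_stages list (stage not stated) draws no penalty,
    and the result is clamped to the 1-10 scale.
    """
    stated = {s.strip().casefold() for s in vc_stages}
    mismatch = bool(stated) and product_stage.strip().casefold() not in stated
    penalty = 3 if mismatch else 0
    return max(1, min(10, base_score - penalty))


# Hypothetical search results and fund names, for illustration only
results = [{"title": "ExampleCo raises seed round",
            "content": "The round was led by Hypothetical Ventures."}]

print(name_in_snippets("Hypothetical Ventures", results))  # True
print(name_in_snippets("Famous Capital", results))         # False
print(stage_fit_score(8, ["Series B"], "Seed"))            # 5
print(stage_fit_score(8, ["Seed", "Series A"], "seed"))    # 8
```

A fund that fails `name_in_snippets` would be dropped before scoring, which matches the "No exceptions" wording of rule 2.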
Write to `/tmp/vc-final-list.json`:

- `product_summary`: name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (names only)
- `curated_vcs`: fund_name, confidence ("High"/"Medium"), matched_tags, fund_overview (from thesis field), check_size, stage_focus, website, source ("verified (fund website)"), stage_fit_score, space_fit_score
- `track_a_vcs`: fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- `track_b_vcs`: fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- `top_5_deep_dives`: fund_name, track ("Curated"/"A"/"B"), fund_overview, why_fit, portfolio_in_space, how_to_approach (min 30 chars), outreach_hook
- `outreach_hooks`: 3 objects -- hook_type, hook_text (2-3 sentences), best_for
- `data_quality_flags`: gaps, missing fields, low-confidence areas

```bash
python3 << 'PYEOF'
import json

result = {
    # FILL from synthesis above
    # Must include: product_summary, curated_vcs, track_a_vcs, track_b_vcs, top_5_deep_dives, outreach_hooks, data_quality_flags
}

json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'Synthesis written. Curated: {len(result.get("curated_vcs", []))} VCs. Track A: {len(result.get("track_a_vcs", []))} VCs. Track B: {len(result.get("track_b_vcs", []))} VCs.')
PYEOF
```

---

## Step 10: Self-QA

```bash
python3 << 'PYEOF'
import json

result = json.load(open('/tmp/vc-final-list.json'))
failures = []

# Remove Track A VCs missing evidence_company
original_a = len(result.get('track_a_vcs', []))
result['track_a_vcs'] = [v for v in result.get('track_a_vcs', []) if v.get('evidence_company')]
removed_a = original_a - len(result['track_a_vcs'])
if removed_a > 0:
    failures.append(f'Removed {removed_a} Track A VC(s) missing evidence_company')

# Remove Track B VCs missing thesis_source_title
original_b = len(result.get('track_b_vcs', []))
result['track_b_vcs'] = [v for v in result.get('track_b_vcs', []) if v.get('thesis_source_title')]
removed_b = original_b - len(result['track_b_vcs'])
if removed_b > 0:
    failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')

# Remove deep dives for VCs that were stripped from all tracks
valid_funds = (
    {v['fund_name'] for v in result.get('curated_vcs', [])}
    | {v['fund_name'] for v in result.get('track_a_vcs', [])}
    | {v['fund_name'] for v in result.get('track_b_vcs', [])}
)
original_dives = len(result.get('top_5_deep_dives', []))
result['top_5_deep_dives'] = [d for d in result.get('top_5_deep_dives', []) if d.get('fund_name') in valid_funds]
removed_dives = original_dives - len(result['top_5_deep_dives'])
if removed_dives > 0:
    failures.append(f'Removed {removed_dives} deep dive(s) for funds stripped during QA')

# Check top 5 deep dives
dives = result.get('top_5_deep_dives', [])
if len(dives) < 5:
    failures.append(f'Only {len(dives)} deep dives (expected 5) -- insufficient search data')
for dd in dives:
    if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
        dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
        failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
    if not dd.get('fund_overview'):
        dd['fund_overview'] = 'not found in search data'

# Check outreach hooks count
if len(result.get('outreach_hooks', [])) != 3:
    failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")

# Check for em dashes (U+2014).
# ensure_ascii=False keeps the character literal in the serialized text;
# with the default ensure_ascii=True it is escaped and this check never fires.
full_text = json.dumps(result, ensure_ascii=False)
if '\u2014' in full_text:
    result = json.loads(full_text.replace('\u2014', '-'))
    failures.append('Fixed: em dash characters replaced with hyphens')

# Check for forbidden words
forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing',
             'streamline', 'leverage', 'transform']
full_text_lower = json.dumps(result).lower()
for word in forbidden:
    if word in full_text_lower:
        failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")

# Flag any "not found in search data" entries so user knows coverage is incomplete
not_found_count = json.dumps(result).count('not found in search data')
if not_found_count > 0:
    failures.append(f'INFO: {not_found_count} field(s) marked "not found in search data" -- verify directly before outreach')

if 'data_quality_flags' not in result:
    result['data_quality_flags'] = []
result['data_quality_flags'].extend(failures)

json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'QA complete. Issues addressed: {len(failures)}')
for f in failures:
    print(f'  - {f}')
if not failures:
    print('All QA checks passed.')
PYEOF
```

---

## Step 11: Save and Present Output

```bash
DATE=$(date +%Y-%m-%d)
OUTPUT_FILE="docs/vc-intel/${PRODUCT_SLUG}-${DATE}.md"
mkdir -p docs/vc-intel
```

Present the final output:

```
## VC Finder: [product_name]
Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geography: [geography_bias]

---

### Product Analysis

What it does: [one_line_description]
Industry: [l1] > [l2] > [l3]
Buyer: [buyer_persona] at [company_type], [company_size]
Comparable companies used: [comma-separated list, noting source_type for each]

---

### Curated Matches (Verified)

*Funds matched from a verified dataset of 25 VC funds sourced from fund websites. Zero hallucination -- details come directly from the dataset.*

| Fund | Confidence | Stage Focus | Check Size | Matched Tags |
|---|---|---|---|---|
[one row per curated VC, sorted by confidence then score]

---

### Track A: VCs Who Backed Similar Companies

*These investors have already written a check in this space. Evidence from live Tavily search.*

| Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track A VC, sorted by space_fit_score descending]

---

### Track B: Thesis-Led Investors

*These investors are actively publishing about this space.*

| Fund | Thesis Source | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track B VC, sorted by space_fit_score descending]

---

### Top 5 Deep Dives

#### [N]. [Fund Name] (Track [Curated/A/B])

Overview: [fund_overview -- from dataset or search data only]
Why it fits: [why_fit]
Portfolio in this space: [from dataset or search data, or "not found in search data"]
How to approach: [how_to_approach]
Outreach hook: "[outreach_hook]"

[repeat for all available deep dives]

---

### 3 Outreach Hooks for This Product Type

**1. [hook_type]**
[hook_text]
Best for: [best_for]

[repeat for all 3]

---

Data quality notes: [data_quality_flags, or "None"]
Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
```

Clean up temp files:

```bash
rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-product-analysis.json \
  /tmp/vc-product-context.json /tmp/vc-curated-matches.json /tmp/vc-comparable-search.json \
  /tmp/vc-tracka-results.json /tmp/vc-trackb-results.json /tmp/vc-final-list.json
```
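Step 11 gives the output template but no rendering code. As a rough sketch, the Track A table could be built from the `track_a_vcs` list in `/tmp/vc-final-list.json`; this would need to run before the cleanup command above removes that file. The function name `render_track_a_table` is hypothetical, and using `space_fit_score` for the single "Fit Score" column is an assumption based on the template's sort order.

```python
def render_track_a_table(track_a_vcs: list) -> str:
    """Build the Track A markdown table, sorted by space_fit_score descending."""
    header = "| Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |"
    divider = "|---|---|---|---|---|---|"
    rows = sorted(track_a_vcs, key=lambda v: v.get("space_fit_score", 0), reverse=True)
    lines = [header, divider]
    for v in rows:
        cells = [
            v.get("fund_name", ""),
            v.get("evidence_company", ""),
            v.get("stage_focus", "not in search data"),
            v.get("check_size", "not in search data"),
            v.get("space_fit_score", ""),
            v.get("approach_method", ""),
        ]
        # Escape pipes so a cell value cannot break the table layout
        lines.append("| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)


# Hypothetical entries shaped like track_a_vcs in the final JSON
vcs = [
    {"fund_name": "Fund A", "evidence_company": "ExampleCo", "stage_focus": "Seed",
     "check_size": "not in search data", "space_fit_score": 6, "approach_method": "cold email"},
    {"fund_name": "Fund B", "evidence_company": "SampleCo", "stage_focus": "Series A",
     "check_size": "not in search data", "space_fit_score": 9, "approach_method": "warm intro required"},
]

table = render_track_a_table(vcs)
print(table)
```

The Track B table differs only in its second column (`thesis_source_title` in place of `evidence_company`), so the same approach would apply there.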