---
name: matching-engine
description: Core matching algorithm using pgvector semantic similarity. Finds "I have, they need" and "I need, they have" connections between users.
---

## Matching goal

Connect users in two independent directions:

1. **"I have, they need"** (forward): My HAVE resource is semantically similar to someone's WANT
2. **"I need, they have"** (reverse): Someone's HAVE resource is semantically similar to my WANT

Each direction is scored and displayed independently. There is no combined "bidirectional" score; the two directions are separate sections on the matches page.

## Threshold configuration

All scoring thresholds live in **`src/lib/constants.ts`** under `MATCH_THRESHOLDS`. Never hardcode threshold values elsewhere; always import from constants.

```typescript
import { MATCH_THRESHOLDS } from '@/lib/constants';

// MATCH_THRESHOLDS.MIN_SCORE: minimum to store a match
// MATCH_THRESHOLDS.STRONG: "strong match" badge
// MATCH_THRESHOLDS.GOOD: "good match" badge
// MATCH_THRESHOLDS.MAX_PER_USER: max matches stored per user
// MATCH_THRESHOLDS.CANDIDATE_POOL: nearest-neighbor candidates per query
```

These values depend on the embedding model's score distribution. When switching embedding models, recalibrate by:

1. Running `scripts/recalculate-all-matches.ts`
2. Checking the actual score distribution with SQL
3. Adjusting thresholds in `constants.ts` so badge tiers produce meaningful separation

## How matching works

### Step 1: Generate embeddings on resource creation

See `src/server/services/embedding.ts`. The embedding provider is configurable (Gemini, OpenAI, mock).

### Step 2: Find matches with pgvector

`src/server/services/matching.ts` uses cosine similarity (`1 - (a <=> b)`) to find nearest-neighbor resources across users.

### Step 3: Per-direction scoring

For each candidate user, we track the best forward score and best reverse score independently. A match is stored if either direction exceeds `MATCH_THRESHOLDS.MIN_SCORE`.

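
The per-direction bookkeeping in Step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `src/server/services/matching.ts`: the `CandidateHit` and `DirectionalBest` types and the `bestScoresPerUser` helper are hypothetical names, the threshold is passed in as a parameter rather than imported, and the `>=` comparison is an assumption chosen to mirror the `myMatches` filter.

```typescript
// Hypothetical shapes for illustration only; the real types live in the codebase.
type Direction = 'forward' | 'reverse';

interface CandidateHit {
  userId: string;       // owner of the candidate resource
  direction: Direction; // which nearest-neighbor query produced this hit
  score: number;        // cosine similarity, i.e. 1 - (a <=> b)
}

interface DirectionalBest {
  forwardScore: number | null; // best "I have, they need" score, if any
  reverseScore: number | null; // best "I need, they have" score, if any
}

// Track the best score per user and per direction, then keep only users
// where at least one direction reaches the minimum score.
function bestScoresPerUser(
  hits: CandidateHit[],
  minScore: number, // in the real code this would be MATCH_THRESHOLDS.MIN_SCORE
): Record<string, DirectionalBest> {
  const best: Record<string, DirectionalBest> = {};

  for (const hit of hits) {
    const entry = best[hit.userId] ?? { forwardScore: null, reverseScore: null };
    const key = hit.direction === 'forward' ? 'forwardScore' : 'reverseScore';
    const current = entry[key];
    if (current === null || hit.score > current) {
      entry[key] = hit.score;
    }
    best[hit.userId] = entry;
  }

  // The two directions are never combined: each is checked on its own.
  for (const userId of Object.keys(best)) {
    const { forwardScore, reverseScore } = best[userId];
    const forwardOk = forwardScore !== null && forwardScore >= minScore;
    const reverseOk = reverseScore !== null && reverseScore >= minScore;
    if (!forwardOk && !reverseOk) {
      delete best[userId];
    }
  }

  return best;
}
```

A user with only a forward hit keeps `reverseScore: null`, which corresponds to a row where only the forward side is populated.
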
### Step 4: Dual-row insert

Each match inserts TWO rows in a single transaction:

- **Primary row** (A → B): A's perspective
- **Mirror row** (B → A): Scores and resource IDs swapped, so B immediately sees the match

Mirror rows use `ON CONFLICT DO UPDATE` to handle the case where B already has a row for that pair.

## Match table: four resource references

The Prisma schema stores both directions per row:

- `forwardHave` / `forwardWant`: my HAVE matched their WANT
- `reverseHave` / `reverseWant`: their HAVE matched my WANT

A row may have only forward, only reverse, or both populated.

## API: querying matches by direction

`src/server/routers/match.ts`: the `myMatches` endpoint accepts a `direction` param:

- `direction: 'forward'` → filter by `forwardScore >= minScore`
- `direction: 'reverse'` → filter by `reverseScore >= minScore`
- `direction: undefined` → filter by `score >= minScore`

## UI: two-section matches page

`/matches` displays two stacked sections (not tabs):

- **"I have, they need"**: forward matches. Card shows: who needs + resource title
- **"I need, they have"**: reverse matches. Card shows: who has + resource title

Score badge is displayed on its own row below the resource title.

## When matching runs

1. **On resource create/update**: recalculate for the triggering user
2. **On resource close/pause**: recalculate (closed resources excluded)
3. **Vercel Cron (every 4 hours)**: reconcile stale users + clean up matches referencing non-ACTIVE resources
4. **Never on page load**: always serve from cached Match table

## Performance notes

- pgvector with IVFFlat index: good enough for 100k resources
- Create index: `CREATE INDEX ON resources USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);`
- Cron processes at most 20 users per invocation (60s timeout)

## What NOT to do

- Don't hardcode threshold values; always use `MATCH_THRESHOLDS` from constants
- Don't treat forward and reverse as a single combined score
- Don't require both directions for a match to be valid
- Don't run matching on every page load; serve from cached Match table
- Don't try chain matching yet; that's v2
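
To make the "keep directions separate" rules concrete, here is a rough sketch of how the mirror row from Step 4 could be derived from the primary row. It is an illustration under assumptions, not the actual Prisma model: the `MatchRow` interface, the `userId` / `matchedUserId` field names, and the `toMirrorRow` helper are hypothetical, and the four resource fields are assumed to hold resource IDs.

```typescript
// Hypothetical row shape. Only the forward*/reverse* names come from this
// document; userId and matchedUserId are assumed names for the two users.
interface MatchRow {
  userId: string;              // whose matches page the row appears on (assumed)
  matchedUserId: string;       // the other user in the pair (assumed)
  forwardScore: number | null; // my HAVE vs. their WANT
  forwardHave: string | null;  // resource ID (assumed to be an ID)
  forwardWant: string | null;
  reverseScore: number | null; // their HAVE vs. my WANT
  reverseHave: string | null;
  reverseWant: string | null;
}

// Build B's view of a match from A's view: users swap places, and what was
// "forward" for A becomes "reverse" for B (and vice versa). Scores are moved,
// never merged into one combined value.
function toMirrorRow(primary: MatchRow): MatchRow {
  return {
    userId: primary.matchedUserId,
    matchedUserId: primary.userId,
    forwardScore: primary.reverseScore,
    forwardHave: primary.reverseHave,
    forwardWant: primary.reverseWant,
    reverseScore: primary.forwardScore,
    reverseHave: primary.forwardHave,
    reverseWant: primary.forwardWant,
  };
}
```

Applying `toMirrorRow` twice returns the original row, which is a cheap sanity check that the swap is symmetric. A primary row with only the forward side populated yields a mirror row with only the reverse side populated, so a one-directional match remains valid for both users.
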