---
name: marketplace-search-recsys-planning
description: Search and recommendation system planning for a two-sided trust marketplace built on OpenSearch — user-intent framing, product-surface architecture, index design, query understanding, retrieval strategy, ranking, search-plus-recs blending, measurement, and a dashboard-and-alerting layer for ongoing decision making. Triggers on tasks involving marketplace search, homefeeds, ranking, relevance tuning, OpenSearch query DSL, analyzers, synonyms, golden sets, NDCG, A/B testing, or diagnosing an existing retrieval system. Use this skill BEFORE marketplace-personalisation when planning new work; hand off when the diagnosed bottleneck is personalisation-specific.
---
# Marketplace Engineering Two-Sided Search and Recsys Planning Best Practices

Comprehensive planning, design and diagnostic guide for search and recommendation systems
in two-sided trust marketplaces. Covers OpenSearch index, query and ranking patterns, the
methodology for planning retrieval work, the handoff points to recommendation-specific
tooling, and the instrumentation and dashboard layer that turns measurement into ongoing
decision making. Contains 57 rules across 10 categories ordered by cascade impact, plus
two playbooks (plan a new system from scratch, diagnose an existing one) and explicit
living-artefact conventions (decisions log, golden set, gotchas).

## When to Apply

Reference this skill when:

- Planning a new marketplace retrieval project from scratch
- Reviewing an existing retrieval system that feels stale, unfair, or unpersonalised
- Designing the OpenSearch index mapping, analyzers, or query DSL
- Choosing retrieval primitives per product surface (search, recs, hybrid, curated)
- Deciding which search quality metrics to track and dashboard
- Running the weekly search-quality review ritual
- Diagnosing a silent regression in ranking, coverage, or zero-result rate
- Deciding when a retrieval problem is actually a personalisation problem

This skill is the **precursor** to `marketplace-personalisation`. Start here for
planning and search work; hand off to the personalisation skill when the diagnosed
bottleneck is impression tracking, feedback-loop bias, or AWS Personalize-specific
design.

## Living Context

This skill treats the system as evolving. Three living artefacts carry context across
sessions, releases, and team changes — read them before making suggestions, update them
after every shipped change:

- **`gotchas.md`** (in this skill folder) — append-only diagnostic lessons. Every gotcha
  has a date and a short description of what surprised the team and how it was resolved.
- **Decisions log** (maintained in the product repo, typically `decisions/*.md`) —
  every ranking change, schema tweak, and synonym edit recorded with its hypothesis,
  offline and online evidence, ship criterion, outcome, and rollback path. See rule
  [`plan-maintain-a-decisions-log`](references/plan-maintain-a-decisions-log.md).
- **Golden query set** (frozen per eval cycle, committed to the product repo) — the
  reference set of queries against which every ranking change is offline-evaluated
  before an online test. See rule
  [`plan-version-the-golden-set`](references/plan-version-the-golden-set.md).

## Rule Categories

Categories are ordered by cascade impact on the retrieval lifecycle: intent
misunderstanding poisons architecture; wrong architecture poisons index; wrong index
poisons retrieval forever until a reindex; every downstream layer inherits the upstream
error.

| # | Category | Prefix | Impact |
|---|----------|--------|--------|
| 1 | Problem Framing and User Intent | `intent-` | CRITICAL |
| 2 | Surface Taxonomy and Architecture | `arch-` | CRITICAL |
| 3 | Index Design and Mapping | `index-` | HIGH |
| 4 | Planning and Improvement Methodology | `plan-` | HIGH |
| 5 | Query Understanding | `query-` | MEDIUM-HIGH |
| 6 | Retrieval Strategy | `retrieve-` | MEDIUM-HIGH |
| 7 | Relevance and Ranking | `rank-` | MEDIUM-HIGH |
| 8 | Search and Recommender Blending | `blend-` | MEDIUM |
| 9 | Measurement and Experimentation | `measure-` | MEDIUM |
| 10 | Instrumentation, Dashboards and Decision Triggers | `monitor-` | MEDIUM |

## Quick Reference

### 1. Problem Framing and User Intent (CRITICAL)

- [`intent-map-queries-to-intent-classes`](references/intent-map-queries-to-intent-classes.md) — classify before retrieving
- [`intent-separate-known-item-from-discovery`](references/intent-separate-known-item-from-discovery.md) — different failure modes, different strategies
- [`intent-audit-live-query-logs-first`](references/intent-audit-live-query-logs-first.md) — design from real data, not imagined data
- [`intent-distinguish-transactional-from-exploratory`](references/intent-distinguish-transactional-from-exploratory.md) — precision vs diversity
- [`intent-reject-one-search-for-everything`](references/intent-reject-one-search-for-everything.md) — per-surface query shapes
- [`intent-treat-no-search-as-first-class-choice`](references/intent-treat-no-search-as-first-class-choice.md) — curated is a legitimate answer

### 2. Surface Taxonomy and Architecture (CRITICAL)

- [`arch-map-surface-to-retrieval-primitive`](references/arch-map-surface-to-retrieval-primitive.md) — a single-source-of-truth routing table
- [`arch-split-candidate-generation-from-ranking`](references/arch-split-candidate-generation-from-ranking.md) — two-stage pipelines
- [`arch-design-zero-result-fallback`](references/arch-design-zero-result-fallback.md) — declare fallback owner per surface
- [`arch-design-for-cold-start-from-day-one`](references/arch-design-for-cold-start-from-day-one.md) — cold start is permanent, not bootstrap
- [`arch-avoid-mono-stack-retrieval`](references/arch-avoid-mono-stack-retrieval.md) — diversify primary dependencies
- [`arch-route-surfaces-deliberately`](references/arch-route-surfaces-deliberately.md) — every routing decision recorded

### 3. Index Design and Mapping (HIGH)

- [`index-design-mappings-conservatively`](references/index-design-mappings-conservatively.md) — reindex is expensive
- [`index-use-keyword-and-text-as-multi-fields`](references/index-use-keyword-and-text-as-multi-fields.md) — full-text plus exact match
- [`index-match-index-and-query-time-analyzers`](references/index-match-index-and-query-time-analyzers.md) — tokens must agree
- [`index-use-language-analyzers-for-language-fields`](references/index-use-language-analyzers-for-language-fields.md) — language-aware stemming
- [`index-separate-searchable-from-display-fields`](references/index-separate-searchable-from-display-fields.md) — index only what you search
- [`index-use-index-templates-for-consistency`](references/index-use-index-templates-for-consistency.md) — prevent mapping drift
- [`index-stream-listing-updates-via-cdc`](references/index-stream-listing-updates-via-cdc.md) — freshness in seconds, not hours

### 4. Planning and Improvement Methodology (HIGH)

- [`plan-audit-before-you-build`](references/plan-audit-before-you-build.md) — instrumentation gate on kick-off
- [`plan-build-golden-query-set-first`](references/plan-build-golden-query-set-first.md) — the first artefact, not the last
- [`plan-find-bottleneck-before-optimising`](references/plan-find-bottleneck-before-optimising.md) — theory of constraints
- [`plan-maintain-a-decisions-log`](references/plan-maintain-a-decisions-log.md) — living context across team changes
- [`plan-version-the-golden-set`](references/plan-version-the-golden-set.md) — frozen per eval cycle
- [`plan-handoff-to-personalisation-skill`](references/plan-handoff-to-personalisation-skill.md) — recognise the boundary

### 5. Query Understanding (MEDIUM-HIGH)

- [`query-normalise-before-anything-else`](references/query-normalise-before-anything-else.md) — canonical string in
- [`query-use-language-analyzers-for-stemming`](references/query-use-language-analyzers-for-stemming.md) — double-digit recall wins
- [`query-curate-synonyms-by-domain`](references/query-curate-synonyms-by-domain.md) — domain vocabulary not thesaurus
- [`query-use-fuzzy-matching-for-typos`](references/query-use-fuzzy-matching-for-typos.md) — 10-15% of queries have typos
- [`query-classify-before-routing`](references/query-classify-before-routing.md) — single-pass classifier
- [`query-build-autocomplete-on-separate-index`](references/query-build-autocomplete-on-separate-index.md) — latency isolation

### 6. Retrieval Strategy (MEDIUM-HIGH)

- [`retrieve-use-filter-clauses-for-exact-matches`](references/retrieve-use-filter-clauses-for-exact-matches.md) — filter cache wins
- [`retrieve-use-bool-structure-deliberately`](references/retrieve-use-bool-structure-deliberately.md) — must vs should vs filter
- [`retrieve-run-expensive-signals-in-rescore`](references/retrieve-run-expensive-signals-in-rescore.md) — rescore window limits cost
- [`retrieve-combine-bm25-and-knn-via-hybrid-search`](references/retrieve-combine-bm25-and-knn-via-hybrid-search.md) — lexical plus semantic
- [`retrieve-paginate-with-search-after`](references/retrieve-paginate-with-search-after.md) — constant-cost deep pagination
- [`retrieve-choose-embedding-model-deliberately`](references/retrieve-choose-embedding-model-deliberately.md) — re-embedding is expensive

### 7. Relevance and Ranking (MEDIUM-HIGH)

- [`rank-tune-bm25-parameters-last`](references/rank-tune-bm25-parameters-last.md) — upstream levers first
- [`rank-use-function-score-for-business-signals`](references/rank-use-function-score-for-business-signals.md) — explicit named functions
- [`rank-deploy-ltr-only-after-golden-set-exists`](references/rank-deploy-ltr-only-after-golden-set-exists.md) — supervised learning needs labels
- [`rank-apply-diversity-at-rank-time`](references/rank-apply-diversity-at-rank-time.md) — after scoring, not before
- [`rank-normalise-scores-across-retrieval-primitives`](references/rank-normalise-scores-across-retrieval-primitives.md) — comparable scales

### 8. Search and Recommender Blending (MEDIUM)

- [`blend-use-search-alone-for-specific-intent`](references/blend-use-search-alone-for-specific-intent.md) — precision queries
- [`blend-combine-search-and-personalisation-scores`](references/blend-combine-search-and-personalisation-scores.md) — normalised weighted sum
- [`blend-keep-hybrid-blending-explainable`](references/blend-keep-hybrid-blending-explainable.md) — traceable results
- [`blend-never-return-zero-results`](references/blend-never-return-zero-results.md) — guaranteed cascade to non-empty

### 9. Measurement and Experimentation (MEDIUM)

- [`measure-define-session-success-per-surface`](references/measure-define-session-success-per-surface.md) — one definition per surface
- [`measure-track-ndcg-mrr-zero-result-rate`](references/measure-track-ndcg-mrr-zero-result-rate.md) — three metrics for one picture
- [`measure-track-reformulation-rate-as-failure-signal`](references/measure-track-reformulation-rate-as-failure-signal.md) — cheapest failure metric
- [`measure-use-click-models-for-implicit-judgments`](references/measure-use-click-models-for-implicit-judgments.md) — scale beyond human judges
- [`measure-run-interleaving-as-cheap-ab-proxy`](references/measure-run-interleaving-as-cheap-ab-proxy.md) — 10x less sample needed

### 10. Instrumentation, Dashboards and Decision Triggers (MEDIUM)

- [`monitor-log-every-query-with-full-context`](references/monitor-log-every-query-with-full-context.md) — structured replayable events
- [`monitor-scrub-pii-from-query-logs`](references/monitor-scrub-pii-from-query-logs.md) — redact before warehouse ingestion
- [`monitor-build-search-health-dashboard`](references/monitor-build-search-health-dashboard.md) — threshold lines, colour bands
- [`monitor-alert-on-decision-triggers`](references/monitor-alert-on-decision-triggers.md) — quality metrics, not error rates
- [`monitor-track-ranking-stability-churn`](references/monitor-track-ranking-stability-churn.md) — RBO churn as leading indicator
- [`monitor-run-weekly-search-quality-review`](references/monitor-run-weekly-search-quality-review.md) — calendar-driven ritual

## Planning and Improving

Two playbooks compose the rules into end-to-end workflows:

- [`references/playbooks/planning.md`](references/playbooks/planning.md) — Plan a new marketplace retrieval system from scratch. Nine-step workflow from intent audit through the first A/B-tested online lift, with explicit exit criteria per step.
- [`references/playbooks/improving.md`](references/playbooks/improving.md) — Diagnose and improve an existing retrieval system. Decision tree that walks through telemetry, index freshness, coverage, baseline gap, cold start, segment regressions, and algorithm iteration in that order, with hand-off points to `marketplace-personalisation` when the bottleneck is personalisation-specific.

Read the playbooks first when the task is "design a new search and recommender project"
or "this retrieval system needs to get better". Read individual rules when a specific
question arises during implementation or review.

## How to Use

- Read [`references/_sections.md`](references/_sections.md) for category structure and cascade rationale.
- Read [`gotchas.md`](gotchas.md) for diagnostic lessons accumulated from prior incidents.
- Read [`references/playbooks/planning.md`](references/playbooks/planning.md) to plan a new system.
- Read [`references/playbooks/improving.md`](references/playbooks/improving.md) to diagnose an existing one.
- Read individual rule files when a specific task matches the rule title.
- Use [`assets/templates/_template.md`](assets/templates/_template.md) to author new rules as the skill grows.

## Related Skills

- **`marketplace-personalisation`** — The companion skill covering AWS Personalize implementation, impression tracking, schema design, two-sided matching, feedback loops, and the personalisation-specific diagnostic playbook. Hand off to this skill when the diagnostic identifies a personalisation-specific bottleneck.

## Reference Files

| File | Description |
|------|-------------|
| [references/_sections.md](references/_sections.md) | Category definitions and impact ordering |
| [references/playbooks/planning.md](references/playbooks/planning.md) | Plan a new retrieval system |
| [references/playbooks/improving.md](references/playbooks/improving.md) | Diagnose an existing retrieval system |
| [gotchas.md](gotchas.md) | Accumulated diagnostic lessons (living) |
| [assets/templates/_template.md](assets/templates/_template.md) | Template for authoring new rules |
| [metadata.json](metadata.json) | Version, discipline, references |