---
name: django-recommender-search-backend-patterns
description: Django backend patterns for recommendation services (AWS Personalize, Databricks Model Serving, internal microservices) and OpenSearch-backed search/feed endpoints. Covers fan-out orchestration (asyncio.gather, deadline propagation, partial results, async client reuse), external service protection (timeouts, circuit breakers, jittered retry, bulkheads, rate limits), OpenSearch query patterns (search_after, _source filtering, function_score, aliases, routing, bool.filter), result blending (score normalization, MMR, dedup, cold-start), Redis caching (stampede protection, model-versioned keys, two-tier, negative), resilience (partial-response envelope, stale-on-error, graceful degradation), async (sync_to_async, async ORM, uvicorn, contextvars, disconnect cancellation), and DRF response shape (cursor pagination, ETag, throttling). Use when building, reviewing, or refactoring such a Django backend. Triggers even without explicit "scale" cues. Includes 5 scaffolding templates.
---
# Experimental Django Recommender + Search Backend Best Practices

Implementation patterns for a Django backend serving mixed-results recommendations (Personalize / Databricks / microservice fan-out) and OpenSearch-backed search/feeds. **48 rules across 8 categories**, ordered by execution lifecycle impact — earlier categories cascade through everything downstream.

This is the *backend* peer of the `react-fetch-cache-patterns` skill. React handles client-side waterfalls and caching; this skill handles server-side fan-out, downstream protection, OpenSearch query design, and ML-blend orchestration.

## When to Apply

- Building or reviewing Django views that fan out to AWS Personalize, Databricks Model Serving, internal microservices, or any ML inference downstream
- Designing OpenSearch query endpoints (search results, infinite feeds, faceted search)
- Implementing a recommendations endpoint that blends multiple ranker outputs
- Investigating "Django backend slow when downstream is degraded" or "Personalize quota exhausted"
- Adding caching, retry, circuit breakers, or rate limiting to outbound calls
- Choosing between sync and async Django views, configuring uvicorn vs gunicorn
- Designing DRF response shapes for paginated feeds, partial results, or degraded paths

## Rule Categories by Priority

| # | Category | Impact | Prefix | Rules |
|---|----------|--------|--------|-------|
| 1 | Fan-out Orchestration | CRITICAL | `orch-` | 8 |
| 2 | External Service Protection | CRITICAL | `protect-` | 7 |
| 3 | OpenSearch Query Patterns | CRITICAL | `search-` | 8 |
| 4 | Result Blending & Personalization | HIGH | `blend-` | 5 |
| 5 | Caching Strategy | HIGH | `cache-` | 5 |
| 6 | Resilience & Partial Results | HIGH | `resilience-` | 5 |
| 7 | Async & Concurrency | MEDIUM-HIGH | `async-` | 5 |
| 8 | API Response Design | MEDIUM | `api-` | 5 |

## Quick Reference

### 1. Fan-out Orchestration (CRITICAL)

- [`orch-parallel-fanout-asyncio-gather`](references/orch-parallel-fanout-asyncio-gather.md) — Use `asyncio.gather` for independent downstream calls; never await sequentially
- [`orch-return-exceptions-on-fanout`](references/orch-return-exceptions-on-fanout.md) — `return_exceptions=True` so one failure doesn't poison the whole gather
- [`orch-propagate-request-deadline`](references/orch-propagate-request-deadline.md) — Pass a deadline through every downstream call to bound whole-request latency
- [`orch-reuse-async-clients`](references/orch-reuse-async-clients.md) — One `httpx.AsyncClient` per downstream at module scope; never per-request
- [`orch-bounded-fanout-concurrency`](references/orch-bounded-fanout-concurrency.md) — Cap per-request fan-out with `asyncio.Semaphore` to protect the pool
- [`orch-no-blocking-in-async-view`](references/orch-no-blocking-in-async-view.md) — Never block the event loop with sync ORM/IO in async views
- [`orch-avoid-await-in-loop`](references/orch-avoid-await-in-loop.md) — `for item in items: await ...` is serial; use `asyncio.gather` with comprehension
- [`orch-batch-with-bulk-endpoint`](references/orch-batch-with-bulk-endpoint.md) — Bulk endpoint over N parallel calls; DataLoader pattern for batchers

### 2. External Service Protection (CRITICAL)

- [`protect-per-downstream-timeout-budget`](references/protect-per-downstream-timeout-budget.md) — Different timeouts per service matched to each downstream's p99
- [`protect-circuit-breaker-per-downstream`](references/protect-circuit-breaker-per-downstream.md) — One breaker per downstream so failures stay isolated
- [`protect-jittered-retry-backoff`](references/protect-jittered-retry-backoff.md) — Full-jitter exponential backoff to prevent thundering-herd recovery
- [`protect-no-retry-on-4xx`](references/protect-no-retry-on-4xx.md) — Skip retry on 4xx and non-idempotent failures; distinguish connect vs read errors
- [`protect-bulkhead-connection-pool`](references/protect-bulkhead-connection-pool.md) — One connection pool per downstream so one slow service doesn't starve others
- [`protect-client-side-rate-limit`](references/protect-client-side-rate-limit.md) — Token bucket toward each downstream to stay under their quota
- [`protect-honor-retry-after-header`](references/protect-honor-retry-after-header.md) — Parse `Retry-After` (seconds or HTTP-date) on 429/503

### 3. OpenSearch Query Patterns (CRITICAL)

- [`search-use-search-after-not-from`](references/search-use-search-after-not-from.md) — `search_after` cursor instead of `from/size` for any paginated endpoint
- [`search-filter-source-fields`](references/search-filter-source-fields.md) — Restrict `_source` to fields you render; use `docvalue_fields` for sortable
- [`search-bool-filter-vs-must`](references/search-bool-filter-vs-must.md) — Non-scoring clauses in `filter` (cacheable), scoring clauses in `must`
- [`search-function-score-for-blending`](references/search-function-score-for-blending.md) — Use `function_score` to blend personalization signals in-engine
- [`search-stable-tiebreaker-sort`](references/search-stable-tiebreaker-sort.md) — Always end sort with `_id` (or unique numeric field) for stable cursors
- [`search-alias-for-blue-green-reindex`](references/search-alias-for-blue-green-reindex.md) — Query through aliases; never direct index names
- [`search-enable-request-cache`](references/search-enable-request-cache.md) — `request_cache=true` for hit-returning queries; canonicalize request body
- [`search-shard-aware-routing`](references/search-shard-aware-routing.md) — Use routing keys to limit per-query shard fan-out

### 4. Result Blending & Personalization (HIGH)

- [`blend-normalize-scores-across-sources`](references/blend-normalize-scores-across-sources.md) — Min-max or RRF normalize before blending Personalize/Databricks/OpenSearch
- [`blend-mmr-for-diversity`](references/blend-mmr-for-diversity.md) — Maximal Marginal Relevance to avoid monocultures in top-K
- [`blend-dedup-across-sources`](references/blend-dedup-across-sources.md) — Canonical item ID dedup; bonus for cross-source corroboration
- [`blend-cold-start-fallback`](references/blend-cold-start-fallback.md) — Popular/editorial fallback for new users; tiered personalization
- [`blend-anonymous-vs-personalized-paths`](references/blend-anonymous-vs-personalized-paths.md) — Cheap segment-keyed cache for anon traffic; ML only for logged-in

### 5. Caching Strategy (HIGH)

- [`cache-redis-with-stampede-protection`](references/cache-redis-with-stampede-protection.md) — `SETNX` lock + jittered TTL + probabilistic early refresh
- [`cache-version-on-model-deploy`](references/cache-version-on-model-deploy.md) — Bake model version into cache keys; no flush needed on retrain
- [`cache-segment-keyed-isolation`](references/cache-segment-keyed-isolation.md) — Include auth/role/locale/segment in keys to prevent cross-context leakage
- [`cache-two-tier-process-and-redis`](references/cache-two-tier-process-and-redis.md) — Process LRU in front of Redis for the hottest keys
- [`cache-negative-results`](references/cache-negative-results.md) — Cache absences and empty results with shorter TTL

### 6. Resilience & Partial Results (HIGH)

- [`resilience-partial-response-envelope`](references/resilience-partial-response-envelope.md) — `partial: true` + `sources_used` + `failed_sources` in response
- [`resilience-serve-stale-from-redis`](references/resilience-serve-stale-from-redis.md) — Two TTLs (fresh + stale); serve stale on origin failure
- [`resilience-default-ranking-fallback`](references/resilience-default-ranking-fallback.md) — Precomputed default ranking when all ML sources are down
- [`resilience-per-source-observability`](references/resilience-per-source-observability.md) — Tag every downstream call with structured source/outcome metadata
- [`resilience-degrade-search-gracefully`](references/resilience-degrade-search-gracefully.md) — Tier 1 → tier 2 → tier 3 fallback for OpenSearch outages

### 7. Async & Concurrency (MEDIUM-HIGH)

- [`async-sync-to-async-orm`](references/async-sync-to-async-orm.md) — Use Django 4.1+ async ORM (`aget`, `afilter`) or `sync_to_async` with `thread_sensitive=True`
- [`async-worker-model-uvicorn-vs-gunicorn`](references/async-worker-model-uvicorn-vs-gunicorn.md) — Run ASGI (uvicorn or gunicorn+UvicornWorker) for true async concurrency
- [`async-fire-and-forget-with-create-task`](references/async-fire-and-forget-with-create-task.md) — `create_task` for analytics/audit; add error handler; hold task references
- [`async-context-vars-for-request-scope`](references/async-context-vars-for-request-scope.md) — `contextvars.ContextVar` for per-request state; not `threading.local`
- [`async-cancel-on-client-disconnect`](references/async-cancel-on-client-disconnect.md) — Check `await request.is_disconnected()`; propagate cancellation

### 8. API Response Design (MEDIUM)

- [`api-cursor-pagination-in-drf`](references/api-cursor-pagination-in-drf.md) — Cursor pagination over page-number; opaque base64 cursors
- [`api-serializer-perf-select-related`](references/api-serializer-perf-select-related.md) — `select_related`/`prefetch_related`/`only` to eliminate N+1
- [`api-etag-and-cache-control-headers`](references/api-etag-and-cache-control-headers.md) — `ETag` + `Cache-Control` + `Vary` for CDN/client reuse
- [`api-compression-and-payload-shaping`](references/api-compression-and-payload-shaping.md) — gzip/brotli; sparse fieldsets; msgpack for internal APIs
- [`api-throttle-per-user-and-endpoint`](references/api-throttle-per-user-and-endpoint.md) — DRF throttle classes per user/anon and per expensive endpoint

## How to Use

1. Open [references/_sections.md](references/_sections.md) for category definitions and impact rationale
2. Read individual rule files for incorrect-vs-correct code examples (each ~150-300 lines with Python code)
3. For ready-to-use scaffolds, see [scaffolding templates](assets/templates/)
4. The [AGENTS.md](AGENTS.md) navigation document (auto-generated) provides a TOC for browsing

## Scaffolding Templates

Five ready-to-adapt Python templates under `assets/templates/`:

| Template | Purpose |
|----------|---------|
| `fanout_recommender_service.py.template` | Async fan-out client to Personalize/Databricks/microservice with per-downstream circuit breaker, bounded timeout, partial-result return |
| `opensearch_search_view.py.template` | DRF view + OpenSearch `search_after` cursor + `function_score` blending + `_source` filtering |
| `result_blender.py.template` | Score normalization + MMR diversity + canonical-ID dedup + cold-start fallback |
| `redis_cache_with_stampede.py.template` | Stampede-safe cached function decorator with SETNX lock and jittered TTL |
| `degraded_response.py.template` | Partial-results envelope with per-source status flags + tier-based fallback |

## Reference Files

| File | Description |
|------|-------------|
| [references/_sections.md](references/_sections.md) | Category definitions, ordering, impact rationale, tier definitions |
| [assets/templates/_template.md](assets/templates/_template.md) | Template for authoring new rules |
| [metadata.json](metadata.json) | Version, references, abstract |

## Related Skills

- `react-fetch-cache-patterns` — Client-side peer covering React data fetching/caching (Suspense, query libraries, prefetch)
- `io-bound-data-processing` — Python async patterns for batch and pipeline workloads
- `inngest-nextjs-patterns` — Workflow patterns for server-side step functions