# AGENTS.md — Magic Indexer (single-source onboarding)

> **If you are an AI assistant or a new contributor opening this
> repo for the first time, this file is your complete onboarding.
> Reading just this file should be enough to be useful. The other
> docs (`docs/RUNBOOK.md`, `docs/reviews/`, `SECURITY.md`,
> `README.md`) are deeper references; this file is the digest.**

If, after reading this file, you are unsure about anything that isn't covered here, that is a documentation bug — flag it to the operator and they will tighten this file rather than re-explain.

---

## What this is

**Magic Indexer** is an AT Protocol AppView server that ingests records from Jetstream + labels from ATProto labelers and exposes both via a dynamically-generated GraphQL API. It is the `hb-agent/magic-indexer` fork of the `hypercerts-org/hyperindex` project.

The compiled binary inside the container is named `hypergoat`, and the Go module path is `github.com/GainForest/hypergoat`. Both names are historical artefacts from when the project was originally called Hypergoat. Every command in this repo that mentions `hypergoat` or `./cmd/hypergoat` is referring to that binary path — not a different product. **Do not rename the module or the binary**; it would touch ~80 files for a brand-only change.

The product name in user-facing documentation, configuration, deployments, and conversation is **Magic Indexer**.
---

## Live deployment (the dev environment)

| Item | Value |
|--------------------------|------------------------------------------------------------------------|
| Public URL | `https://magic-indexer-dev.up.railway.app` |
| Public GraphQL | `https://magic-indexer-dev.up.railway.app/graphql` |
| GraphQL subscriptions | `wss://magic-indexer-dev.up.railway.app/graphql/ws` |
| Admin GraphQL | `https://magic-indexer-dev.up.railway.app/admin/graphql` |
| GraphiQL playground | `https://magic-indexer-dev.up.railway.app/graphiql` |
| GraphiQL admin | `https://magic-indexer-dev.up.railway.app/graphiql/admin` |
| Health | `https://magic-indexer-dev.up.railway.app/health` |
| Stats | `https://magic-indexer-dev.up.railway.app/stats` |
| Prometheus metrics | `https://magic-indexer-dev.up.railway.app/metrics` |
| Railway project ID | `7d6c4e52-de61-439f-96c0-3ded4114b9be` |
| Railway project name | `magic-index` |
| Railway environment | `dev` |
| Railway service | `magic-indexer` |
| Railway dashboard | `https://railway.com/project/7d6c4e52-de61-439f-96c0-3ded4114b9be` |
| GitHub repo | `https://github.com/hb-agent/magic-indexer` |
| Active branch | `per-labeler-definitions` |
| Backing database | Postgres 18, Railway-managed, in the same project |
| Admin UI | `https://magic-indexer-admin.vercel.app` (Next.js, confidential ATProto OAuth) |
| Currently ingesting from | Jetstream (24 lexicon-derived collections, including `app.certified.temp.graph.endorsement`) |

The 24 collections currently being ingested all start with one of three NSID prefixes: `org.hypercerts.*`, `app.certified.*`, `org.hyperboards.*`. The `app.certified.temp.graph.endorsement` lexicon supports the trusted-evaluator feed filter (see [`docs/architecture/0001-trusted-evaluator-feed-filter.md`](docs/architecture/0001-trusted-evaluator-feed-filter.md)).

Lexicons are uploaded via the admin API from the npm package `@hypercerts-org/lexicon` (see Operations below).
### Full-text search

All typed collection queries and the generic `records` query support a `search: String` parameter for full-text search across record content. The search uses Postgres `tsvector` with a GIN index for fast, stemmed queries.

**Searched fields** (weighted): title (A), shortDescription (B), description (C), workScope (D, string variant only).

**Behavior**: terms are space-separated and implicitly ANDed. English stemming is applied ("forest" matches "forests"). Special characters are stripped by `plainto_tsquery` — no injection risk. Max query length: 500 characters.

**Example**:

```graphql
{
  orgHypercertsClaimActivity(search: "forest conservation", first: 10, authors: ["did:plc:..."]) {
    edges { node { uri title shortDescription } }
    pageInfo { hasNextPage }
  }
}
```

**Combinable with**: `authors`, `labels`, `excludeLabels`, `labelerDids`.

### Record validation

Records are validated against their lexicon schemas at two points:

- **Ingestion time** (Jetstream + backfill): controlled by the `VALIDATION_MODE` env var (`disabled`/`warn`/`enforce`, default `disabled`). In `warn` mode, invalid records are logged but stored. In `enforce` mode, they are skipped.
- **Query time** (always on): `SanitizeRecord()` filters out records missing required fields, truncates over-long strings, and nulls invalid optional fields. This prevents NonNull propagation from killing entire query responses.

---

## Safety rules — read these first

These are non-negotiable. Apply before doing anything that touches state.

1. **Never commit secrets.** `SECRET_KEY_BASE`, `ADMIN_API_KEY`, the Railway API token, OAuth signing keys, and `.env` files stay out of git. The repo's `.gitignore` excludes `.env` and `.env.local`. `config.Validate()` at startup rejects the literal `development-secret-key-change-in-production-64chars` placeholder so a misconfigured deploy fails fast instead of booting with a public key.
2. **Never echo secrets in unredacted form** in chat output, log files, or anything that could be persisted. When the operator pastes a secret to you, store it in `/tmp/...` with `chmod 600` and reference the variable; don't reproduce the value.
3. **Confirm with the operator before destructive actions.** This includes (but isn't limited to): `railway down`, deleting Railway services, dropping the Postgres volume, force-pushing to `per-labeler-definitions`, deleting GitHub branches, mass issue closure, `git reset --hard`, `git rebase -i`, `gh repo delete`, dropping or truncating any database table, `railway redeploy` against a service whose latest commit you haven't built locally, rotating any secret without first confirming the operator has the new value, and any operation that affects the upstream `hypercerts-org/hyperindex` repo (this is a fork — your default scope is `hb-agent/magic-indexer`).
4. **Read-then-act, not act-then-explain.** When investigating a problem, read the relevant code with file:line evidence before proposing changes. Roughly half the "CRITICAL" findings in the 23 review rounds were false positives that disappeared after looking at the actual lines cited.
5. **Quality gates before commit.** No code change is "done" until all four pass:

   ```bash
   go build ./...
   go vet ./...
   go test -race ./...
   golangci-lint run ./...
   ```

   CI also runs Postgres tests via `TEST_DATABASE_URL` and a reproducible-build diff job. Both should stay green.
6. **Commit message convention.** Each commit ends with a `Co-Authored-By:` trailer naming the model/agent that wrote it. The repo's recent history follows this; match the style.
7. **`git push` is not optional.** A change you didn't push is work that doesn't survive the session. The "Landing the plane" checklist at the bottom of this file is mandatory.

---

## Browser automation in this dev container

This dev container has a working **agent-browser** install that controls a real headless Chromium.
Use it when you need to verify something behaves correctly *in a real browser* — not just in SSR HTML, not just in a curl probe. Examples: client-side hydration errors, CORS rejections, React error boundaries, post-hydration data fetches, or "does the user actually see X on the page".

The two pieces that had to come together for this to work:

- **`agent-browser` CLI** (npm package, native Rust): installed globally via `npm install -g agent-browser`. Version `0.25.3` or later.
- **Chromium binary**: this dev container is Linux ARM64. Chrome for Testing has no ARM64 builds, so we use the Chromium that ships with Playwright instead. Install with `npx --yes playwright@latest install chromium --with-deps`. Lands at `~/.cache/ms-playwright/chromium-1217/chrome-linux/chrome`.
- **Wrapper script** at `~/.local/bin/ab` that always passes `--executable-path` pointing at the Playwright Chromium so you never have to remember it.

**Use `ab` instead of `agent-browser` for everything in this repo.** If `ab --version` doesn't work in a fresh session, both the npm install and the Playwright Chromium download will need to be re-run, then drop the wrapper back in:

```bash
npm install -g agent-browser
npx --yes playwright@latest install chromium --with-deps
mkdir -p ~/.local/bin
cat > ~/.local/bin/ab <<'EOF'
#!/usr/bin/env bash
exec agent-browser --executable-path "$HOME/.cache/ms-playwright/chromium-1217/chrome-linux/chrome" "$@"
EOF
chmod +x ~/.local/bin/ab
```

The chromium directory name (`chromium-1217`) is the Playwright revision number and may differ on a fresh install. Update the wrapper if needed.
### Common usage

```bash
ab open https://magic-indexer-dev.up.railway.app/graphiql
ab snapshot                  # accessibility tree with refs (best for AI)
ab screenshot /tmp/page.png  # raster image
ab eval ''                   # run JS in the page context
ab click @e10                # click element by ref from snapshot
ab fill @e3 "search term"    # fill an input
ab close                     # close session
```

### What it caught last session

The integration test of `certs-social → magic-indexer` produced the right SSR HTML, the right Vercel build, the right TypeScript, and a passing `npm run build` — but the live page in a real browser showed `Something went wrong / Failed to fetch` because the magic-indexer CORS allowlist didn't include the Vercel preview URL. Caught only because `ab open` + `ab snapshot` exposed the post-hydration error state. None of the static checks would have found it.

### What it can't do

`ab` controls a headless browser. It can't:

- Watch you click around interactively (use your own browser).
- Step through React DevTools.
- Show you the same console output you'd see in Chrome DevTools in detail (use `ab eval` workarounds, or open the page in your own browser for interactive debugging).

For deep interactive debugging, your local browser is still the right tool. `ab` is for "verify the live deployment renders correctly without me having to open a browser tab."

## Self-test for a fresh session

After you've read this file, test your understanding by mentally answering these. If you can answer all of them without re-reading, you're oriented:

1. What is the project called, and what does it do in one sentence?
2. Where is it deployed, on what platform, with what backing store?
3. What's currently in the database (records, actors, lexicons, labelers)?
4. What's the active branch and the last commit on it?
5. Which two issues are deliberately deferred and why?
6. What two facts about lexicons do you have to remember when uploading them?
7. What is the *one* command an operator runs to deploy a code change?
8. What's the difference between `RAILWAY_TOKEN` and `RAILWAY_API_TOKEN`?
9. What's the rule about the `VOLUME` keyword in `Dockerfile`?
10. Why is `OptionalAuth` middleware permissive on bad bearer tokens, and why is that not a security hole?

Answers are scattered through the rest of this file. If any of the questions don't have a clear answer here, that's a documentation bug — say so.

---

## Build, test, lint (local development)

From a clean checkout:

```bash
git clone https://github.com/hb-agent/magic-indexer.git
cd magic-indexer
git checkout per-labeler-definitions
make setup              # generates .env with a fresh SECRET_KEY_BASE
go run ./cmd/hypergoat
```

Quality gates that must pass before any commit:

```bash
go build ./...           # also: make build
go vet ./...
go test ./...            # also: make test (adds -race)
go test -race ./...
golangci-lint run ./...  # also: make lint
```

Single test patterns:

```bash
go test -v -run TestParseLexicon ./internal/lexicon/...
go test -v ./internal/graphql/admin/...
```

Coverage report:

```bash
make test-coverage   # writes coverage.html
```

To run the integration test suite (build tag `integration`):

```bash
go test -tags=integration ./internal/integration/...
```

CI runs all of the above on every push to `main` and every PR targeting `main`, against both SQLite and Postgres (`TEST_DATABASE_URL`), plus a reproducible-build diff job. See `.github/workflows/ci.yml`.

---

## Code style

### Imports

Three groups, blank lines between them, in this order:

```go
import (
	"context" // 1. Standard library
	"fmt"

	"github.com/go-chi/chi/v5" // 2. External packages

	"github.com/GainForest/hypergoat/internal/database" // 3. Internal packages
)
```

### Package documentation

Every package has a doc comment on `package`:

```go
// Package config handles configuration loading from environment variables.
package config
```

### Naming

- Packages: lowercase, single word (`lexicon`, `oauth`, `backfill`).
- Files: lowercase with underscores (`did_resolver.go`).
- Types: PascalCase (`Executor`, `RecordFetcher`).
- Interfaces: noun or `-er` suffix (`Executor`, `Fetcher`).
- Acronyms: all caps (`URI`, `DID`, `HTTP`, `JSON`).

### Errors

Always wrap with context, prefer `%w`:

```go
if err != nil {
	return fmt.Errorf("failed to query records: %w", err)
}
```

For OAuth-style validation errors prefer package-level sentinel vars (see `internal/oauth/dpop.go` for the canonical pattern that came out of review Round 8).

### Context

Always pass `ctx` as the first parameter to any I/O method:

```go
func (r *RecordsRepository) GetByURI(ctx context.Context, uri string) (*Record, error)
```

### Repository pattern

Database access lives in `internal/database/repositories/`. Constructors take a `database.Executor`. SQL is built with `r.db.Placeholder(n)` for dialect-aware parameters. Every method takes `ctx`. Don't reach into the executor from outside the repositories layer.

### Logging

Use `log/slog` everywhere. Structured fields, never string interpolation:

```go
slog.Info("Starting backfill", "collections", collections, "count", len(repos))
slog.Warn("Failed to resolve DID", "did", did, "error", err)
slog.Error("Database connection failed", "error", err)
```

The mutation log line in the admin handler logs **variable keys, not values** — never reintroduce value logging without an audit; it's a log-injection vector that Round 3 caught and fixed.

### Testing

Table-driven tests, fresh setup per test, no shared state:

```go
func TestSomething(t *testing.T) {
	tests := []struct {
		name    string
		input   string
		wantErr bool
	}{
		{"happy path", "valid", false},
		{"bad input", "junk", true},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			_, err := DoThing(tc.input)
			if (err != nil) != tc.wantErr {
				t.Errorf("err = %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```

For DB tests, use `testutil.SetupTestDB(t)`, which honours `TEST_DATABASE_URL` and falls back to in-memory SQLite.
---

## Project structure

```
cmd/hypergoat/       # Main entry point (server init, routing, lifecycle)
internal/
  backfill/          # Historical record backfill from AT Protocol relays
  config/            # Configuration loading from environment + Validate()
  consumer/          # Shared reconnection backoff (RunWithReconnect) used by jetstream, labeler, tap
  cursor/            # Shared cursor persistence (atomic.Int64 Flusher) used by all consumers
  database/
    migrations/      # SQL migrations (auto-run on startup, transactional + "-- no-transaction" sentinel for CONCURRENTLY)
    repositories/    # Data access layer (records, labels, label_definitions, filter.go for field filters, etc.)
    sqlite/          # SQLite implementation (pure Go, modernc) — note: removed from production, Postgres-only
    postgres/        # PostgreSQL implementation (pgx)
  graphql/
    admin/           # Admin API (POST-only, bearer-or-OAuth gated) — createFieldIndex, dropFieldIndex, uploadLexicons, etc.
    schema/          # Public schema builder (dynamic from lexicons) + where.go for field filter extraction
    resolver/        # Public resolver wiring + repository context injection
    query/           # Connection types (Relay spec) + ClampPageSize + SortDirectionEnum
    depth/           # Pre-execution GraphQL query depth guard (max 15 / 20)
    subscription/    # WebSocket subscriptions (graphql-transport-ws)
    types/           # GraphQL type mapping from lexicon definitions + filters.go (per-type FilterInput types)
  ingestion/         # Shared RecordProcessor: ensure-actor → insert/delete → log-activity → publish-to-pubsub
  integration/       # Integration tests (build tag: integration)
  jetstream/         # Real-time AT Protocol event consumer (delegates to ingestion.RecordProcessor)
  labeler/           # ATProto labeler subscribeLabels + queryLabels client
  lexicon/           # Lexicon parsing, registry, NSID utilities
  metrics/           # Prometheus counters + /metrics HTTP handler
  notifications/     # Bluesky-pattern notifications: per-collection extractors, aggregation, seen watermark
  notifications/extractors/ # Per-collection notifier implementations (endorsement, activity-contributor)
  oauth/             # OAuth 2.0 + DPoP + PKCE + did:plc / did:web resolution
  server/            # HTTP handlers, security headers, CORS, GraphiQL UI
  tap/               # Tap sidecar consumer (crypto-verified events, ack-based delivery) — alternative to Jetstream via TAP_ENABLED
  workers/           # Background jobs (activity cleanup + orphan janitor, etc.)
docs/                # RUNBOOK + reviews + plans
scripts/             # Deployment helpers (setup-env.sh)
testdata/            # Test fixtures and sample lexicons
```

---

## Subsystem highlights (the things that bit us in review)

### Public GraphQL `authors` filter

Typed collection queries accept an `authors: [String!]` argument to filter by author DID. Cap: 500 DIDs per query. An empty list means "no filter" (returns all authors). Example: `orgHypercertsClaimActivity(first: 10, authors: ["did:plc:..."])`.

### Field filter system (`internal/database/repositories/filter.go` + `internal/graphql/schema/where.go`)

Typed collection queries also accept a `where` argument with per-field operators generated from the lexicon's scalar properties:

- Operators: `eq`, `neq`, `gt`, `lt`, `gte`, `lte`, `in`, `contains`, `startsWith`, `isNull`.
- `eq` uses `json @> $::jsonb` containment (hits the GIN `jsonb_path_ops` index). Other operators use `json->>'field'` extraction (seq scan unless an expression index exists — see admin mutations below).
- `neq` semantically means "not equal OR field absent" (includes NULLs).
- `contains` min 3 chars; `startsWith` min 1 char. Both escape `\`, `%`, `_` via `ESCAPE '\\'`.
- `in` uses `= ANY($::text[])` — single array param instead of expanded `IN (...)`.
- Nested paths via `__` separator (e.g., `metadata__source` → `json->'metadata'->>'source'`). Max 3 nesting levels. Auto-generating nested WhereInput fields from lexicons is deferred (issue #40); the SQL layer supports it.

Composition via `_and` / `_or` fields on WhereInput (recursive, self-referential via `graphql-go` `AddFieldConfig`):

- `FilterGroup` tree with `GroupAND`/`GroupOR` operators.
- `BuildFilterGroupClause` is the recursive SQL builder; proper parenthesization; global condition count capped at `MaxFilterConditions` (20) across the whole tree; max depth `MaxFilterDepth` (3).
- Field name validation (`[a-zA-Z_][a-zA-Z0-9_]*` per segment) runs before any string interpolation into SQL — this is defense-in-depth; names come from the lexicon registry, not user input.

### Sort-aware keyset pagination

`orderBy` (string, field name) and `orderDirection` (ASC/DESC, default DESC) arguments on typed collection queries. The repository layer now honors these: the `ORDER BY` clause uses `SortOption.BuildSortExpr()` and the keyset cursor comparison uses the sort expression (previously always `indexed_at`).

- Direct columns (`indexed_at`, `uri`, `did`, `collection`, `cid`, `rkey`) use the column name; anything else becomes `json->>'field'` (with nested path support via `__`).
- `NULLS LAST` in ORDER BY for both ASC and DESC.
- URI tiebreaker appended in the same direction.
- **Fast-path guard**: when no filters/labels/search apply, the function delegates to `GetByCollectionWithKeysetCursor`, which always sorts by `indexed_at DESC`. The `hasCustomSort` check (PR #50) prevents that path from silently ignoring a custom `orderBy` on an unfiltered query.
- Multi-column sort (orderBy as list) is deferred (issue #39).

### Cursor format (V2)

Cursors are base64-URL-encoded JSON arrays: `["sortField", "sortValue", "uri"]`. The decoder also accepts the legacy pipe-delimited format (`"timestamp|uri"`) for backward compatibility; legacy cursors only work when `orderBy` is `indexed_at` (default). Sort-field mismatch produces a clear error.

### Backward pagination

`last` + `before` arguments complement `first` + `after`. Mixed forward + backward is rejected. Implementation: flip the sort direction + cursor comparison, fetch `last+1`, reverse the slice in memory.
`hasPreviousPage` is true when we fetched more than `last`; `hasNextPage` for backward mode reflects whether items exist after the returned window.

### Admin expression index mutations

`createFieldIndex(collection, field)` and `dropFieldIndex(collection, field)` on the admin GraphQL API. Generates: `CREATE INDEX CONCURRENTLY ON record ((json->>'field')) WHERE collection = 'nsid'`. The partial index (filtered by collection) keeps size small. Runs outside a transaction via the migration runner's `-- no-transaction` sentinel convention. Use this to accelerate comparison/pattern filters that the GIN index can't serve.

### Shared consumer infrastructure (`internal/ingestion`, `internal/cursor`, `internal/consumer`)

Extracted from the original inline Jetstream consumer during the hyperindex port:

- `ingestion.RecordProcessor` — ensure-actor → insert/delete → log-activity → publish-to-pubsub. Used by both Jetstream and Tap consumers. Enforces an optional collection allowlist and rejects non-object JSON records.
- `cursor.Flusher` — `atomic.Int64` cursor value + ticker-based flush, skip-on-idle. Survives context cancellation via a bounded final flush.
- `consumer.RunWithReconnect` — exponential backoff (1 s → 2 min, reset after 30 s of stable connection).

### Tap consumer (`internal/tap/`)

Alternative to Jetstream when `TAP_ENABLED=true`. Consumes crypto-verified events from the Bluesky Tap sidecar with ack-based delivery and per-repo ordering. Synchronous dispatch (backpressure via the WebSocket itself is the correct signal for ack-based protocols). Panic-recovered, exponential retry (1 s / 2 s / 4 s) per event, then skip. `Connection` / `Dialer` interfaces abstract gorilla/websocket for testability. Trust boundary: Tap verifies MST inclusion proofs but not signing key vs DID document (#41, deferred).

### Notifications subsystem (`internal/notifications/`)

Bluesky-pattern notification system. Enabled via `NOTIFICATIONS_ENABLED=true`.
- **Data model**: `notification` (envelope, one row per displayed notification, optionally aggregated by `group_key`), `notification_participant` (one row per source record that contributed — unique on `(record_uri, recipient_did)` for idempotent replay and correct tombstone cascade), `actor_state` (per-user seen watermark, same as Bluesky).
- **Hook**: registered as a `RecordHook` on `RecordProcessor.RecordHooks`, policy `HookLogContinue`. A malformed record cannot stall firehose ingestion — hook errors are logged but don't abort the record insert. Panic-recovered per invocation. Runs on insert/update/delete.
- **Extractors** (`internal/notifications/extractors/`): one Go file per collection. Currently: `endorsement` (aggregates on subject URI) and `activity-contributor` (non-aggregating, fans out per contributor DID up to `MaxFanOutPerRecord=100`).
- **Idempotency**: the participant table's UNIQUE `(record_uri, recipient_did)` is the replay boundary. Re-processing the same record is a no-op.
- **Tombstone cascade**: record delete → `DeleteByRecordURI` removes participants, decrements the envelope count, deletes the envelope at count 0, recomputes `latest_*` from remaining participants when the removed participant was the latest.
- **Update path**: delete-then-re-extract, to handle activity contributor list changes correctly.
- **Defense-in-depth**: `isValidDID` syntactic validation, `clampSortAt` bounds timestamps to `[now-7d, now]`, `MaxReasonSubjectBytes` caps subject URIs, `MaxContributorsBeforeReject` short-circuits oversized records via a shallow JSON scan before full unmarshal.
- **GraphQL (admin endpoint)**: `notifications(did, reasons, first, after)`, `unreadNotificationCount(did)` (capped at 50+), `updateNotificationsSeen(did, seenAt)`. Fields are merged into the admin schema via `admin.WithExtraQueries` and `admin.WithExtraMutations` options — no cyclic import between packages.
- **Cursor V1** for notifications: base64-URL JSON `["v1:notif", sort_at_iso, id]`.
- **Trust boundary**: public `/graphql` is unauthenticated, so notifications live on the admin endpoint and accept `did` as an argument. The certs-social proxy is the trust boundary (resolves the session DID and forwards it). Public-endpoint migration is deferred until OAuth auth lands on `/graphql`.

### Labeler subsystem (`internal/labeler/`)

Mirrors `internal/jetstream/` but speaks the ATProto labeler protocol:

- `client.go` — websocket client for `com.atproto.label.subscribeLabels`. Uses `fxamacker/cbor/v2` for the two-CBOR-object frame format (`#labels`, `#info`, `#error`). `SetReadLimit` bounds frame size. Non-normal close codes are surfaced at Warn; empty-body `#labels` frames are dropped explicitly; `#info` decode failures are elevated to Warn so `OutdatedCursor` signals cannot be silently lost.
- `backfill.go` — one-time `com.atproto.label.queryLabels` paginated backfill via `hashicorp/go-retryablehttp`.
- `consumer.go` — lifecycle: load cursor → backfill if needed → connect → stream labels → flush cursor on a ticker. Exponential backoff on reconnect. Panic-recovered at the goroutine boundary in `cmd/hypergoat/main.go` so one labeler cannot take down the process. Logs cursor gaps at Warn.

Label definitions are auto-upserted via `INSERT ... ON CONFLICT DO NOTHING` keyed on the composite `(src, val)` PK from migration 009 — concurrent labelers cannot race a new `(src, val)` pair.

### Jetstream consumer (`internal/jetstream/`)

- `client.go` — websocket client for the Jetstream firehose. `SetReadLimit(8 MiB)` bounds per-frame memory.
- `consumer.go` — lifecycle + reconnect loop. The cursor is persisted to the `config` table every 5 s by default. Critical invariant from Round 14: `c.cursorDone`, `c.config`, and `c.ctxCancel` must all be mutated under `clientMu`; the `Start()` reconnect loop and `UpdateCollections()` both take the lock around their state writes.
- The lexicon change callback dynamically restarts the Jetstream consumer with a fresh `wantedCollections` list whenever lexicons are uploaded via the admin API. No process restart needed.

### Security headers middleware (`internal/server/security_headers.go`)

Emits `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: no-referrer`, and conditionally `Strict-Transport-Security` (only when `EXTERNAL_BASE_URL` is `https://`). `/graphiql` sets its own `Content-Security-Policy` allowing the unpkg CDN for bootstrap assets; JSON API endpoints keep the tighter default.

### `OptionalAuth` middleware (`internal/oauth/middleware.go`)

**Important contract**: when the `Authorization` header is present but fails OAuth token validation, `OptionalAuth` passes through with no user context — it does **not** return 401. This is required because `/admin/graphql` is mounted with `OptionalAuth` and the admin handler accepts two auth schemes: a validated OAuth user, and an `ADMIN_API_KEY` bearer token. Returning 401 in the middleware on a non-OAuth bearer (like the admin API key) would prevent the API-key path from ever being reached. The admin handler does its own 401 check on empty userDID + no API key, so the security posture is unchanged. Round 8 introduced this behaviour live during the first deploy, when admin auth via `ADMIN_API_KEY` returned `invalid_token`.

### Activity log empty-event-json normalisation (`internal/database/repositories/jetstream_activity.go`)

The Jetstream consumer passes `string(commit.Record)` into `LogActivity`. For delete operations `commit.Record` is nil and the result is `""`. Postgres `JSONB NOT NULL` rejects empty strings; SQLite stores them loosely as TEXT. The repository normalises empty / whitespace-only payloads to the JSON literal `null` so both dialects accept the row. Discovered live during the Railway deploy.
### `config.Validate()` (`internal/config/config.go`)

- Refuses to start if `SECRET_KEY_BASE` is shorter than 64 bytes or matches the literal `development-secret-key-change-in-production-64chars` placeholder.
- Refuses to start on an out-of-range `PORT`.
- Logs a Warn (not a silent fallback) when `getEnvInt` is given a malformed integer value.

### Migrations (`internal/database/migrations/`)

- Each migration's `UpSQL` and the `schema_migrations` insert run inside a single transaction (`applyMigrationTx`). A crash in the middle leaves both rolled back.
- `Rollback` follows the same pattern.
- Migrations 001–009 are present in both `sqlite/` and `postgres/` variants and are tested for round-trip equivalence.

### Repositories that touch labels

- `LabelsRepository` — `Insert` / `InsertNegation` use `ON CONFLICT DO NOTHING` keyed on a partial unique index per migration 007. Active-set queries (`GetByURIs`, `HasTakedown`, `GetTakedownURIs`, plus the records label-filter subquery) all filter expired labels via `(l.exp IS NULL OR l.exp > nowLiteral())`.
- `LabelDefinitionsRepository` — composite `(src, val)` primary key from migration 009 so two labelers can both define `high-quality` with different semantics.
- `OAuthDPoPJTIRepository.InsertIfNew` — atomic `INSERT ... ON CONFLICT DO NOTHING` for race-safe DPoP replay detection.

---

## Deploying — the short version

The full deploy playbook (first-time provisioning, lexicon upload, secret rotation, common gotchas) is in [`docs/RUNBOOK.md`](docs/RUNBOOK.md). Read that **before** touching the live environment for anything beyond a routine code redeploy.
### Routine code deploy

```bash
cd /path/to/magic-indexer
git checkout per-labeler-definitions
git pull
export RAILWAY_API_TOKEN=''
railway up --service magic-indexer --detach
railway logs --service magic-indexer --deployment --lines 100
```

### Watch a deploy

```bash
railway logs --service magic-indexer --build       # build phase
railway logs --service magic-indexer --deployment  # runtime
```

### Railway gotchas (the things that broke our first deploys)

- **`VOLUME` is banned in Dockerfiles.** Railway rejects any `VOLUME` instruction. Don't reintroduce it. Use Railway's native volume mechanism via the dashboard if you need persistent storage.
- **Use `RAILWAY_API_TOKEN` for account-scoped tokens, not `RAILWAY_TOKEN`.** `RAILWAY_TOKEN` is for project-scoped tokens. Whoami fails silently with the wrong variable.
- **`railway add --database postgres` shows a prompt that looks like a hang, but the service is created anyway.** Don't double-run; you'll get duplicate Postgres services. If you do, delete the duplicate via the GraphQL API: `mutation { serviceDelete(id: "") }`.
- **`railway variables --set NAME=` (empty value) is rejected by the CLI.** To clear a variable, use the GraphQL API: `mutation { variableUpsert(input: { ..., value: "" }) }`.
- **HSTS only emits when `EXTERNAL_BASE_URL` starts with `https://`.** A deployed instance with `http://` in that env var will not send HSTS. By design.
- **Railway auto-discovers + exposes `${{Postgres.DATABASE_URL}}` variable references** at runtime; use that form, not the raw resolved URL, so a Postgres credential rotation propagates automatically.

---

## Operations — the short version

The full operator playbook is in [`docs/RUNBOOK.md`](docs/RUNBOOK.md). Two essential rules to remember:

### Lexicons come from npm, never from main

The canonical source for hypercerts/certified/hyperboards lexicons is the npm package `@hypercerts-org/lexicon`.
**Do not read from the upstream `hypercerts-org/hypercerts-lexicon` main branch directly.** The README of that repo says so explicitly, and there's a good reason: main is unstable and contains work-in-progress schema changes that may be broken or incompatible. The npm package is the versioned, tested distribution.

To upload lexicons matching a set of NSID prefixes:

```bash
# 1. Resolve latest version
VERSION=$(curl -s https://registry.npmjs.org/@hypercerts-org/lexicon \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['dist-tags']['latest'])")

# 2. Download the tarball
curl -sL "https://registry.npmjs.org/@hypercerts-org/lexicon/-/lexicon-$VERSION.tgz" \
  -o /tmp/lexicon.tgz
mkdir -p /tmp/lexicon-pkg && tar -xzf /tmp/lexicon.tgz -C /tmp/lexicon-pkg

# 3. Filter to the prefixes you want
cd /tmp/lexicon-pkg
mkdir -p upload-staging
find package/lexicons -name "*.json" | while read f; do
  id=$(python3 -c "import json; print(json.load(open('$f'))['id'])")
  case "$id" in
    org.hypercerts.*|app.certified.*|org.hyperboards.*)
      rel=${f#package/lexicons/}
      mkdir -p "upload-staging/$(dirname "$rel")"
      cp "$f" "upload-staging/$rel"
      ;;
  esac
done

# 4. Zip + base64
( cd upload-staging && zip -r ../lexicons.zip . )
base64 -w0 lexicons.zip > lexicons.zip.b64

# 5. Upload via admin GraphQL
ADMIN_API_KEY=''
ADMIN_DID='did:plc:'
python3 -c "
import json
print(json.dumps({
  'query': 'mutation Upload(\$zip: String!) { uploadLexicons(zipBase64: \$zip) }',
  'variables': {'zip': open('lexicons.zip.b64').read().strip()}
}))" > upload-payload.json
curl -X POST https://magic-indexer-dev.up.railway.app/admin/graphql \
  -H "Authorization: Bearer $ADMIN_API_KEY" \
  -H "X-User-DID: $ADMIN_DID" \
  -H "Content-Type: application/json" \
  --data-binary @upload-payload.json
# expected: {"data":{"uploadLexicons":}}
```

After upload, the Jetstream consumer **automatically restarts** with the new union of `wantedCollections`. No human action needed.
### Labeler enable / disable / pause

```bash
# Enable: comma-separated DIDs
railway variables --service magic-indexer \
  --set "LABELER_DIDS=did:plc:abc...,did:plc:def..."

# Disable all: empty string via GraphQL (CLI doesn't allow empty values)
curl -X POST https://backboard.railway.com/graphql/v2 \
  -H "Authorization: Bearer $RAILWAY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { variableUpsert(input: { projectId: \"7d6c4e52-de61-439f-96c0-3ded4114b9be\", environmentId: \"\", serviceId: \"\", name: \"LABELER_DIDS\", value: \"\" }) }"}'
railway redeploy --service magic-indexer --yes

# Pause one labeler without restart (admin endpoint)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/labeler/pause?did=did:plc:..."

# Reset cursor (force re-backfill on next start)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/labeler/reset?did=did:plc:..."
```

### Diagnose "why is this record hidden?"

```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/label-chain?uri=at://did:plc:abc/app.bsky.feed.post/xyz"
```

Returns every label on the URI (active, negated, expired) with provenance. It bypasses the public query path's filters because it is a diagnostic view.

### Common labeler failure modes

| Symptom | Likely cause |
|---------|--------------|
| `dial labeler: connection refused` | Labeler's PLC entry points at a non-public host (e.g. `http://localhost:4100`). The operator must update the DID doc. |
| `dial labeler: websocket: bad handshake` | Labeler host serves queryLabels over HTTP but not subscribeLabels over WebSocket. Backfill works; the live stream doesn't. |
| `Labeler backfill complete ... received=0` | No labels published under this DID's `src`. Either a new labeler with no data, or the wrong DID. |
| Cursor gap warning in logs | Labeler dropped frames upstream. The indexer keeps going. |

The reconnect loop uses exponential backoff (1 s → 2 min cap), so a permanently broken labeler settles to one log line per two minutes per labeler.

---

## Review history (so you don't waste a session)

This branch went through **23 rounds of overnight review**, producing 59 fixes and 3 regression tests before the first deploy. The per-round logs and final reports are in [`docs/reviews/`](docs/reviews/). Read the index there if you suspect something has already been audited.

A comprehensive security audit was performed on **2026-04-13** (see [`docs/AUDIT_REPORT_2026-04-13.md`](docs/AUDIT_REPORT_2026-04-13.md)). It identified 29 findings (4 Critical, 5 High, 8 Medium) and fixed 15 of them across 14 commits. The remaining items are low-severity or require architectural changes.

Combined totals:

| Rounds | Reviewers | Critical | Major | Minor | Nice | Fixed |
|--------|-----------|----------|-------|-------|------|-------|
| 1–10 | 200 | 35 | 100 | 95 | 19 | 55 fixes + 3 regression tests |
| 11–18 | 160 | 2 | 1 | 0 | 0 | 3 fixes (jetstream state races, Round 14) |
| 19–23 | 100 | 0 | 0 | 0 | 0 | 1 mid-deploy fix (`OptionalAuth` pass-through) |
| Audit | 10+ | 4 | 5 | 8 | 12 | 15 fixes across 14 commits |
| **Total** | **470+** | **41** | **106** | **103** | **31** | **74 fixes + 3 regression tests** |

### Items deliberately deferred (do not re-discover)

The items below are **deliberate** deferrals. The first two are open issues with full rationale + design questions documented as comments on the GitHub issue — read those comments before proposing work in either area.

- **[#10 — Labeler signature verification](https://github.com/hb-agent/magic-indexer/issues/10).** Re-open when a labeler we ingest starts shipping cryptographic signatures against a stable scheme.
- **[#13 — GDPR hard-delete endpoint](https://github.com/hb-agent/magic-indexer/issues/13).** Re-open when there's a real erasure request or a legal obligation.
- **#57 — Deferred hardening (service-auth for notifications).** The April-2026 PR landed the verifier, the `/notifications/graphql` endpoint, and the `.well-known/atproto-did` handler. What it did NOT land: per-`iss`/IP resolver throttle, negative cache, serve-stale on PLC outage, bad-signature key-rotation retry, `caller_hash` metric label, persistent jti store. Sentinels and metric helpers are already in place; wire them up when (a) we see real abuse, (b) we add a second replica, or (c) the plan file (`/workspace/issue-57-plan.md`) says we're ready.
- **#26 — Deploy 2 (sortAt exposure).** The April-2026 bundled PR shipped Deploy 1: migration 017 + `ingestion.ComputeSortAt` writes `sort_at` on every new insert. Deploy 2 — backfilling existing rows, the `NOT NULL` flip, the `sortAt` GraphQL field, and `ORDER BY COALESCE(sort_at, indexed_at)` queries — stays deferred until Deploy 1 has been live long enough that the NULL tail is small. Check `SELECT count(*) FROM record WHERE sort_at IS NULL` before scheduling it.

Other things that came up in review and were intentionally **not** changed:

- The Go module path stays `github.com/GainForest/hypergoat` and the binary stays `hypergoat`. Renaming would touch ~80 files for a brand-only change. The product is "Magic Indexer" in docs; the binary is `hypergoat` on disk.
- **Takedown is opt-in.** A record with an active `!takedown` label is *not* hidden by default. Clients must pass `excludeLabels: ["!takedown"]` explicitly. This is a deliberate product decision (the indexer is labeler-neutral).

---

## Landing the plane (mandatory checklist when ending a session)

1. **Quality gates** (if code changed):

   ```bash
   go build ./...
   go vet ./...
   go test -race ./...
   golangci-lint run ./...
   ```

   All four green, or you have a real reason for the failure that is documented in the commit message.

2. **Commit with the right convention**:

   ```bash
   git add -A
   git commit -m ": Co-Authored-By: "
   ```

   Refer to closed issues with `Closes #N` where applicable.

3. **Push**:

   ```bash
   git push origin per-labeler-definitions
   git status   # MUST show "up to date with origin"
   ```

4. **Verify**: nothing that affects the live deployment counts as "shipped" until `railway up` succeeds, `/health` returns 200, and the log line you expected is visible in `railway logs --service magic-indexer --deployment`.

5. **Don't leave secrets in `/tmp`** if the session is ending without recovery. `shred -u /tmp/` them, or leave them in place if the operator will be back to use them.

**Never stop before pushing.** Local-only work is work that doesn't survive.

---

## See also (for the things this digest abbreviates)

- [`docs/RUNBOOK.md`](docs/RUNBOOK.md) — full operator playbook with first-time deploy, lexicon walkthrough, secret rotation, incident response, and every gotcha worked out long-form.
- [`docs/reviews/README.md`](docs/reviews/README.md) — index of the 23-round overnight review history.
- [`SECURITY.md`](SECURITY.md) — required env vars, reverse-proxy rate limits, admin auth contract.
- [`README.md`](README.md) — high-level project intro and live URL.
- [`scripts/setup-env.sh`](scripts/setup-env.sh) — what `make setup` actually runs.
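The "verify" step of the landing-the-plane checklist can be sketched as a small helper. This is a hypothetical function, not an existing script in the repo; the base URL defaults to the dev deployment listed in the table at the top of this file:

```shell
# Hypothetical helper (not in the repo): fail loudly unless /health is 200.
verify_deploy() {
  base=${1:-https://magic-indexer-dev.up.railway.app}
  code=$(curl -s -o /dev/null -w '%{http_code}' "$base/health")
  if [ "$code" != "200" ]; then
    echo "health check failed: HTTP $code" >&2
    return 1
  fi
  echo "health OK"
}
```

A green health check is necessary but not sufficient: still tail `railway logs --service magic-indexer --deployment` for the specific log line your change was supposed to produce.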