# Geo deep-dive — SRIDs, distance models, dialect behaviour The [Geo chapter](../README.md#geo-geopoint-near-nearto-withinpolygon) in the README covers the surface API — `f.geoPoint()`, the `near` filter, the `nearTo` orderBy, `withinPolygon` containment. This doc is the companion that goes one layer down: which SRID to pick for which workload, how each dialect actually implements the IR, when fallback mode is acceptable, how the Mongo cross-field rewrite works, error budgets per distance model, MultiPolygon hole semantics, GeoJSON plumbing, spatial joins, H3 clustering, realtime tracking patterns, and how to test geo code without flaky lat/lng comparisons. The [2.5.0 CHANGELOG entry](../CHANGELOG.md#250--mssql-merge-upsert-mongo-cross-field-nearto-browser-doctordiff-multipolygon--geometrycollection-3d--z-coordinates-non-wgs84-srids) documents the underlying additions — MultiPolygon + GeometryCollection inputs, 3D / Z coordinates, non-WGS84 SRIDs, Mongo cross-field `nearTo`. Everything below assumes you're on 2.5.x. ## Contents * [Pick an SRID](#pick-an-srid) * [Dialect feature matrix](#dialect-feature-matrix) * [Fallback mode in detail](#fallback-mode-in-detail) * [PostGIS deep-dive](#postgis-deep-dive) * [Mongo 2dsphere and the $geoNear rewrite](#mongo-2dsphere-and-the-geonear-rewrite) * [Distance modes — sphere, ellipsoid, planar](#distance-modes--sphere-ellipsoid-planar) * [3D coordinates](#3d-coordinates) * [MultiPolygon + GeometryCollection patterns](#multipolygon--geometrycollection-patterns) * [GeoJSON pipeline](#geojson-pipeline) * [Spatial joins](#spatial-joins) * [Heatmaps and H3](#heatmaps-and-h3) * [Realtime tracking](#realtime-tracking) * [Testing geo](#testing-geo) --- ## Pick an SRID Forge defaults to **SRID 4326** (WGS84 lat/lng). That's the right default for app code that takes coordinates from a phone GPS, a geocoder, or a `navigator.geolocation` call — every consumer-grade source emits WGS84. Pick a different SRID with `f.geoPoint({ srid: 3857 })`. Coordinates passed to `create` / `update` / `near` / `withinPolygon` must already be in the target SRID's units — forge does not auto-transform. | SRID | Name | Units | When it wins | |---|---|---|---| | 4326 | WGS84 | degrees | Default. Anything taking GPS / GeoJSON input. Sphere/ellipsoid distance via the dialect's geography type. | | 3857 | Web Mercator | metres (planar at the equator) | Tile maps (Google, Mapbox, Leaflet). Renders pixel-perfect because the projection matches the tile pyramid. Distance is heavily distorted near the poles — don't measure with it, just render. | | 27700 | OSGB 36 | metres | Surveying / cadastral data in Great Britain. Sub-metre accuracy for UK national datasets (OS MasterMap, postcode polygons from ONS). | | 2154 | Lambert-93 | metres | Mainland France. Used by IGN, BANO, INSEE. The legal projection for French public data. | | 32601-32660 | UTM north zones | metres | On-device proximity inside a single zone. Treat coordinates as planar within ~500 km of the central meridian; distortion < 1 in 2500. | | 32701-32760 | UTM south zones | metres | Same, southern hemisphere. | | 4269 | NAD83 | degrees | US federal datasets (Census TIGER, USGS). Visually identical to 4326 at app zoom; algebraically distinct. | Workload picks: - **Route planning** — store 4326 (geography), measure with the ellipsoid (PG `geography` does this automatically for distances under a few hundred km). Bigger trips: switch to a real routing engine (OSRM, Valhalla) and use forge only for the start/end points. - **Isochrones** — same as routing. Compute the polygon in OSRM / GraphHopper, store the result polygon in forge as `f.geoPoint()` for the centre and a `withinPolygon`-style boundary in a JSON column. - **Choropleth tiles** — store geometry in 3857. Tile-level rendering is then a passthrough. - **Surveying / cadastral** — store in the local national grid (27700 / 2154 / SP coords). Convert to 4326 only at the API boundary if a consumer asks for GeoJSON. - **On-device proximity** under ~50 km — pick the right UTM zone for the device's region and store planar. `near` filters become a cheap bbox + distance check with no trigonometry. Cross-SRID equality is undefined behaviour — every dialect rejects it at the SQL layer. If you mix SRIDs, normalise at write time with `proj4` or the dialect's `ST_Transform`. --- ## Dialect feature matrix What each dialect actually does for geo. Read in parallel with the [per-dialect emit table](../README.md#what-forge-emits-per-dialect) in the README. | | Types | Spatial index | Distance | Polygon | MultiPolygon | 3D Z | Cartesian + ellipsoidal | |---|---|---|---|---|---|---|---| | PostgreSQL + PostGIS | `geography(Point, srid)` (4326), `geometry(Point, srid)` (other) | GiST, SP-GiST, BRIN | ellipsoid (geography), planar (geometry) | yes (with holes) | yes | yes (`PointZ`) | both | | MySQL 8 | `POINT NOT NULL SRID srid` | `SPATIAL INDEX` (R-Tree) | `ST_Distance_Sphere` (sphere) | yes (no holes natively — emulated via diff) | partial — flatten to UNION | no native — alt stored side-by-side | sphere subset | | SQLite + SpatiaLite | `BLOB` geom, R-Tree virtual table | R-Tree via virtual table | `Distance(g, g, 1)` (ellipsoid mode) | yes (with holes) | yes | yes (`POINT Z`) | both | | SQLite (fallback) | JSON `{lng, lat}` | none (bbox scan) | haversine in JS | even-odd ray-cast in JS | yes (per-polygon ray-cast) | alt round-trip, not measured | sphere only | | DuckDB + spatial | `GEOMETRY` | R-Tree | `ST_Distance_Sphere` | yes (with holes) | yes | yes (`ST_Point3D`) | both | | MongoDB | GeoJSON in BSON | `2dsphere` | sphere ($near / $geoNear) | yes (with holes) | yes (GeoJSON MultiPolygon) | alt round-trip, not measured | sphere only | | MSSQL | `GEOGRAPHY` (4326), `GEOMETRY` (other SRIDs via app) | `CREATE SPATIAL INDEX` (quadtree grid) | ellipsoid (`STDistance`) | yes (with holes) | yes (MultiPolygon WKT) | yes (`POINT(x y z)`) | both | Notes: - **MySQL 8** axis order is lat-first for SRID 4326 geographic. Forge swaps coords at compile time via the `axis` param on `toGeoWKT` (see [`src/adapters/shared/wkt.ts`](../src/adapters/shared/wkt.ts)), so app code always passes `{lng, lat}`. - **MySQL 5.7** works but has no SRID metadata — `forge doctor` flags this. Treat 5.7 as "planar without a coordinate system". - **MongoDB** is 4326-only at the index level. Non-4326 fields force fallback (`fallback: true`). - **MSSQL** geography rejects geometries that span more than a hemisphere. Forge does not partition large polygons — feed it continent-scale or smaller. --- ## Fallback mode in detail Set `f.geoPoint({ fallback: true })` when the dialect's spatial extension isn't available. Three common reasons: - Managed PG without PostGIS (some lower tiers of Neon, Supabase, RDS). - Stripped SQLite — Expo SQLite, op-sqlite default builds, the stock `@sqlite.org/sqlite-wasm` (no R-Tree). The [custom wasm build](../README.md#custom-wasm-build-vec0--r-tree) flips this on for browser. - DuckDB without `INSTALL spatial` — uncommon since DuckDB bundles it from 0.9, but possible in restricted environments. What changes: - **Storage** — column type drops to JSON / TEXT containing `{"lng": …, "lat": …}` (or `{"lng", "lat", "alt"}` for 3D). - **Insert** — same `{ lng, lat }` shape, JSON-encoded. - **`near` filter** — compiles to two stages: 1. A bbox prefilter on the JSON-extracted lng/lat using the dialect's JSON operators (`->>'lng'`, `JSON_EXTRACT(...)`). The bbox is sized to enclose the search circle on the sphere plus a 5% margin for the cos(lat) compression at the boundary. 2. App-side haversine refinement that drops false positives from the bbox. - **`nearTo` orderBy** — works the same, but with no spatial index the database can't push down the sort. Forge fetches the bbox set, refines, sorts in app. - **`withinPolygon`** — bbox prefilter from the polygon's envelope, then even-odd ray-cast in `pointInMultiPolygon` (see [`src/geo/fallback.ts`](../src/geo/fallback.ts)). Holes work via parity — a point inside an outer ring and inside a hole flips twice and falls out, which is correct. Performance ceiling — roughly **50k rows** for interactive (< 100 ms) queries with a single-column index on `(active = true)`-style filter, and up to 100k for tolerant background jobs. Past that: - A `near` filter scans the whole bbox-matched set every query — no index push-down can shrink it. - Compound filters (`near` AND another predicate) help, because the other predicate's index runs first and the geo bbox runs over the smaller candidate set. Migrate to a real extension before the table crosses 100k rows. The schema move is one-way safe: turn `fallback: false`, run `forge push --enable-extensions`, and forge re-emits the column as the dialect's native geo type. Data round-trips through `ST_GeomFromText` / `GeomFromGeoJSON`. --- ## PostGIS deep-dive Forge emits `geography(Point, 4326)` by default and switches to `geometry(Point, srid)` for non-4326 SRIDs. That covers 90% of app code. The remaining 10% — the cases where you reach for raw SQL — benefits from a clear mental model of the two PostGIS types. ### `geography` vs `geometry` | | `geography` | `geometry` | |---|---|---| | Coordinate system | WGS84 only (SRID 4326) | Any SRID | | Units | always metres for distance functions | the SRID's units (metres, degrees, feet) | | Distance math | ellipsoidal (Vincenty) for points < some km, falls back to sphere | planar | | Index | GiST | GiST, SP-GiST, BRIN | | When to pick | sphere distance "just works", small areas, polygons that don't cross the antimeridian badly | non-4326 SRIDs, planar maths, ST functions PostGIS only ships for `geometry` | Practical rules: - App code with GPS-shaped data and "what's within N metres" queries: `geography(Point, 4326)`. Forge's default. - App code that joins with cadastral data in a national grid: `geometry(Point, 27700)` etc. Set `f.geoPoint({ srid: 27700 })`. - One model can mix both — give a Place its lat/lng `geoPoint` AND a polygon column you write through `$executeRaw` as `geometry(MultiPolygon, 27700)`. ### KNN with `<->` PostGIS's killer index feature is the `<->` operator: index-supported KNN with a B-tree-style ORDER BY. Forge's `nearTo` emits `ST_Distance(...)`, which **doesn't use the GiST index for sorting** because the index can't sort on an arbitrary function output. Reach for `<->` via `$queryRaw` when you need top-K by distance on a big table: ```ts type Row = { id: string; name: string; meters: number }; const point = `SRID=4326;POINT(${lng} ${lat})`; const rows = await db.$queryRaw` SELECT id, name, ST_Distance(location, ${point}::geography) AS meters FROM places WHERE org_id = ${orgId} ORDER BY location <-> ${point}::geography LIMIT 50 `; ``` Two notes: - The `<->` operator for `geography` uses the **bounding box centroid**, not the true ellipsoidal distance. Top-K is approximate at the edges — fine for "find me nearby" UI, wrong for billing. Refine with `ST_Distance` after the LIMIT if exact ordering matters. - The `<<->>` operator gives "centroid distance" between geometries, not points. Useful for "nearest polygon" queries. ### GiST vs SP-GiST `USING GIST` (forge default) is right for most data — points, polygons, mixed types. Switch to SP-GiST manually when: - All your geometries are points and there are > 10M of them. SP-GiST builds smaller indexes. - Read patterns are heavily skewed (one hot region) — SP-GiST's k-d-tree variant degrades more gracefully than GiST's R-Tree on hotspots. Set this via a raw migration; forge's IR doesn't expose the index algorithm yet: ```ts await db.$executeRaw` DROP INDEX IF EXISTS idx_places_geo; CREATE INDEX idx_places_geo ON places USING SPGIST (location); `; ``` `forge diff` will warn about the manual divergence — add the index to your schema's `indexes` array with `method: 'spatial'` to silence it (forge ignores the algorithm; it just needs to see a spatial index of some kind on the column). ### `ST_DWithin` vs `ST_Distance` Forge's `near` filter compiles to `ST_DWithin`, not `ST_Distance(...) < N`. The difference is index usage: - `ST_DWithin(a, b, N)` is pushed into the GiST index — runs in log(rows) on the bounded set. - `ST_Distance(a, b) < N` calls the function for every row — full table scan. If you ever write this by hand, always reach for `ST_DWithin`. The forge IR does it for you. ### Centroid distance trick When you need "what's nearest to this polygon", computing `ST_Distance` polygon-to-polygon is expensive (PostGIS walks the boundaries). Approximate with centroids first: ```ts const candidates = await db.$queryRaw` WITH target AS (SELECT ST_Centroid($1::geometry) AS c) SELECT id, ST_Distance(ST_Centroid(boundary), target.c) AS d FROM districts, target ORDER BY ST_Centroid(boundary) <<->> target.c LIMIT 100 `; ``` Then refine the top-K with the real polygon distance. The `<<->>` operator does the centroid sort using the index. --- ## Mongo 2dsphere and the $geoNear rewrite MongoDB's spatial story has one footgun: `$geoNear` must be the **first stage** of an aggregation, can only appear **once** per pipeline, and its `query` clause **cannot contain another `$near`**. That's fine when filter and orderBy reference the same field. It breaks when you do this: ```ts // "Customers within 5 km of warehouse W, ordered by distance to depot D." const rows = await db.customer.findMany({ where: { warehouse: { near: { lng: 3.30, lat: 6.50, withinMeters: 5000 } } }, orderBy: { depot: { nearTo: { lng: 3.45, lat: 6.44 } } }, }); ``` Before 2.5, forge dropped the `near` filter silently and you got "all customers ordered by depot distance" — wrong. 2.5 rewrites the cross-field `$near` to `$centerSphere`: ```js { $geoNear: { near: { type: 'Point', coordinates: [3.45, 6.44] }, key: 'depot', distanceField: '_distanceMeters', spherical: true, query: { warehouse: { $geoWithin: { $centerSphere: [[3.30, 6.50], 5000 / 6_371_008.8], }, }, }, }, } ``` `$centerSphere` measures the same great-circle distance as `$near` (radians on a sphere of radius 6,371,008.8 m, WGS84 mean), and IS allowed inside a `$geoNear.query`. The rewrite walks the filter tree recursively — handles `$and` / `$or` / `$nor` mixes. ### What the rewrite costs `$centerSphere` is a **bounding query**, not an indexed `$near`. The 2dsphere index still helps (it can prune by spherical caps), but you lose the ordered streaming `$near` does on the same key. In practice: - Same-field `near` + `nearTo`: forge collapses to a single `$geoNear` with `maxDistance` set. Fast. - Cross-field: forge rewrites the cross-field leaf to `$centerSphere`. Slightly slower than two separate index-backed queries but correct in one round-trip. If profiling shows the rewrite is the bottleneck, split into two queries — one to fetch the candidate ids near the warehouse, one to sort the survivors near the depot — and join in app. ### Pipeline ordering rules When forge composes a Mongo pipeline: 1. `$geoNear` first (if any `near` / `nearTo` is present). 2. `$match` stages for non-geo filters. 3. `$lookup` for relations selected via `include`. 4. `$sort` for non-distance orderBys. 5. `$skip`, `$limit`. 6. `$project` for `select`. The non-negotiable is step 1. If you reach for `$queryRaw`-equivalent (Mongo doesn't have it, but `db.$session.collection(...).aggregate(...)` gives you the same), keep `$geoNear` first. --- ## Distance modes — sphere, ellipsoid, planar Three models, three error budgets. ### Spherical (haversine, great-circle) Treats Earth as a sphere of radius R = 6,371,008.8 m. - Error vs the real ellipsoid: **up to 0.5%** worst case (equator-to-pole), typically **0.1–0.2%**. - 500 m route: ±2.5 m worst case. - 50 km route: ±250 m. - 5,000 km transatlantic: ±25 km. Forge's fallback mode, Mongo `$near`, MySQL `ST_Distance_Sphere`, DuckDB `ST_Distance_Sphere`, and SpatiaLite default mode all use this. ### Ellipsoidal (Vincenty / Karney) Treats Earth as a WGS84 ellipsoid (a = 6,378,137 m, f = 1 / 298.257_223_563). - Error vs reality: **sub-millimetre** anywhere on Earth. - The cost: each distance call solves an iterative formula — 3–5× slower than haversine in CPU. Negligible in app-DB round-trip terms but matters for million-row scans. PG `geography` distances, MSSQL `STDistance` on `GEOGRAPHY`, and SpatiaLite's `Distance(g, g, 1)` (the `1` flag) use this. ### Planar (Cartesian) Treats coordinates as Euclidean. Fine inside a single UTM zone or for very short distances on a sufficiently local projection. - 25 km in 3857 at the equator: ~0.1% error vs WGS84. - 25 km in 3857 at 60° latitude: 100% error (Mercator stretches). - 25 km in a matching UTM zone: < 0.005% error. ### Picking the mode | Use case | Recommended | Why | |---|---|---| | "Within 5 km of me" | sphere (haversine) | Fast, 25 cm error, good enough for nearby search. | | Driver-route distance estimate | ellipsoid | Compounds over the route. 0.2% on 50 km = 100 m off — borderline. | | Aviation / shipping leg lengths | ellipsoid | Mandatory. Sub-metre matters. | | Tile-pixel proximity | planar in 3857 | Matches the rendering coords. | | "Inside the city limits" | sphere or planar | The polygon boundary is fuzzier than either error. | | Billing-grade route distance | ellipsoid + real routing engine | Forge gets you to the polyline; OSRM/Valhalla gives you the road. | --- ## 3D coordinates `f.geoPoint({ dims: 3 })` opts into XYZ storage. App-side shape becomes `{ lng, lat, alt }`. The [2.5 CHANGELOG table](../CHANGELOG.md#3d--z-coordinates) shows the per-dialect column type. ### What `near` / `nearTo` actually compute **Ground distance only.** Forge treats Z as a passive attribute — it round-trips on read/write but doesn't enter the distance formula. A drone at 100 m altitude over the same spot as a car on the ground sees `_distanceMeters = 0` from the car. This is intentional. Per-dialect 3D distance behaviour is wildly inconsistent (PostGIS `ST_3DDistance` is true 3D Euclidean, MongoDB has no 3D distance, MSSQL geography ignores Z, MySQL has no native Z), so forge picks the most useful common denominator — 2D-on-sphere — and lets you opt into true 3D when you need it. ### When you actually need 3D Euclidean True 3D Euclidean (sqrt((dx)² + (dy)² + (dz)²) over metres): ```ts const Place = model('places', { id: f.id(), location: f.geoPoint(), vec3: f.vector(3), // x_m, y_m, z_m in a local frame }); // Insert both — vec3 in a local ENU (East-North-Up) frame. await db.place.create({ data: { id: 'a', location: { lng: 3.45, lat: 6.44 }, vec3: [0, 0, 100], // 100 m up from the origin }, }); // 3D nearest neighbour — true Euclidean via the vector index. const near3d = await db.place.findMany({ where: { vec3: { nearTo: [0, 0, 50], withinDistance: 75 } }, }); ``` The [Vector chapter](../README.md#vector-similarity-search) covers the rest. Vector indexes (HNSW on PG, M-Tree on DuckDB, sqlite-vec on SQLite) give you index-supported KNN that geo can't. ### Aviation / drone use cases A reasonable layout for drone telemetry: - `f.geoPoint({ dims: 3 })` for the position record. Lets map UIs consume the lat/lng/alt directly. - `f.vector(3)` in a UTM-local ENU frame for distance queries. - A `f.geoPoint()` (2D) for the operator's geofence centre, with `withinPolygon` for the no-fly zone. Update both fields on each tick. The 2D field drives the map; the vector drives proximity alerts. --- ## MultiPolygon + GeometryCollection patterns 2.5 accepts four shapes for `withinPolygon`. They normalise to a uniform `multiPolygon: Polygon[]` internally (each Polygon = `Ring[]`, each Ring = `Array<{lng, lat}>`). See [`src/adapters/shared/wkt.ts`](../src/adapters/shared/wkt.ts) for the WKT emit and [`src/geo/fallback.ts`](../src/geo/fallback.ts) for `pointInMultiPolygon`. ### Admin boundaries with holes Vatican-in-Italy, San Marino-in-Italy, Lesotho-in-South Africa — real admin boundaries have holes. The legacy single-ring shape can't represent them. Use the Polygon form: ```ts const italy = { type: 'Polygon' as const, rings: [ outerItalyRing, // closed ring around Italy vaticanCityHole, // closed ring of the Vatican sanMarinoHole, // closed ring of San Marino ], }; const inside = await db.address.findMany({ where: { location: { withinPolygon: italy } }, }); ``` The fallback ray-cast walks each ring with parity flipping — a point inside the outer AND inside a hole flips twice and is correctly classified as outside. The same query on PG/MSSQL emits a WKT polygon with multiple rings and `ST_Contains` / `STContains` handles holes natively. ### Service-area unions Multiple disjoint shapes — a delivery service with three regions that aren't contiguous: ```ts const serviceArea = { type: 'MultiPolygon' as const, polygons: [ [eastZoneOuter, eastZoneHole], [westZoneOuter], [northSatelliteOuter], ], }; const candidates = await db.customer.findMany({ where: { location: { withinPolygon: serviceArea } }, }); ``` `MULTIPOLYGON(((…)),((…)),((…)))` on the SQL dialects; GeoJSON `MultiPolygon` on Mongo. The fallback short-circuits as soon as a point matches any constituent polygon. ### Postal code polygons UK postcode districts (`SW1A`, `EC1V`) are typically loaded as MultiPolygons — a single postcode can include offshore islands or detached fragments. Workflow: 1. Ingest the ONS postcode polygon dataset (GeoJSON) into a reference table: ```ts const Postcode = model('postcodes', { code: f.string().primaryKey(), polygon: f.json(), // raw GeoJSON MultiPolygon }); ``` 2. Resolve incoming address points by looping the codes the address's bbox overlaps and refining: ```ts const candidates = await db.postcode.findMany({ where: { /* spatial bbox prefilter via JSON expression */ }, }); for (const c of candidates) { const hit = await db.address.findMany({ where: { location: { withinPolygon: c.polygon } }, take: 1, }); if (hit.length) return c.code; } ``` In practice you want to store the polygon in the dialect's native geo type (not JSON) so PostGIS's `ST_Contains` can use the GiST index. Forge models don't yet declare polygon columns natively — use `$executeRaw` for the DDL and `$queryRaw` for the lookup; the forge model handles only the points. ### Point-in-polygon perf cliffs PG / MSSQL handle 10k-vertex polygons fine on a single query. When you query the same polygon 10k times in a batch (one address batch against one service-area MultiPolygon), the per-call geometry parse adds up. PostGIS-only trick — `ST_PreparedGeometry`: ```ts await db.$queryRaw` WITH area AS ( SELECT ST_GeomFromGeoJSON(${JSON.stringify(serviceArea)}) AS g ) SELECT addresses.id FROM addresses, area WHERE ST_Contains(area.g, addresses.location::geometry); `; ``` The CTE materialises the polygon once. PostGIS internally builds a `PreparedGeometry` for the join and reuses the indexed edge tree across all rows. 10–50× faster than a literal subquery per row. --- ## GeoJSON pipeline GeoJSON is the lingua franca for geo data — Mapbox, Leaflet, ogr2ogr, Natural Earth, OpenStreetMap export tools all speak it. Forge's geo IR is **not** GeoJSON — it's `{lng, lat}` for points and the normalised polygon shape above. The boundary is at the adapter layer. ### FeatureCollection to forge create ```ts import { readFile } from 'node:fs/promises'; const fc: FeatureCollection = JSON.parse( await readFile('./places.geojson', 'utf8'), ); await db.$transaction(async (tx) => { for (const feature of fc.features) { if (feature.geometry.type !== 'Point') continue; const [lng, lat, alt] = feature.geometry.coordinates; await tx.place.create({ data: { id: feature.properties!.id, name: feature.properties!.name, location: alt != null ? { lng, lat, alt } : { lng, lat }, }, }); } }); ``` GeoJSON's coordinate order is `[lng, lat, alt?]` — same as forge's `{lng, lat, alt}`. No reordering needed. ### Simplification before insert Polygons exported from OSM commonly have one vertex per metre. Inserting unmodified eats index space and slows queries. Run Douglas–Peucker before insert: ```ts import simplify from 'simplify-js'; function simplifyRing(ring: Array<{lng: number; lat: number}>) { const pts = ring.map((p) => ({ x: p.lng, y: p.lat })); // tolerance in degrees — 0.0001 ≈ 11 m at the equator. return simplify(pts, 0.0001, true).map((p) => ({ lng: p.x, lat: p.y })); } ``` Pick the tolerance to match your render zoom. For city-level polygons displayed at zoom 12 (web tile pixel ≈ 38 m), 0.0001° is overkill; 0.001° (~110 m) renders identically. ### `ST_GeomFromGeoJSON` round-trip When the polygon's source of truth is GeoJSON (a CMS, a Mapbox draw control, a feature from an external API), don't unpack to `{lng, lat}` arrays — go straight through PostGIS: ```ts await db.$executeRaw` UPDATE service_areas SET boundary = ST_GeomFromGeoJSON(${JSON.stringify(feature.geometry)}) WHERE id = ${areaId}; `; ``` PostGIS validates the GeoJSON, fixes ring winding (RFC 7946 vs PG conventions differ), and stores natively. Read back as GeoJSON with `ST_AsGeoJSON`: ```ts const rows = await db.$queryRaw<{id: string; geojson: string}[]>` SELECT id, ST_AsGeoJSON(boundary) AS geojson FROM service_areas WHERE id = ${areaId}; `; const geometry = JSON.parse(rows[0].geojson); ``` --- ## Spatial joins "Find every store within 5 km of every customer" — the classic spatial join. Naïvely it's O(stores × customers); spatially-indexed it's much closer to linear in the result size. ### PostGIS via `LATERAL` + `ST_DWithin` ```ts type Pair = { customer_id: string; store_id: string; meters: number; }; const pairs = await db.$queryRaw` SELECT c.id AS customer_id, s.id AS store_id, ST_Distance(c.location, s.location) AS meters FROM customers c CROSS JOIN LATERAL ( SELECT id, location FROM stores WHERE org_id = c.org_id AND ST_DWithin(location, c.location, 5000) ) s ORDER BY c.id, meters; `; ``` The lateral subquery runs the GiST index on `stores.location` once per customer — so for 10k customers × 1M stores it's 10k log(1M) index probes, not 10^10 distance calls. ### Mongo $geoNear inside $lookup Mongo's analog. `$lookup` can host a sub-pipeline; that sub-pipeline can have its own `$geoNear` as the first stage. Reach for the underlying collection — forge doesn't expose this in the high-level surface yet: ```ts const pairs = await db.$session.collection('customers').aggregate([ { $match: { org_id: orgId } }, { $lookup: { from: 'stores', let: { c_loc: '$location' }, pipeline: [ { $geoNear: { near: '$$c_loc', distanceField: 'meters', maxDistance: 5000, spherical: true, }, }, { $match: { $expr: { $eq: ['$org_id', orgId] } } }, ], as: 'stores_nearby', }, }, ]).toArray(); ``` Caveat — `$geoNear` doesn't actually accept a variable reference in pre-7.0 Mongo. Workable on 7.0+; on older versions, denormalise the join into the app layer (fetch customers, then `$geoNear` per batch). ### What forge gives you, what falls back to raw | Pattern | Forge surface? | Fallback | |---|---|---| | Single-side `near` + scalar predicate | Yes | n/a | | Single-side `near` + `include` relation | Yes (relation join after the geo filter) | n/a | | Two-side spatial join | No | Raw SQL on PG / MySQL / SQLite / DuckDB / MSSQL; raw aggregation on Mongo | | K-nearest per row | No | PG `<->` operator inside `LATERAL`; Mongo `$geoNear` inside `$lookup` | The trade-off is deliberate. Spatial joins aren't a Prisma-shape operation; forcing them into a typed surface gives you something brittle. Forge ships you the typed setup (declarations, `near` single-side, GeoJSON round-trip) and gets out of the way for the joins. --- ## Heatmaps and H3 Visualising a million points as raw markers is unworkable. Two patterns: - **Polygon overlay** — bucket points into admin polygons, colour by count. Requires a polygon dataset. - **H3 grid** — Uber's hexagonal hierarchical index. Tile-free, uniform cell size globally, multiple resolutions. H3 plays well with forge: store the index alongside the point and query by index when you want aggregation, by point when you want precision. ### Storing H3 indexes ```ts import { latLngToCell } from 'h3-js'; const Event = model('events', { id: f.id(), org_id: f.string(), location: f.geoPoint(), h3_r7: f.string(), // resolution 7 — ~5 km² cells h3_r9: f.string(), // resolution 9 — ~0.1 km² cells }, { indexes: [ { keys: { location: 1 }, method: 'spatial', name: 'idx_events_geo' }, { keys: { org_id: 1, h3_r7: 1 }, name: 'idx_events_h3_r7' }, { keys: { org_id: 1, h3_r9: 1 }, name: 'idx_events_h3_r9' }, ], }); // At write time: async function insertEvent(input: { id: string; orgId: string; lng: number; lat: number }) { await db.event.create({ data: { id: input.id, org_id: input.orgId, location: { lng: input.lng, lat: input.lat }, h3_r7: latLngToCell(input.lat, input.lng, 7), h3_r9: latLngToCell(input.lat, input.lng, 9), }, }); } ``` ### Aggregation ```ts // Heatmap at city zoom. const cells = await db.event.groupBy({ by: ['h3_r7'], where: { org_id: orgId, ts: { gte: since } }, _count: { _all: true }, }); // Resolve h3_r7 cell to centroid lat/lng in app via cellToLatLng(). ``` H3 wins over spatial-index aggregation because grouping by a string column is what every SQL engine is fastest at. PostGIS can compute hex grids via `ST_HexagonGrid`, but the join is more expensive than a string GROUP BY. ### Retrieval For "show me every event within 5 km of here": - If the query is rare and exact: use the `location` field with `near`. - If the query is the hot path: precompute a `kRing(centerCell, k)` in app, query by `h3_r9: { in: cells }`. Skips the spatial index entirely and hits the B-tree on `h3_r9`. Up to 100× faster on > 1M-row tables. --- ## Realtime tracking Vehicle / driver / asset position updates — high write rate, small result reads, fast staleness. Three concrete recommendations. ### Partial index on `active = true` A fleet of 1M vehicles where only 5k are active at any moment shouldn't have a spatial index covering inactive rows. Partial indexes: ```ts const Vehicle = model('vehicles', { id: f.id(), org_id: f.string(), location: f.geoPoint(), active: f.boolean(), ts: f.dateTime(), }, { indexes: [ { keys: { location: 1 }, method: 'spatial', name: 'idx_vehicles_active_geo', where: { active: { eq: true } }, }, ], }); ``` The 2.2 release added partial-filter indexes ([CHANGELOG 2.2.0](../CHANGELOG.md#220--indexdef-coverage-compile-api-softdelete-on-compile-semanticop)). On PG the SQL is `CREATE INDEX … WHERE active = true`; on Mongo it's `partialFilterExpression`; on MySQL it emulates via a generated column. Spatial index size scales with active rows, not total rows. ### Throughput at 1k updates/sec Spatial indexes are write-amplifying — every update touches the geo index plus the row. At 1k updates/sec on PG: - `UPDATE vehicles SET location = …, ts = …` is fine on a single primary up to ~5k/sec — the GiST rebalance is cheap on a small active set. - The pattern that breaks first is the `ts` index, not the `location` index, if you're sorting by recency. - For > 5k/sec, batch updates per device with a 250 ms window in app — most location updates compress to "last known position per vehicle" anyway. ### Mongo TTL collection for trailing positions The "where is vehicle V in the last 5 minutes" pattern wants recent positions, not the full history. Mongo's TTL index keeps the collection small without app-side pruning: ```ts const Position = model('positions', { id: f.id(), veh_id: f.string(), location: f.geoPoint(), ts: f.dateTime(), }, { indexes: [ { keys: { veh_id: 1, ts: -1 }, name: 'idx_positions_veh_ts' }, { keys: { location: 1 }, method: 'spatial', name: 'idx_positions_geo' }, { keys: { ts: 1 }, name: 'ttl_positions_ts', expireAfterSeconds: 300, // 5 minutes }, ], }); ``` The TTL index — a [Mongo-only feature](../README.md#indexes-and-unique-constraints) on the IR — drops old rows in the background. Spatial index stays small; query latency stays flat under sustained 1k/sec. PG analog: a daily partition + drop. Forge's IR doesn't generate table partitions, but a cron-managed `DROP TABLE positions_2026_06_22` combined with a view over `UNION ALL` of the recent partitions is the standard pattern. --- ## Testing geo Geo tests fail in the worst way — they pass on your machine, they fail in CI when the floating-point library is off by 2 ULPs, they pass again when you re-run. Three rules. ### Deterministic lat/lng fixtures Don't randomise. Pick reference coordinates and stick to them: ```ts const REF = { lekki: { lng: 3.4505, lat: 6.4416 }, // Lagos ikoyi: { lng: 3.4350, lat: 6.4500 }, // ~1.6 km from lekki vi: { lng: 3.4225, lat: 6.4275 }, // ~3.5 km from lekki ikeja: { lng: 3.3500, lat: 6.6000 }, // ~22 km from lekki } as const; ``` Reuse the same set across tests. Distances are stable to 0.1 m across haversine / Vincenty implementations, so assertions that target ±10 m are safe. ### Distance assertions with epsilon Never `expect(distance).toBe(1600)`. Always: ```ts expect(distance).toBeCloseTo(1600, -2); // within ±50 m (precision = -2 → tolerance 5×10^2) ``` Or define an explicit helper: ```ts function expectMeters(actual: number, expected: number, epsilonM = 10) { expect(Math.abs(actual - expected)).toBeLessThanOrEqual(epsilonM); } ``` The epsilon should reflect the model: 1 m for ellipsoidal, 10 m for spherical at city distances, 100 m for planar over 50 km. ### What to mock, what to hit real Mock — forge's `db.*.findMany({...})` surface in pure-IR tests. The compile layer is unit-tested already in the forge repo. Hit a real DB — anything that exercises: - The dialect's spatial index choosing the right plan. - WKT round-trips through `ST_GeomFromText`. - The fallback ray-cast against a non-trivial polygon (a C-shape, a polygon with a hole). - Mongo's `$geoNear` rewrite (the [2.5 cross-field fix](../CHANGELOG.md#mongo-near--nearto-cross-field)). The forge repo's `regression-geo-duckdb.ts` and the geo cases in `integration-pg.ts` / `integration-mongo.ts` show the pattern: a Docker-Compose-managed PG + PostGIS, a Mongo replica set, a DuckDB in-memory adapter. Adopt the same shape — `docker compose up` in CI, real engine, fixtures from the deterministic set above. For browser testing, the [custom wasm build](../README.md#custom-wasm-build-vec0--r-tree) with R-Tree compiled in is the only way to test the native code path in jest-environment-jsdom; without it you're testing the fallback mode regardless of intent. Set `FORGE_WASM_PRO_URL` to point at the local artifact and the jest worker picks it up.