# Structured-source import (v4.2)

latticesql 4.2 can turn a **structured file** — a JSON object or an Excel
`.xlsx` workbook — into a Lattice schema (entities, dimensions, junctions) and
materialize it into a workspace. Everything here is **additive and opt-in**:
absent a file drop, behavior is byte-identical to 4.1.

The feature is reachable **only by dropping a file into the assistant rail** in
`lattice gui`. There is no CLI verb and no separate endpoint to call by hand —
the upload pipeline builds a proposal, and a confirmed proposal is applied via
`POST /api/import/apply`. The same inference and materialization functions are
also exported from `latticesql` for library use (see [Library API](#library-api)).

## What it does

When you drop a recognized JSON / `.xlsx` source into the chat:

1. **Infer a schema.** `inferSchema` reads the source and proposes **entities**
   (record collections that become tables), **dimensions** (small repeated value
   sets that become a shared taxonomy / dictionary), and **junctions** (the
   many-to-many links between them). Field types are inferred per column
   (`inferFieldType`), and source keys are normalized to table/column names
   (`normalizeName`).
2. **Read Excel natively.** `excelToRecords` turns each sheet into records by
   detecting the header row and the data region. A per-slice tab that is just a
   filtered view of a master sheet is recognized as a **read-only view** (no
   duplicated rows) rather than a second table — see `dedupeAndDetectViews`.
3. **Detect an as-of date for point-in-time snapshots.** `detectAsOf*` looks at
   the file's contents, then its name, then an Excel preamble, then a Claude
   fallback — or a per-row date **column** (`detectAsOfColumns`, `parseCellDate`).
   When a date is found, every materialized row is stamped `as_of` and the row
   identity folds it in, so **re-importing a newer period APPENDS a dated
   snapshot beside the prior one** instead of overwriting it. Dimensions (the
   shared taxonomy) are not dated.
4. **Recognize a re-import.** `matchSchemaToExisting` fingerprints the inferred
   schema and matches it against the tables already in the workspace, so a
   re-upload lands as a **new snapshot of the existing tables**, not a duplicate
   set. `renameEntities` applies any entity → table-name overrides.
5. **Materialize.** `materializeImport` creates the tables (idempotently),
   inserts the rows + links, persists the schema to the workspace config, and
   builds the detected read-only views.

## Silent import vs. the inline confirm card

The chat drop chooses one of three paths automatically:

- **Recognized dataset + a confident date → silent import.** The file matches
  tables already in the workspace and a date was confidently detected, so it is
  imported straight away as a dated snapshot and reported in the activity feed.
- **Recognized dataset but no / ambiguous date → confirm card.** Importing
  undated would overwrite the prior snapshot, so an **inline confirm card**
  proposes the date (and any per-row date column) before anything is written.
- **Brand-new structured data → confirm card.** Tables are never created
  silently from a chat drop. The card proposes the full schema, the date, and
  the mode for you to review and apply.

Either way, nothing is written until a confident match resolves silently or you
confirm the card; the confirmed proposal is applied via `POST /api/import/apply`,
which streams the materialization progress back as NDJSON.

## File-size cap

A source file is capped at **50 MB**, and the cap is enforced **on both paths**:
the streaming upload rejects an oversized file, and the apply route re-`statSync`s
the retained bytes before reading them — so an oversized or swapped-on-disk
source (including one reached via a `local_ref` that never went through the
upload) cannot be streamed whole into memory.

## Library API

The inference + materialization functions are exported from `latticesql` and run
GUI-independently:

```ts
import {
  inferSchema,
  inferFieldType,
  normalizeName,
  sourceRecords,
  excelToRecords,
  dedupeAndDetectViews,
  detectAsOf,
  detectAsOfCandidates,
  detectAsOfColumns,
  parseCellDate,
  matchSchemaToExisting,
  renameEntities,
  materializeImport,
} from 'latticesql';

// JSON object → proposed schema
const plan = inferSchema(data); // { entities, dimensions, junctions, skipped }

// Detect the as-of date and any per-row date column
const asOf = detectAsOf(fileName); // ISO YYYY-MM-DD | null
const asOfColumns = detectAsOfColumns(data, plan);

// Detect read-only views (per-slice tabs that mirror a master)
const { views } = dedupeAndDetectViews(data, plan);

// Materialize into a workspace
const result = await materializeImport({ db, configPath }, data, plan, views, {
  mode: 'both',
  asOf,
  asOfColumn: null,
});
// result: { mode, asOf, asOfColumn, tablesCreated, rowsByTable, links, views }
```

`materializeImport` takes a `mode` of `'schema'` (table structures + dimension
values + views), `'contents'` (entity rows + links into existing tables), or
`'both'` (the default). When `asOf` (a file-level ISO date) or `asOfColumn` (a
per-row date column) is set, rows are stamped and the row identity folds the date
in, so the same model imported at a new date is a distinct snapshot rather than an
overwrite. `onProgress` streams the per-phase pipeline steps for a live view.

See [CHANGELOG.md](../CHANGELOG.md) for the full 4.2 list and
[assistant.md](assistant.md) for the chat-drop experience.