# HTML Import / Export: Feature Context

## 1. Feature Overview

CascadeEditor's domain model is a `List` (paragraphs, headings, lists, code, quote, divider, todo, custom) with rich-text spans. Many integration targets exchange content as HTML strings, but each target uses a different *dialect*: Quill-flavored, GitHub-style code, Notion-style todos. There was no existing way to round-trip a document through such an HTML payload.

This feature ships a profile-driven HTML codec under a new `htmlserialization/` package, parallel to the existing JSON `serialization/` layer. A single configuration object, `HtmlProfile`, drives both directions through a stateless `HtmlSchema` entry point. The library ships an opinionated `HtmlProfile.Default` (HTML5-ish canonical mappings) plus public extension points so a consumer can express its dialect with overrides instead of forking the parser. A reference dialect profile (`sample/.../CustomHtmlProfile.kt`) demonstrates the pattern end-to-end without leaking dialect code into `:editor`.

The codec is common-only Kotlin Multiplatform: no `org.jsoup`, no browser DOM, no platform parser. Decode and encode never throw on user input or consumer-supplied callback failure; everything surfaces as a structured warning.

Out of scope (intentionally): full HTML5 spec compliance, browser-equivalent error recovery, a generic cross-format `DocumentImporter`/`DocumentExporter`, and any built-in vendor-specific profile inside `:editor`.

## 2. Architecture & Design Decisions

### 2.1 Module placement

```
editor/src/commonMain/kotlin/io/github/linreal/cascade/editor/
├── serialization/          existing JSON layer
└── htmlserialization/      new HTML layer

sample/src/commonMain/kotlin/io/github/linreal/cascade/profiles/
└── CustomHtmlProfile.kt    reference dialect profile
```

### 2.2 New types

Public surface (`htmlserialization/`):

- All public declarations are marked `@ExperimentalCascadeHtmlApi` with warning-level opt-in while the codec stabilizes.
- `HtmlSchema`: stateless object, four entry points (`HtmlSchema.kt:16`).
- `HtmlProfile` + `HtmlProfile.Default`: immutable configuration bundle (`HtmlProfile.kt:38`).
- `HtmlProfileSupportSet`: predicate-based round-trip claim (`HtmlProfileSupportSet.kt:29`).
- Codec contracts: `HtmlNodeView`, `TagDecoder`, `TagDecodeContext`, `TagDecodeResult`, `InlineFragment`, `BlockEncoder`, `SpanEncoder`, `BlockGroupEncoder`, `HtmlEncodeContext`, `HtmlEmit`, `HtmlTagPair` (`HtmlCodecContracts.kt`).
- Policies: `BlockSeparator`, `InlineRoot`, `EntityDecode`, `UnknownTagPolicy` (`HtmlPolicies.kt`).
- Reports: `HtmlDecodeResult`, `HtmlEncodeResult`, `HtmlDecodeWarning`, `HtmlEncodeWarning` (`HtmlResults.kt`, `HtmlWarnings.kt`).
- Helpers: `Html.escapeText` / `Html.escapeAttr` (`HtmlEscaping.kt`); `openTagWithCascadeIndentation` and `CASCADE_INDENT_CLASS_PREFIX` (`DefaultBlockEncoders.kt`, `DefaultListOutlineEncoder.kt`).
- Editor extensions: `EditorStateHolder.toHtml(...)` and `loadFromHtml(...)` (`HtmlSerializationExt.kt`).

Internal surface (same package):

- `HtmlParser` (entry), `HtmlTokenizer`, `HtmlTreeBuilder`, `HtmlNode`, `HtmlToken`, `HtmlPolicyApplier`: parser pipeline.
- `HtmlDecodeEngine`, `TagDecodeContextImpl`, `HtmlNodeViewMapper`, `DefaultTagDecoders`, `PreservedHtmlBlockType`: decode side.
- `HtmlEncodeEngine`, `HtmlEncodeContextImpl`, `DefaultBlockEncoders`, `DefaultSpanEncoders`, `DefaultListOutlineEncoder`, `DefaultEncoderFallbacks`: encode side.

### 2.3 Patterns and rationale

- **Single configuration object (`HtmlProfile`).** Encode and decode mappings are nearly always symmetric (decode ``→`Bold`, encode `Bold`→``). Splitting them would force consumers to keep two parallel structures aligned by hand. `HtmlProfile.kt:38`.
- **Immutability with builder methods.** `HtmlProfile` uses `internal constructor` + private `copyWith(...)` instead of `data class` so the public surface stays narrow and composition (`with*` / `without*`) is side-effect-free.
  Overrides *replace*: there is no chain dispatch, so consumers compose by delegating to `tagDecoderFor` (`HtmlProfile.kt:62-77`).
- **Hand-written parser, common-only.** Roughly 700 LoC across `HtmlTokenizer.kt` + `HtmlTreeBuilder.kt`. The dialect grammar is small (no scripts/styles/DOCTYPE), so KMP-pure parsing is cheaper than dealing with platform fragmentation.
- **Source ranges as UTF-16 half-open offsets.** Every `HtmlNode` / `HtmlNodeView` exposes `sourceStart` and `sourceEndExclusive` so `rawSource.substring(start, end)` returns the verbatim original slice. These ranges give accurate warning offsets and let `UnknownTagPolicy.Preserve` capture the source losslessly.
- **Three encoder kinds (`BlockGroupEncoder`, `BlockEncoder`, `SpanEncoder`).** A flat block stream cannot express HTML's structural relationships: consecutive list items must wrap in one `