# Migrating from alpha.24 to alpha.25

> **Zero breaking changes, at both the runtime and type level.** alpha.25 is fully additive. Existing code compiles and runs without modification.
>
> **Heads-up for alpha.26 planning.** The next release (**alpha.26**) will be a **BREAKING API unification**: the four generation methods (`generateText` / `generateStructured` / `streamText` / `streamStructured`) will move from `{ instructions, prompt }` to a canonical `messages: LLMMessage[]` input. A one-cycle deprecation window is planned. See the [alpha.26 planning discussion](https://github.com/baabakk/llm-ports/discussions) for the full plan.

## Install

```bash
pnpm add @llm-ports/core@alpha @llm-ports/adapter-openai@alpha
```

All 7 publishable packages are bumped to `0.1.0-alpha.25`.

## The headline

Three additive features under an "Observability surface + reliability hardening" theme:

1. **`refs?: Record`**: a domain-agnostic trace-metadata field on every call, threaded verbatim to every observability event. Useful for prompt versioning, cost attribution by tenant / project / experiment, session correlation, or any versioned-artifact identity you want stamped onto a trace ([issue #53](https://github.com/baabakk/llm-ports/issues/53)).
2. **`runtimeFallback: "aggressive"`**: the opinionated classifier that three consumers rebuilt by hand (BEPA Plan 29, HomeSignal, SalesCoach Plan 30). It walks the chain on rate limits, empty responses, context-window exhaustion, credit-exhaustion 400s, and raw 5xx status codes, not just on `ProviderUnavailableError` ([issue #54](https://github.com/baabakk/llm-ports/issues/54)).
3. **Streamed cost surfacing**: the `onCost` and `onTokenUsage` observability hooks now fire at natural stream completion for `streamText` and `streamStructured` (adapter-openai in this release; other adapters follow in patch releases) ([issue #55](https://github.com/baabakk/llm-ports/issues/55)).

No code changes are required for existing consumers. All three features are opt-in.

## What was added

### 1. `refs` field for trace-metadata on every call

Add consumer-owned artifact identifiers to any call; they flow through to every observability event (`onCost`, `onTokenUsage`, `onFallback`, `onCacheHit`, `onValidationRetry`) verbatim. Never sent to the model. Never persisted by the library.

```ts
import type { ArtifactRef } from "@llm-ports/core";

const result = await port.generateStructured({
  taskType: "extract-team-dev",
  prompt: userRequest,
  schema: TeamDevSchema,
  refs: {
    prompt: { key: "team-dev.materialize", version: 7, hash: "abc123..." },
    scaffold: { key: "puzzle-service", version: 3 },
    tenant: { key: "acme-corp" },
    experiment: { key: "tone-experiment", version: "variant-b", meta: { cohort: "control" } },
  },
});
```

The observability side reads them back cleanly:

```ts
const registry = createRegistryFromEnv({
  observability: {
    onCost: (event) => {
      audit.recordCost({
        totalUsd: event.totalUsd,
        modelId: event.modelId,
        promptVersion: event.refs?.prompt?.version,
        scaffoldVersion: event.refs?.scaffold?.version,
        tenant: event.refs?.tenant?.key,
      });
    },
  },
});
```

**Non-goals (guard against scope creep):**

- Not validated. Empty object is legal; unknown keys are legal.
- Not sent to the model. Trace metadata, not prompt content.
- Not read by adapters. Pass-through only.
- No vocabulary standardization. Consumer picks the keys.
- No merging / inheritance across nested `runAgent` calls.

### 2. `runtimeFallback: "aggressive"` preset

Three consumers rediscovered the same lesson: the default classifier walks only on `ProviderUnavailableError`, which lets credit-exhaustion 400s and empty-response 200s abort the chain in production.
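
As a rough illustration of what such a hand-rolled classifier tends to look like, here is a minimal sketch. The error classes are local stand-ins named after the library's error types, the credit patterns are invented for the example, and `handRolledShouldFallback` is a hypothetical name; none of this is the library's actual implementation.

```ts
// Local stand-ins for illustration only. The real classes live in
// @llm-ports/core and may carry extra fields.
class ProviderUnavailableError extends Error {}
class RateLimitError extends Error {}
class EmptyResponseError extends Error {}
class ContextWindowExceededError extends Error {}
class BadRequestError extends Error {}

// Illustrative patterns (assumption): NOT the library's exported list.
const CREDIT_PATTERNS: RegExp[] = [/insufficient[_ ]quota/i, /credit balance/i];

function handRolledShouldFallback(e: unknown): boolean {
  // Typed errors where another provider has a real chance of succeeding.
  if (
    e instanceof ProviderUnavailableError ||
    e instanceof RateLimitError ||
    e instanceof EmptyResponseError ||
    e instanceof ContextWindowExceededError
  ) {
    return true;
  }
  // A 400 only justifies walking when it signals exhausted credit.
  if (e instanceof BadRequestError) {
    return CREDIT_PATTERNS.some((p) => p.test(e.message));
  }
  // Defensive check for raw, unwrapped errors that expose an HTTP status.
  const status = (e as { status?: unknown } | null)?.status;
  return typeof status === "number" && status >= 500;
}
```

A generic `BadRequestError` deliberately returns `false` here: a malformed request would fail on every provider, so walking the chain would only add latency.
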
The `"aggressive"` preset bundles the classifier:

```ts
import { createRegistryFromEnv } from "@llm-ports/core";

const registry = createRegistryFromEnv({
  adapters: { openai: openaiAdapter, cerebras: cerebrasAdapter, groq: groqAdapter },
  runtimeFallback: "aggressive", // NEW in alpha.25
});
```

Walks on:

| Signal                               | Rationale                                        |
| ------------------------------------ | ------------------------------------------------ |
| `ProviderUnavailableError`           | Existing default                                 |
| `RateLimitError`                     | Try next provider rather than wait out backoff   |
| `EmptyResponseError`                 | Adapter's own retries gave up; try elsewhere     |
| `ContextWindowExceededError`         | Try a larger-window provider                     |
| `BadRequestError` w/ credit patterns | Account can't serve any call right now           |
| Raw error with `status >= 500`       | Defensive check for adapters that don't wrap 5xx |

Does NOT walk on:

- `AuthenticationError` (401/403: the credential needs fixing, not routing).
- Generic `BadRequestError` (malformed request; it would fail everywhere).
- `ContentPolicyViolationError` (policy filter; a separate concern).
- `BudgetExceededError` / `SessionBudgetExceededError` (port-internal gating).

For fine-grained control, the object form still wins:

```ts
runtimeFallback: {
  shouldFallback: (e) => aggressiveShouldFallback(e) || (e instanceof MyCustomError),
},
```

The classifier and the credit-exhaustion pattern list are exported for reuse:

```ts
import {
  aggressiveShouldFallback,
  AGGRESSIVE_CREDIT_EXHAUSTION_PATTERNS,
} from "@llm-ports/core";
```

### 3. Streamed cost surfacing

`onCost` and `onTokenUsage` fire once at natural stream completion for `streamText` and `streamStructured`, matching the non-streaming contract. Enabled automatically for `adapter-openai` via `stream_options: { include_usage: true }`.
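
For readers unfamiliar with that flag: when `stream_options: { include_usage: true }` is set, the OpenAI Chat Completions API sends one extra final chunk whose `choices` array is empty and whose `usage` field carries the token totals. The sketch below shows one way a wrapper could surface that usage exactly once at natural completion. `textWithUsage` and the `*Sketch` types are hypothetical names for this example, not the adapter's actual implementation.

```ts
// Minimal chunk shape for the sketch; real SDK chunks carry more fields.
type UsageSketch = { prompt_tokens: number; completion_tokens: number };
type ChunkSketch = {
  choices: { delta: { content?: string | null } }[];
  usage?: UsageSketch | null;
};

async function* textWithUsage(
  chunks: AsyncIterable<ChunkSketch>,
  onUsage: (u: UsageSketch) => void,
): AsyncGenerator<string> {
  let usage: UsageSketch | null = null;
  for await (const chunk of chunks) {
    // Only the final chunk carries usage; earlier chunks leave it null.
    if (chunk.usage) usage = chunk.usage;
    const text = chunk.choices[0]?.delta.content;
    if (text) yield text;
  }
  // Reached only when the source stream ended normally. A thrown error or
  // an early `break` by the consumer never gets here, so nothing is emitted.
  if (usage) onUsage(usage);
}
```

Placing the callback after the loop, rather than in a `finally` block, is what keeps errored and cancelled streams from reporting usage.
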
```ts
const registry = createRegistryFromEnv({
  adapters: { openai: openaiAdapter },
  observability: {
    onCost: (e) => {
      if (e.operation === "streamText" || e.operation === "streamStructured") {
        stats.streamed.add(e.totalUsd);
      }
    },
  },
});

for await (const chunk of registry.getPort().streamText({
  taskType: "chat",
  prompt: "hello",
  refs: { session: { key: "sess-abc123" } },
})) {
  ui.append(chunk);
}
// onCost + onTokenUsage fired once at completion with refs.session.key preserved.
```

**Semantics enforced:**

- Emit ONCE per stream, at natural completion.
- Mid-stream errors do NOT emit (no completion → no billable success).
- Consumer-cancelled streams (via `AbortSignal`) do NOT emit; provider billing for partial completions is the provider's contract.
- Adapters that don't yet implement the stream-completion path just skip the emission (no error, matches alpha.24 behavior).

**Opt-out at the adapter for compat providers that reject `stream_options`:**

```ts
const adapter = createOpenAIAdapter({
  apiKey: process.env.WEIRD_COMPAT_KEY!,
  baseURL: "https://api.weird-compat.example/v1",
  streamUsage: false, // alpha.25+; defaults to true
});
```

## Interaction between the three features

`refs` composes cleanly with the other two.
A streamed call with refs still fires `onCost` at completion with `refs` on the event; a streamed call under `"aggressive"` fallback still preserves `refs` across chain advancement:

```ts
for await (const chunk of registry.getPort().streamText({
  taskType: "chat",
  prompt: "hello",
  refs: { prompt: { key: "greeting-v3" } },
})) {
  ui.append(chunk);
}
// If primary rate-limits → aggressive walks → backup succeeds:
//   onFallback fires with refs.prompt.key = "greeting-v3"
//   onCost + onTokenUsage fire at stream completion with refs.prompt.key = "greeting-v3"
```

## Package versions

All 7 publishable packages bumped in lockstep:

- `@llm-ports/core@0.1.0-alpha.25`
- `@llm-ports/adapter-openai@0.1.0-alpha.25`
- `@llm-ports/adapter-anthropic@0.1.0-alpha.25`
- `@llm-ports/adapter-google@0.1.0-alpha.25`
- `@llm-ports/adapter-ollama@0.1.0-alpha.25`
- `@llm-ports/adapter-vercel@0.1.0-alpha.25`
- `@llm-ports/capabilities@0.1.0-alpha.25`

## What's next: alpha.26 is BREAKING

The alpha.26 release will unify the input shape across all five port methods around a canonical `messages: LLMMessage[]` field. The current `{ instructions, prompt }` compression on `generateText` / `generateStructured` / `streamText` / `streamStructured` will move to `@deprecated` in alpha.26 and be removed in alpha.27.

A one-line migration shim ships in alpha.26:

```ts
import { toMessages } from "@llm-ports/core";

port.generateText({
  taskType: "triage",
  messages: toMessages(SYSTEM_PROMPT, userInput), // shim
});
```

Full details in the alpha.26 planning discussion. The alpha.25 → alpha.26 upgrade path will be mechanical for existing consumers via `toMessages()`; the removal window from alpha.26 → alpha.27 is planned at ~2 weeks.

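
As a mental model for what the shim does, the sketch below maps the old `{ instructions, prompt }` pair onto a message array. Everything here is an assumption for illustration: this guide does not show the shape of `LLMMessage` or the real `toMessages` signature, so the type and the function use `Sketch` names to keep them distinct from the library's exports.

```ts
// Assumed message shape (hypothetical); the real LLMMessage may differ.
type LLMMessageSketch = { role: "system" | "user"; content: string };

// Hypothetical stand-in for the shim: instructions become a system message,
// the prompt becomes a user message.
function toMessagesSketch(
  instructions: string | undefined,
  prompt: string,
): LLMMessageSketch[] {
  const messages: LLMMessageSketch[] = [];
  if (instructions) {
    messages.push({ role: "system", content: instructions });
  }
  messages.push({ role: "user", content: prompt });
  return messages;
}

// Usage: the old pair maps onto two messages, in order.
const migrated = toMessagesSketch("You are a triage assistant.", "My build is failing.");
```

Under this assumed mapping, a call site that passed only `prompt` would produce a single user message.
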
## Full test coverage

- 8 refs tests (7 canonical cases from the proposal + one for empty-refs semantics)
- 23 aggressive-fallback tests (positive + negative per error class, body-pattern matrix, Registry integration)
- 5 streamed-cost tests (callback firing, no-op path, mid-stream error path, refs preservation, streamStructured parity)
- All existing alpha.24 tests continue to pass unchanged

864 total tests pass across the workspace (was 828; +36; zero regressions).