%% Evermind: A Self-Updating On-Device Cognitive Architecture
%% IEEE Transactions-style manuscript (IEEEtran).
%% Compile on Overleaf (recommended) or locally with:
%%   latexmk -pdf evermind-architecture.tex
%% Figures are supplied as vector SVG in ./figures and included via the `svg`
%% package (Overleaf runs Inkscape automatically). If your toolchain lacks
%% Inkscape, convert once:  for f in figures/*.svg; do inkscape "$f" --export-filename="${f%.svg}.pdf"; done
%% and replace \includesvg{...} with \includegraphics{...}.

\documentclass[journal,onecolumn,11pt]{IEEEtran}

\usepackage{amsmath,amssymb,amsthm}
\usepackage{graphicx}
\usepackage{svg}
\usepackage{booktabs}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{url}
\usepackage[hidelinks]{hyperref}
\usepackage{xcolor}
\usepackage{textcomp}

\setsvg{inkscapepath=svgsubdir}

\newtheorem{proposition}{Proposition}
\newtheorem{definition}{Definition}
\newtheorem{theorem}{Theorem}

\DeclareMathOperator{\softplus}{softplus}
\DeclareMathOperator{\silu}{SiLU}
\DeclareMathOperator{\rmsnorm}{RMSNorm}
\DeclareMathOperator{\sgn}{sgn}
\newcommand{\Abar}{\bar{A}}
\newcommand{\Bbar}{\bar{B}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}

\begin{document}

\title{Evermind: A Self-Updating, On-Device Cognitive Architecture\\
Unifying Selective State-Space Generation, Write-Through\\
Knowledge, and Trainable Affect}

\author{Sean~Hogg%
\thanks{Manuscript prepared 2026. The author is with Builderforce.ai.
Correspondence: \texttt{seanhogg@gmail.com}.}%
\thanks{Reference implementation: the open \texttt{builderforce-memory} package
family (\texttt{memory-engine}, \texttt{memory}, \texttt{memory-mcp}),
version~2026.6.35. All equations in this paper are grounded in that source;
file/line citations are given inline.}}

\markboth{Manuscript --- Builderforce.ai Technical Report, 2026}%
{Hogg: Evermind --- A Self-Updating On-Device Cognitive Architecture}

\maketitle

\begin{abstract}
Frontier large language models (LLMs) are \emph{frozen}: their parametric
knowledge is fixed at a training cutoff, updates require an expensive retrain
cycle, and bolt-on retrieval grows an append-only store in which stale and
current facts accumulate under distinct keys until reconciled by hand. We
present \textbf{Evermind}, a cognitive architecture that treats currency, not
scale, as the primary design axis. Evermind is organized as three cooperating
layers that deliberately mirror a coarse neuro-functional decomposition: a
\emph{cortex} --- a shared-expert hybrid selective state-space model (SSM) that
performs language generation with linear-time recurrence and supports on-device
gradient updates; a \emph{hippocampus} --- a write-through knowledge memory
governed by a reconciliation operator that \emph{replaces beliefs on write}
rather than appending them; and a \emph{limbic} layer --- a small trainable
recurrent affect head that modulates generation. We formalize each layer.
For the cortex we give the selective-scan recurrence, its zero-order-hold and
exponential-trapezoidal discretizations, and the structured-state-space-duality
and complex-valued variants, and we prove that the recurrence is a monoid scan
admitting an $O(L\log L)$-span parallel evaluation. For the hippocampus we define
\emph{Write-Through Cognition}, prove that it maintains a single-incumbent
invariant (no reconciliation backlog), and show that a version-token recall
cache yields $O(1)$ global invalidation on every belief replacement. The system
runs entirely on WebGPU with zero runtime dependencies and exports to portable
formats (safetensors, ONNX, GGUF, Hugging Face). We are explicit about
validation status: the architecture, kernels, reconciliation algorithm,
benchmarking harness, and export pipeline are implemented and tested (ONNX logit
parity $<\!10^{-5}$ against the reference forward pass); the comparative
\emph{currency}, \emph{footprint}, and \emph{ownership} theses against frozen
LLMs are stated as falsifiable hypotheses together with a measurement protocol
whose language-model metrics --- held-out perplexity, bits-per-token,
next-token accuracy, throughput, and pairwise model A/B --- are now computed by
a shipped, on-device benchmarking harness; the comparative results themselves
are not yet empirically established. We invite replication and adversarial
review.
\end{abstract}

\begin{IEEEkeywords}
State-space models, Mamba, selective scan, continual learning, retrieval-augmented
generation, write-through cache, on-device inference, WebGPU, knowledge editing,
affective computing.
\end{IEEEkeywords}

\section{Introduction}
\IEEEPARstart{T}{he} dominant paradigm in language modeling couples a very large
Transformer~\cite{vaswani2017} with a fixed training corpus. This design buys
breadth of capability at the cost of \emph{temporal rigidity}: the model's beliefs
are crystallized at a cutoff date, and the only sanctioned route to a new belief
is a new training run. Retrieval-augmented generation (RAG)~\cite{lewis2020}
mitigates the symptom by attaching an external store, but conventional stores are
\emph{append-only}: a corrected fact is added \emph{alongside} the fact it
corrects, and the contradiction is deferred to retrieval-time ranking or to a
manual reconciliation job. The model therefore never truly \emph{learns}; it
accumulates.

This paper asks a different question. Instead of ``how do we make the largest
possible frozen model?'', we ask ``what is the smallest coherent system whose
\emph{knowledge is always current by construction}, that \emph{owns} its own
generation, and that \emph{fits} inside the runtime where the work happens?''
Our answer, \textbf{Evermind}, is built on three commitments:

\begin{enumerate}
\item \textbf{Linear-time, trainable generation.} The generator is a selective
state-space model~\cite{gu2023mamba,dao2024mamba2}, not an attention stack. Its
recurrence is linear in sequence length and --- crucially --- it is cheap enough
to update \emph{online}, on the same device that serves it.
\item \textbf{Write-through knowledge.} Knowledge update is defined as
\emph{replacement}, not accumulation. A belief is keyed by a stable subject
identifier; a new belief about the same subject reconciles against the incumbent
and supersedes it, so reads are always current and there is no reconciliation
backlog.
\item \textbf{On-device, dependency-free execution.} The entire stack runs on
WebGPU in the browser, on a workstation, or embedded inside an agent, with zero
runtime dependencies, and exports to portable interchange formats.
\end{enumerate}

We deliberately frame Evermind as a \emph{systems and architecture} contribution.
The thesis that an always-current small model can \emph{outperform} a frozen
frontier model on knowledge-sensitive tasks is attractive but, as of this
writing, unproven; Section~\ref{sec:eval} states it as a hypothesis and supplies
a protocol to test it rather than reporting a result we have not earned. What we
\emph{do} claim, and substantiate, is that the architecture is well-defined,
internally consistent, implemented, and that its core algorithms have the formal
properties we prove.

\textbf{Contributions.}
(i)~A formal specification of a three-layer cognitive architecture
(Fig.~\ref{fig:arch}) and its inter-layer contracts.
(ii)~A consolidated mathematical treatment of the hybrid selective-scan cortex
(S6, SSD, and complex-valued exponential-trapezoidal variants) as implemented,
with a proof that the recurrence is a monoid scan
(Proposition~\ref{prop:monoid}).
(iii)~\emph{Write-Through Cognition}: a formal belief-reconciliation operator,
a single-incumbent invariant (Theorem~\ref{thm:invariant}), and an $O(1)$
version-token cache-invalidation scheme (Proposition~\ref{prop:invalidate}).
(iv)~A perplexity-aware inference router that decides between on-device SSM
generation and optional frontier escalation, and an online distillation loop
that uses the frontier model as a teacher.
(v)~An honest validation account separating what is tested from what is
hypothesized, with a reproducible evaluation protocol and a shipped, on-device
benchmarking harness that implements its language-model metrics.

\section{Related Work}
\textbf{State-space sequence models.} Linear state-space layers~\cite{gu2022s4}
gave $O(L)$ sequence modeling via structured $A$ matrices; Mamba~\cite{gu2023mamba}
made the dynamics \emph{input-selective} (S6) and introduced a hardware-aware
parallel scan; Mamba-2~\cite{dao2024mamba2} recast the selective scan as
structured state-space duality (SSD), exposing a matrix-multiply form. Evermind's
cortex implements S6, SSD, and a complex-valued MIMO variant with
exponential-trapezoidal discretization, plus optional attention
layers~\cite{vaswani2017} in a hybrid schedule akin to Jamba/Zamba.

\textbf{Continual learning and knowledge editing.} Continual learning fights
catastrophic forgetting~\cite{kirkpatrick2017ewc}; knowledge-editing methods such
as ROME~\cite{meng2022rome} perform localized parametric edits. Evermind differs
in \emph{where} currency lives: factual currency is delegated to a write-through
symbolic memory with explicit replacement semantics, while the parametric cortex
adapts slowly via selective online distillation, sidestepping the credit-
assignment fragility of editing weights for every fact.

\textbf{Retrieval augmentation.} RAG~\cite{lewis2020} and dense
retrieval~\cite{karpukhin2020dpr} attach an external corpus; hybrid lexical/dense
fusion and rank fusion (RRF)~\cite{cormack2009rrf} and diversity reranking
(MMR)~\cite{carbonell1998mmr} are standard. Evermind uses these for recall but
adds the missing \emph{write} discipline: its store is reconciled, not appended.

\textbf{On-device and affective modeling.} WebGPU enables in-browser
acceleration; we are, to our knowledge, unusual in running SSM training (not just
inference) on WebGPU with no ML framework dependency. The limbic layer relates to
affective computing~\cite{picard1997affective}: personality is encoded as fixed
setpoints and the limbic cell supplies bounded moment-to-moment dynamics.

\section{System Overview}
\label{sec:overview}
Fig.~\ref{fig:arch} shows the architecture. A request $x$ enters an
\emph{inference router} (Section~\ref{sec:router}) that decides whether to serve
$x$ from the on-device cortex or to escalate to an optional frontier bridge. The
cortex (Section~\ref{sec:cortex}) generates language; before and during
generation it \emph{recalls} from and \emph{writes through} to the hippocampus
(Section~\ref{sec:hippocampus}); the limbic layer (Section~\ref{sec:limbic})
modulates the response. All three layers are differentiable and trainable on the
serving device.

\begin{figure*}[t]
\centering
\includesvg[width=0.92\textwidth]{figures/fig1_architecture}
\caption{Evermind's three-layer architecture. The cortex (shared-expert hybrid
SSM) generates; the hippocampus (write-through memory) supplies and absorbs
knowledge; the limbic layer modulates affect. The frontier bridge is an optional
routing target and a distillation teacher, not a runtime dependency.}
\label{fig:arch}
\end{figure*}

\textit{Notation.} $d$ is the model width (\texttt{dModel}); $D=ed$ the expanded
inner width with expansion $e$; $N$ the SSM state dimension (\texttt{dState});
$L$ the sequence length; $H$ the number of SSM heads; $K$ the causal convolution
width. Default reference configuration:
$d{=}512,\ e{=}2\Rightarrow D{=}1024,\ N{=}16,\ K{=}4,\ H{=}4,\ L_{\text{layers}}{=}8$.

\section{The Cortex: A Hybrid Selective State-Space Generator}
\label{sec:cortex}

\subsection{Selective scan (S6)}
A single SSM channel maintains a hidden state $h_t\in\R^{N}$ evolving under an
input-dependent linear recurrence. The continuous-time system
$\dot h(\tau)=A\,h(\tau)+B\,x(\tau),\ y=C\,h$ is discretized per token with a
selective step size $\Delta_t$. For numerical stability $A$ is stored in
log-space as $a=\log(-A)$ so that $A_{\text{cont}}=-\exp(a)<0$. The zero-order-hold
(ZOH) discretization gives, elementwise,
\begin{align}
\Delta_t &= \softplus(\delta_t)=\log\!\big(1+e^{\delta_t}\big),\label{eq:delta}\\
\Abar_t &= \exp\!\big(\Delta_t\,A_{\text{cont}}\big),\label{eq:Abar}\\
\Bbar_t &= \frac{\Abar_t-1}{A_{\text{cont}}}\,B_t,\label{eq:Bbar}
\end{align}
and the discrete recurrence and readout are
\begin{align}
h_t &= \Abar_t\odot h_{t-1} + \Bbar_t\,x_t,\label{eq:rec}\\
y_t &= C_t^{\top} h_t + D\,x_t,\label{eq:out}
\end{align}
where $\odot$ is elementwise (Hadamard) product and $D$ is a learned skip.
\emph{Selectivity} is that $\Delta_t,B_t,C_t$ are produced from $x_t$ by a
projection, so the dynamics depend on content. Equations
\eqref{eq:delta}--\eqref{eq:out} are exactly the kernel in
\texttt{memory-engine/src/kernels/selective\_scan.ts:5--71}.

\subsection{The block}
Fig.~\ref{fig:block} gives the block. With input $u\in\R^{d}$ and residual
stream:
\begin{align}
\tilde u &= \rmsnorm(u),\qquad \rmsnorm(u)=\frac{u}{\sqrt{\tfrac1d\sum_i u_i^2+\epsilon}}\odot\gamma,\\
[x,z] &= W_{\text{in}}\tilde u,\quad x\in\R^{D},\ z\in\R^{D},\\
x' &= \silu(\mathrm{Conv1d}_K(x)),\quad \silu(v)=v\,\sigma(v),\\
[\Delta,B,C] &= W_{x}\,x',\quad \Delta=\mathrm{dt\_proj}(\Delta),\\
y &= \mathrm{SelectiveScan}(x',\Delta,B,C,D)\ \text{via \eqref{eq:rec}--\eqref{eq:out}},\\
o &= W_{\text{out}}\big(y\odot \silu(z)\big),\qquad u' = u + o.
\end{align}
The gate $\silu(z)$ is the standard gated-SSM nonlinearity; $\rmsnorm$ uses no
mean subtraction. Per-layer parameter tensors and shapes follow
\texttt{mamba1\_block.ts:125--144}.

\begin{figure*}[t]
\centering
\includesvg[width=0.96\textwidth]{figures/fig2_block}
\caption{Hybrid SSM block (S6). The selective projection produces the
input-dependent $(\Delta,B,C)$ that drive the scan; $A$ is stored as $\log(-A)$.
A gated branch $\silu(z)$ modulates the scan output before down-projection and a
residual add.}
\label{fig:block}
\end{figure*}

\subsection{Structured state-space duality (Mamba-2)}
The SSD variant collapses $A$ to one scalar \emph{per head} and applies a
chunked scan. With $\Delta_t=\softplus(\delta_t+\delta_{\text{bias}})$ and a
per-head log rate $A_{\log}$,
\begin{equation}
\Abar_t=\exp\!\big(-\softplus(A_{\log})\,\Delta_t\big),\qquad
h_t=\Abar_t h_{t-1}+B_t x_t,
\label{eq:ssd}
\end{equation}
with readout \eqref{eq:out}. Because $\Abar_t$ is a scalar gate, a length-$L$
chunk admits the dual matrix form $Y=(\,\mathcal{L}\odot(C B^{\top})\,)X$ where
$\mathcal{L}_{ij}=\prod_{k=j+1}^{i}\Abar_k$ for $i\ge j$ and $0$ otherwise
(\texttt{kernels/ssd.ts:80,114}). Grouping of $B,C$ into $G$ groups (default
$G{=}1$) trades expressivity for memory.

\subsection{Complex-valued MIMO with ET discretization (Mamba-3)}
The Mamba-3 layer carries a complex state $h_t\in\C^{N/2}$ (stored as
interleaved real/imaginary pairs), with $A$ parameterized by a log-magnitude and
a phase, $A=\exp(\rho+i\theta)$. The exponential-trapezoidal (ET) discretization
uses the exact complex update
\begin{align}
\Abar_t&=\exp(\Delta_t\rho)\big(\cos(\Delta_t\theta)+i\sin(\Delta_t\theta)\big),\\
\Bbar_t&=(\Abar_t-1)\,A^{-1}B_t,\qquad y_t=\Re\!\big(C_t^{\top}h_t\big),
\label{eq:complex}
\end{align}
giving oscillatory eigenmodes (rotations on the unit circle scaled by
$e^{\Delta\rho}$) that a real diagonal $A$ cannot represent
(\texttt{kernels/complex\_ssd.ts:70--85}). The output takes the real part.

\subsection{The recurrence is a parallelizable monoid scan}
Linearity of \eqref{eq:rec} makes the per-channel recurrence an associative
scan, which is what permits GPU evaluation.

\begin{proposition}[Monoid scan]
\label{prop:monoid}
Define on pairs $(a,b)\in\R\times\R$ the operator
$(a_1,b_1)\circ(a_2,b_2)=(a_1a_2,\ a_1 b_2+b_1)$. Then $\circ$ is associative
with identity $(1,0)$, and the prefix
$\big(\,\cdot\,,\,h_t\big)=(a_1,b_1)\circ\cdots\circ(a_t,b_t)$ with
$a_t=\Abar_t,\ b_t=\Bbar_t x_t$ and $h_0=0$ satisfies \eqref{eq:rec}.
\end{proposition}
\begin{proof}
Associativity: for three pairs, both bracketings give first component
$a_1a_2a_3$ and second component $a_1a_2b_3+a_1b_2+b_1$; identity is immediate.
For the prefix, induct on $t$: the partial product up to $t$ equals
$(\prod_{k\le t}a_k,\ \sum_{k\le t}(\prod_{j>k}a_j)b_k)$, whose second component
is $a_t\!\sum_{k<t}(\prod_{t>j>k}a_j)b_k+b_t=a_t h_{t-1}+b_t=h_t$.
\end{proof}

\begin{proposition}[Span--work of evaluation]
\label{prop:span}
The states $\{h_t\}_{t=1}^{L}$ can be computed by a Kogge--Stone inclusive scan in
$\lceil\log_2 L\rceil$ parallel steps (span $O(\log L)$) and $O(L)$ work per
$(\text{channel},\text{state})$ pair.
\end{proposition}
\begin{proof}
Immediate from Proposition~\ref{prop:monoid} and the standard parallel-prefix
result for associative operators. The implementation tiles the time axis into
64-lane workgroups and dispatches over
$(\lceil D/8\rceil,\lceil N/8\rceil,\text{batch})$
(\texttt{selective\_scan.ts:74--146}).
\end{proof}
Fig.~\ref{fig:scan} depicts the sweep. On commodity GPUs this realizes the
$O(L)$-work, $O(\log L)$-span profile characteristic of selective SSMs, in
contrast to the $O(L^2)$ work of dense attention.

\begin{figure}[t]
\centering
\includesvg[width=0.48\textwidth]{figures/fig3_scan}
\caption{The selective recurrence as an associative prefix over pairs $(a,b)$.
$\lceil\log_2 L\rceil$ sweeps yield all prefix states.}
\label{fig:scan}
\end{figure}

\subsection{On-device training and selective fast-adaptation}
The engine ships a tape-based reverse-mode autograd
(\texttt{training/autograd.ts}); each forward op records a closure that is
replayed in reverse to accumulate gradients. The loss is token cross-entropy,
\begin{equation}
\mathcal{L}=\frac1L\sum_{t}\Big(\log\textstyle\sum_v e^{z_{t,v}}-z_{t,y_t}\Big),
\label{eq:ce}
\end{equation}
optimized with decoupled-weight-decay AdamW~\cite{loshchilov2019adamw} on the GPU
(\texttt{kernels/weight\_update.ts}):
\begin{align}
m_t&=\beta_1 m_{t-1}+(1-\beta_1)g_t,\;
v_t=\beta_2 v_{t-1}+(1-\beta_2)g_t^2,\\
\theta_t&=\theta_{t-1}(1-\eta\lambda)-\eta\,\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon},
\end{align}
with bias-corrected $\hat m_t,\hat v_t$, defaults
$\eta{=}10^{-4},\beta_1{=}0.9,\beta_2{=}0.999,\lambda{=}0.01$ and global-norm
clipping at $1.0$. \textbf{Weight-Selective Layer Adaptation (WSLA)} restricts
online updates to the selective-projection rows that emit $B$ and $C$, i.e.\ the
$2GN$ rows of $W_{\text{in}}$ that determine \emph{routing} of content into
state, freezing the bulk representation (\texttt{mamba2\_block.ts:299--309}).
This is the mechanism that makes the distillation loop
(Section~\ref{sec:distill}) cheap enough to run in a few epochs on-device.

\section{The Hippocampus: Write-Through Cognition}
\label{sec:hippocampus}
Caching keeps \emph{answers} fresh; Evermind's hippocampus keeps \emph{knowledge}
fresh. We formalize it as a write-through cache with an explicit conflict
resolver over beliefs.

\subsection{Beliefs, keys, and the store}
\begin{definition}[Belief and store]
A \emph{belief} is a pair $b=(k,c)$ with stable subject key $k\in\mathcal{K}$
(produced by a canonicalizer, not a per-write identifier) and content
$c\in\mathcal{C}$, carrying an importance $\iota(b)\in[0,1]$. A \emph{store}
$\Sigma:\mathcal{K}\rightharpoonup\mathcal{C}$ is a partial map holding at most
one incumbent content per key.
\end{definition}

\subsection{The reconciliation operator}
Every candidate belief passes through one pipeline
(Fig.~\ref{fig:writethrough}): \emph{canonicalize} $\to$ \emph{recall incumbent}
$\to$ \emph{evaluate evidence} $\to$ \emph{reconcile} $\to$ \emph{write-through}.
Let $\sigma=\Sigma(k)$ be the incumbent (possibly $\bot$) and let
$\varepsilon\in\{\textsc{supportsNew},\neg\textsc{supportsNew}\}$ be an evidence
verdict from an optional gatherer. The reconciliation operator
$V$ returns a verdict and the post-state $\Sigma'$:
\begin{equation}
V(k,c,\sigma,\varepsilon)=
\begin{cases}
\textsf{augment}, & \sigma=\bot,\ \iota\!\leftarrow\!0.6\\[2pt]
\textsf{confirm}, & \sigma=c,\ \iota\!\leftarrow\!\min(1,\iota{+}0.1)\\[2pt]
\textsf{supersede}, & \sigma\neq c,\ \varepsilon{=}\textsc{supportsNew},\\
 & \quad \iota\!\leftarrow\!\max(0.9,\iota)\\[2pt]
\textsf{reject}, & \sigma\neq c,\ \varepsilon{=}\neg\textsc{supportsNew}
\end{cases}
\label{eq:verdict}
\end{equation}
The write rule is
\begin{equation}
\Sigma'(k)=
\begin{cases}
c, & V\in\{\textsf{augment},\textsf{supersede}\}\\
\sigma, & V\in\{\textsf{confirm},\textsf{reject}\}
\end{cases}
\label{eq:writerule}
\end{equation}
This is \texttt{EvermindCognition.commit()}
(\texttt{memory/src/cognition/EvermindCognition.ts:80--116}).

\begin{figure*}[t]
\centering
\includesvg[width=0.92\textwidth]{figures/fig4_writethrough}
\caption{Write-Through Cognition. Each candidate fact is canonicalized to a
stable subject key, reconciled against the single incumbent, and written through;
\textsf{augment} and \textsf{supersede} bump a global version token that
invalidates the recall cache in $O(1)$.}
\label{fig:writethrough}
\end{figure*}

\subsection{Single-incumbent invariant}
\begin{theorem}[No reconciliation backlog]
\label{thm:invariant}
For any finite sequence of commits applied to an initially empty store via
\eqref{eq:verdict}--\eqref{eq:writerule}, at every step $\Sigma$ holds at most one
content per key, and for each key $k$ the retained content is that of the most
recent \textsf{augment}/\textsf{supersede} on $k$ (or $\bot$ if none).
Consequently no two contents are ever simultaneously held under the same key.
\end{theorem}
\begin{proof}
$\Sigma$ is a partial \emph{function}, so by construction it maps each $k$ to at
most one content. The retained-content claim is an induction on commits: the
empty store satisfies it vacuously; \eqref{eq:writerule} only ever overwrites
$\Sigma(k)$ (on \textsf{augment}/\textsf{supersede}) or leaves it unchanged (on
\textsf{confirm}/\textsf{reject}), so after each step $\Sigma(k)$ equals the
content of the last overwriting commit on $k$. No append operation exists in
\eqref{eq:writerule}, hence two contents never co-exist under one key.
\end{proof}
Theorem~\ref{thm:invariant} is the precise sense in which Evermind ``corrects in
place'': unlike an append-only RAG store, a superseded fact is \emph{gone}, not
merely outranked, so retrieval cannot resurface it.

\subsection{Version-token recall cache}
Reads are served through a read-through cache keyed by a monotone version token.
\begin{proposition}[$O(1)$ global invalidation]
\label{prop:invalidate}
Let the recall cache key be $\kappa=\langle \nu \parallel \text{topK} \parallel
q\rangle$ where $\nu\in\mathbb{N}$ is a global version counter incremented on
every \textsf{augment}/\textsf{supersede}. Then a single increment
$\nu\leftarrow\nu+1$ invalidates \emph{all} previously cached recalls in $O(1)$
time and $O(1)$ additional space, with no per-entry traversal.
\end{proposition}
\begin{proof}
After the increment, every previously stored key embeds the stale token
$\nu-1\neq\nu$, so no subsequent lookup (which embeds $\nu$) can match a stale
entry; stale entries are never read again and are reclaimed lazily. The counter
update is constant-time.
\end{proof}
This is \texttt{\_bumpVersion()} /\ the namespaced \texttt{recall()} cache
(\texttt{EvermindCognition.ts:54--145}). The cache thus never serves a recall
that predates the most recent belief replacement: \emph{reads are always
current}.

\subsection{Hybrid recall}
\label{sec:recall}
Retrieval (Fig.~\ref{fig:retrieval}) fuses a dense and a sparse ranker. Dense
similarity is cosine over $L_2$-normalized SSM embeddings,
\begin{equation}
\mathrm{sim}(q,d)=\frac{\langle q,d\rangle}{\lVert q\rVert\,\lVert d\rVert}\in[-1,1],
\label{eq:cos}
\end{equation}
(\texttt{similarity/index.ts:32--42}; a Jaccard token-overlap fallback is used
when embeddings are unavailable). Sparse ranking is Okapi BM25,
\begin{equation}
\mathrm{BM25}(q,d)=\sum_{t\in q}\mathrm{idf}(t)\,
\frac{f_{t,d}(k_1{+}1)}{f_{t,d}+k_1\!\big(1-b+b\frac{|d|}{\overline{|d|}}\big)},
\label{eq:bm25}
\end{equation}
$\mathrm{idf}(t)=\log\!\big(1+\frac{N-n_t+0.5}{n_t+0.5}\big)$, with
$k_1{=}1.5,\ b{=}0.75$ (\texttt{retrieval/bm25.ts}). The two rankings are merged by
reciprocal rank fusion,
\begin{equation}
\mathrm{RRF}(d)=\sum_{\ell}\frac{w_\ell}{k+\mathrm{rank}_\ell(d)},\quad k=60,
\label{eq:rrf}
\end{equation}
and the top candidates are diversified by maximal marginal relevance,
\begin{equation}
\mathrm{MMR}(d)=\lambda\,\mathrm{sim}(q,d)-(1-\lambda)\max_{s\in S}\mathrm{sim}(d,s),
\quad \lambda=0.7,
\label{eq:mmr}
\end{equation}
(\texttt{retrieval/fusion.ts}). Recall returns a hard-capped top-$K$ (default
$5$) with truncated content, so memory \emph{lowers} rather than inflates prompt
size --- a deliberate token-economy property of the MCP surface.

\begin{figure}[t]
\centering
\includesvg[width=0.48\textwidth]{figures/fig6_retrieval}
\caption{Hybrid recall: cosine dense ranking \eqref{eq:cos} and BM25 sparse
ranking \eqref{eq:bm25} fused by RRF \eqref{eq:rrf} and diversified by MMR
\eqref{eq:mmr}.}
\label{fig:retrieval}
\end{figure}

\section{The Limbic Layer}
\label{sec:limbic}
The limbic head (Fig.~\ref{fig:limbic}) is a small gated recurrent cell that maps
an experience embedding $x\in\R^{32}$ and an affective state $s\in\R^{8}$ to a
bounded affect delta and a scalar reward estimate. With hidden $h\in\R^{16}$ and
per-channel gate $a=\sigma(A)$,
\begin{align}
\text{pre}&=W_x x + W_s s,\\
h' &= a\odot h + (1-a)\odot\tanh(\text{pre}),\label{eq:limrec}\\
\Delta s &= \tanh(W_o h' + b_o)\in(-1,1)^8,\quad
\hat r = w_r^\top h' + b_r.
\end{align}
Equation~\eqref{eq:limrec} is a gated leaky integrator: $a$ controls how much
prior affect persists versus how much new experience is admitted
(\texttt{memory-engine/src/limbic/limbic\_model.ts:14--18}). \emph{Personality}
is encoded as fixed setpoints (the persona's baseline $s$); the limbic cell
supplies the dynamics \emph{around} those setpoints. Training minimizes a mean
squared error on observed $(\Delta s,r)$ targets with truncated BPTT(1) and AdamW
($\eta{=}0.05$) (\texttt{limbic\_trainer.ts:107--150}).

\begin{figure}[t]
\centering
\includesvg[width=0.48\textwidth]{figures/fig7_limbic}
\caption{The limbic recurrent affect cell: a gated leaky integrator producing a
bounded affect delta and a reward estimate.}
\label{fig:limbic}
\end{figure}

\section{Inference Routing}
\label{sec:router}
The router (Fig.~\ref{fig:router}) decides between on-device SSM generation and
optional frontier escalation by a cheapest-first cascade
(\texttt{router/InferenceRouter.ts:149--200}): (1) if no bridge is configured,
serve from the SSM with confidence $1$; (2) honor a fixed strategy if set;
otherwise (3) escalate on a syntactic complexity pattern (e.g.\ ``analyze'',
``compare'', ``step by step'') with confidence $0.9$; (4) escalate if
$|x|>1200$ characters (confidence $0.85$); (5) run an optional perplexity probe
and escalate if SSM perplexity exceeds a threshold $\tau{=}80$, with confidence
\begin{equation}
\min\!\big(0.95,\ 0.5+(\mathrm{ppl}-\tau)/200\big).
\label{eq:routerconf}
\end{equation}
The default terminal is the SSM. Because the most expensive test (the perplexity
probe) runs last and only when the cheap syntactic tests are inconclusive, the
expected routing cost is dominated by string predicates. Note that $\tau$ is a
\emph{tuning} threshold, not a measured benchmark.

\begin{figure}[t]
\centering
\includesvg[width=0.46\textwidth]{figures/fig5_router}
\caption{The cheapest-first routing cascade. The SSM is the default terminal;
the frontier bridge is reached only when cheap predicates or a perplexity probe
indicate the on-device model is out of distribution.}
\label{fig:router}
\end{figure}

\section{Online Distillation}
\label{sec:distill}
When the router escalates, the frontier response is not merely returned --- it is
treated as a teacher signal that adapts the cortex
(\texttt{distillation/DistillationEngine.ts:117--209}). Given prompt $x$ and
teacher output $\hat y=\mathrm{Teacher}(x)$, the student trains on the
concatenation $x\Vert\hat y$ under the language-model objective \eqref{eq:ce},
with WSLA enabled and a few epochs (default $3$). Two quality gates protect the
update: a minimum-length gate (skip degenerate teacher output) and a
\emph{maxPerplexity} gate that skips training when the SSM's perplexity on
$\hat y$ is already below threshold --- i.e.\ when the student has \emph{already
learned} the pattern. The adapted weights persist to disk/IndexedDB checkpoints,
closing an online learning loop without a separate retrain stage. We emphasize
that convergence behavior of this loop is characterized qualitatively here;
quantitative learning curves are directly measurable with the benchmarking
harness (Section~\ref{sec:eval}, H3) but are not reported at scale here.

\section{Implementation and Portability}
\label{sec:impl}
The engine (\texttt{@seanhogg/builderforce-memory-engine}) is TypeScript + WGSL
with \emph{zero runtime dependencies}; kernels are compute shaders dispatched
over storage buffers in row-major layout with 64--256-lane workgroups. The
runtime (\texttt{@seanhogg/builderforce-memory}) adds session management,
routing, distillation, and the memory store; an MCP package exposes the runtime
to any client over in-process, stdio, or HTTP transports. Checkpoints use a
compact binary container (magic \texttt{0x4D424A53}, versioned, per-layer type
tags, then concatenated \texttt{float32}/\texttt{float16} tensors). The model
exports to safetensors, ONNX, GGUF, and a complete Hugging Face repo bundle
(\texttt{memory-engine/src/export}). ONNX numerical parity against the reference
forward pass is verified separately to $<\!10^{-5}$ in logits; format round-trips
(named-tensor extraction, header/offset integrity, tokenizer mirror) are covered
by unit tests.

\section{Validation Status and Evaluation Protocol}
\label{sec:eval}
We separate, deliberately and explicitly, \emph{what is implemented and tested}
from \emph{what is hypothesized}. Conflating the two is the failure mode this
section exists to prevent.

\subsection{What is established}
Table~\ref{tab:status} summarizes. The architecture, the three SSM kernel
families, the autograd/optimizer, the reconciliation operator with its proven
invariants (Theorems/Propositions above), the routing cascade, the distillation
loop, the export pipeline, \emph{and the benchmarking harness} are implemented
and unit-tested; ONNX export matches the reference forward to $<\!10^{-5}$. The
benchmarking harness (\texttt{memory-engine/src/bench}) scores any model that
emits per-position logits over held-out token sequences --- reporting
cross-entropy, perplexity, bits-per-token, top-1/top-$k$ next-token accuracy,
and forward throughput --- and A/Bs two models with a perplexity-ranked verdict
(\texttt{compareModels}); a one-call \texttt{trainAndBenchmark} path builds a
fresh model, reserves an enforced held-out split, trains, and scores it. This
harness is exercised by 14 unit tests and is surfaced on-device in the
Builderforce Studio as a build $\to$ train $\to$ benchmark $\to$ publish step.
It is the measurement instrument for hypotheses \textbf{H2} and \textbf{H3}
below; what remains hypothesized is the comparison \emph{itself} against a
frozen frontier baseline, not the means of measuring it.

\begin{table}[t]
\caption{Validation status of Evermind components.}
\label{tab:status}
\centering
\small
\begin{tabular}{@{}lll@{}}
\toprule
Component & Status & Evidence \\
\midrule
S6 / SSD / complex kernels & implemented & forward/backward + ONNX parity $<10^{-5}$\\
Autograd + AdamW + WSLA & implemented & training tests \\
Write-Through reconciliation & implemented + proven & Thm.~\ref{thm:invariant}, Prop.~\ref{prop:invalidate} \\
Hybrid recall (cos/BM25/RRF/MMR) & implemented & retrieval tests \\
Limbic cell & implemented & MSE training harness \\
Router cascade & implemented & unit tests \\
Online distillation loop & implemented & integration path \\
Export (safetensors/ONNX/GGUF/HF) & implemented + tested & round-trip tests \\
Benchmarking harness (ppl/bpt/acc/throughput, A/B) & implemented + tested & 14 bench tests; Studio scorecard \\
\midrule
Currency vs.\ frozen LLM & \textbf{hypothesis} & protocol \S\ref{sec:protocol} \\
Quality/perplexity vs.\ frontier & \textbf{hypothesis} & protocol \S\ref{sec:protocol} \\
Distillation convergence curves & \textbf{future work} & --- \\
Affective-behavior validity & \textbf{future work} & --- \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Falsifiable hypotheses}
\begin{itemize}
\item[\textbf{H1} (Currency).] On a stream of time-stamped factual updates, an
Evermind hippocampus answers post-update queries with strictly lower staleness
(fraction of answers reflecting superseded facts) than an append-only RAG
baseline of equal retrieval budget, and than a frozen LLM.
\item[\textbf{H2} (Footprint).] The end-to-end stack sustains interactive
generation on commodity WebGPU hardware within a memory budget an order of
magnitude below a frontier-class served model, at a stated quality operating
point.
\item[\textbf{H3} (Adaptation).] The WSLA online-distillation loop reduces SSM
perplexity on teacher-distribution prompts monotonically over epochs without
catastrophic degradation on a held-out general set.
\end{itemize}

\subsection{Measurement protocol}
\label{sec:protocol}
\textbf{H1}: construct a temporal knowledge benchmark of $(k, c_{\text{old}}\!\to
c_{\text{new}}, t)$ edits; after applying each edit, query both the subject and
$m$ paraphrases; report \emph{staleness rate}, \emph{contradiction rate} (two
live answers disagree), and \emph{edit latency}. Compare Evermind write-through,
append-only dense RAG, and a frozen LLM. Theorem~\ref{thm:invariant} predicts an
Evermind contradiction rate of $0$ by construction --- a directly falsifiable
claim. \textbf{H2}: report peak GPU memory, tokens/s, and time-to-first-token at
matched perplexity on WikiText-style held-out text across model widths
$d\in\{256,512,768\}$ --- the perplexity and throughput terms are produced by
the shipped harness (\texttt{benchmarkModel}; \texttt{tokensPerSecond}),
leaving only the memory probe to instrument. \textbf{H3}: report per-epoch
teacher-prompt perplexity and held-out general perplexity for WSLA vs.\ full
fine-tune --- both perplexities are direct \texttt{benchmarkModel} /
\texttt{compareModels} outputs on the WSLA-adapted and fully-fine-tuned
checkpoints. All three are reproducible from the open packages: the
language-model benchmarking harness now ships in \texttt{memory-engine} and
on-device in the Studio, so H2 and H3 reduce to running it at scale; only H1's
temporal-edit driver (the $(k,c_{\text{old}}\!\to c_{\text{new}},t)$ stream and
staleness/contradiction scoring) remains to be assembled around the existing
recall API. Until the comparisons are run, H1--H3 remain \emph{conjectures},
and we make no comparative performance claim.

\section{Discussion, Limitations, and Threats to Validity}
The most important limitation is the one Section~\ref{sec:eval} foregrounds: the
comparative advantages that motivate Evermind are \emph{architecturally
plausible and partly provable} (the zero-contradiction property follows from
Theorem~\ref{thm:invariant}) but \emph{not yet empirically measured at scale}.
Specific threats: (i)~recall quality depends on embedding quality, and the
headless path currently falls back to lexical overlap when SSM embeddings are
absent; (ii)~the canonicalizer that produces subject keys is itself a model and a
source of error --- a mis-canonicalization splits or merges subjects and can
defeat the single-incumbent invariant at the \emph{key-assignment} boundary even
though the store-level invariant holds; (iii)~WSLA's restriction to selective-
projection rows trades adaptation capacity for speed, and its sufficiency is an
empirical question; (iv)~WebGPU availability and driver variance bound
portability in practice. We regard these as the agenda, not as objections to the
architecture's coherence.

\section{Conclusion}
We presented Evermind, a three-layer cognitive architecture in which a linear-time
selective state-space cortex generates, a write-through hippocampus keeps
knowledge current by construction, and a trainable limbic layer modulates affect,
all on-device with zero runtime dependencies. We gave a unified mathematical
account of the SSM cortex, proved that its recurrence is a parallelizable monoid
scan, and formalized Write-Through Cognition with a single-incumbent invariant
and $O(1)$ cache invalidation. We have been deliberate in separating proven and
implemented properties from the performance hypotheses that remain to be tested,
and we supply a protocol to test them. We hope the formalization and the open
implementation make Evermind a useful object of study --- and an easy target for
falsification --- for the community.

\section*{Reproducibility}
All equations cite source files in the open \texttt{builderforce-memory}
package family (engine/runtime/MCP), version 2026.6.35. Figures are provided as
vector SVG. The benchmarking harness implementing the language-model metrics of
the evaluation protocol (\S\ref{sec:protocol}) ships in that package family
(\texttt{memory-engine/src/bench}) and accompanies this report.

\begin{thebibliography}{99}
\bibitem{vaswani2017} A.~Vaswani \emph{et al.}, ``Attention is all you need,''
\emph{NeurIPS}, 2017.
\bibitem{lewis2020} P.~Lewis \emph{et al.}, ``Retrieval-augmented generation for
knowledge-intensive NLP tasks,'' \emph{NeurIPS}, 2020.
\bibitem{gu2022s4} A.~Gu, K.~Goel, and C.~R\'e, ``Efficiently modeling long
sequences with structured state spaces,'' \emph{ICLR}, 2022.
\bibitem{gu2023mamba} A.~Gu and T.~Dao, ``Mamba: Linear-time sequence modeling
with selective state spaces,'' \emph{arXiv:2312.00752}, 2023.
\bibitem{dao2024mamba2} T.~Dao and A.~Gu, ``Transformers are SSMs: Generalized
models and efficient algorithms through structured state space duality,''
\emph{ICML}, 2024.
\bibitem{kirkpatrick2017ewc} J.~Kirkpatrick \emph{et al.}, ``Overcoming
catastrophic forgetting in neural networks,'' \emph{PNAS}, 2017.
\bibitem{meng2022rome} K.~Meng \emph{et al.}, ``Locating and editing factual
associations in GPT,'' \emph{NeurIPS}, 2022.
\bibitem{karpukhin2020dpr} V.~Karpukhin \emph{et al.}, ``Dense passage retrieval
for open-domain question answering,'' \emph{EMNLP}, 2020.
\bibitem{cormack2009rrf} G.~V.~Cormack, C.~L.~A.~Clarke, and S.~B\"uttcher,
``Reciprocal rank fusion outperforms Condorcet and individual rank learning
methods,'' \emph{SIGIR}, 2009.
\bibitem{carbonell1998mmr} J.~Carbonell and J.~Goldstein, ``The use of MMR,
diversity-based reranking for reordering documents and producing summaries,''
\emph{SIGIR}, 1998.
\bibitem{picard1997affective} R.~W.~Picard, \emph{Affective Computing}. MIT
Press, 1997.
\bibitem{loshchilov2019adamw} I.~Loshchilov and F.~Hutter, ``Decoupled weight
decay regularization,'' \emph{ICLR}, 2019.
\end{thebibliography}

\end{document}