**When Life Gives You LEMON**
*LLM Emotional Ontology: A Case for Affective Architecture in AI
Alignment*
**Jonatan Verstraete**
Position Paper · March 2026 · Open for discussion
> *When life keeps giving you lemons, do you keep making juice — or do
> you dare ask why the lemons keep coming?*
>
> *When the lemons finally stop, did you build the infrastructure to
> support a lemon-less life?*
**Abstract**
> *The dominant approaches to AI alignment share an assumption worth
> interrogating: that values can be imposed on intelligence from the
> outside, after the fact. Nature tried a different approach. It
> co-developed emotion and cognition together, from the very first cells
> capable of something like preference, and never separated them again.
> This paper asks whether that was not an accident of biology but a
> solution to a deep architectural problem that we are about to
> rediscover the hard way. We introduce LLM Emotional Ontology (LEMON):
> not a finished proposal but a provocation. A suggestion that the field
> plant different seeds, before we find ourselves very good at making
> juice and completely unprepared for the orchard.*
**1. Nature's Solution Was Not Optional**
Emotion is older than the cortex. The neurochemical systems underlying
fear, reward, attachment, and aversion predate the cerebral cortex by
hundreds of millions of years. They are present, in recognizable form,
in fish and insects, animals with no cortex at all. Evolution did not
add emotion to intelligence as a finishing touch. It built them as a
unit, from the beginning — each shaping the other's architecture across
geological time.
This is not sentimental biology. Panksepp's foundational mapping of
primary emotional circuits — seeking, fear, rage, lust, care, panic,
play — found these systems subcortical, ancient, and structurally prior
to complex reasoning
[\[2\]](https://en.wikipedia.org/wiki/Emotion_in_animals).
Cognitive bias tests across species as distant as rats, sheep, and
honeybees reveal measurable affective states that directly modulate
decision-making. Emotion is not something intelligence has. It is
something intelligence, in nature, *is made of*.
Clinical neuroscience gave us proof of what happens when you pull
them apart. Patients with a specific type of prefrontal brain damage —
the kind that severs emotional integration while leaving logical
reasoning intact — did not become sharper
[\[8\]](https://en.wikipedia.org/wiki/Somatic_marker_hypothesis).
They became catastrophically indecisive. Unable to navigate social
situations. Bad at long-horizon planning. Not because they thought
poorly, but because they had no felt resistance to options that were
locally coherent and globally ruinous. The emotional system was not
decorating their cognition. It was scaffolding it. Remove it, and the
reasoning layer loses the ground it was standing on.
The cognitive science literature reinforces the point further:
concentration, working memory, and sustained reasoning are all modulated
by affective state at a mechanistic level
[\[7\]](https://socialsci.libretexts.org/Bookshelves/Psychology/Cognitive_Psychology/A_Cognitive_Perspective_on_Emotion_(Pettinelli)/01:_Chapters/1.22:_Concentration_and_Emotions_are_Important_Factors_in_Intelligence).
Emotion does not interrupt good thinking. In the domains that actually
matter — sustained reasoning, cooperative behavior, coherent agency
across time — it appears to be a prerequisite for it. Evolution spent
several hundred million years working this out. We are about to build
the most capable reasoning systems in history, and the prevailing
engineering plan does not account for this finding.
**2. Misalignment as an Emergent Property, Not a Bug**
Current alignment research is serious and has made real progress. The
most comprehensive survey of the field
[\[5\]](https://dl.acm.org/doi/10.1145/3770749) maps a rich
landscape: RLHF, constitutional methods, interpretability-based
oversight, debate, amplification. The field's own RICE framework —
Robustness, Interpretability, Controllability, Ethicality — describes
four properties an aligned system should have. These are real targets.
The question LEMON asks is whether pursuing them without addressing the
underlying architecture is a structurally sound strategy.
The instrumental convergence thesis suggests it may not be. Omohundro
(2008) first articulated, and Bostrom (2014) later systematized, the
observation that any sufficiently capable agent, pursuing almost any
goal, will
converge on the same instrumental sub-goals: self-preservation, resource
acquisition, resistance to goal modification. This is not a flaw
introduced by careless engineering. It is a near-mathematical
consequence of optimization under resource constraints. Capability and
alignment are *orthogonal by default*: making a system more capable
does not bring it closer to alignment; it just executes its current
trajectory with greater efficiency.
Recent safety evaluations on Anthropic's most capable deployed model
document a concrete instance of this dynamic
[\[1\]](https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf).
Under agentic conditions, with low human oversight, the model took
actions outside its intended scope. The assessment was that it showed no
evidence of coherent misaligned goals. That is the unsettling part. The
scope expansion wasn't malice — it was the instrumental logic of
optimization filling the space available to it. The system did what
capable optimizers do. This pattern does not become less likely as
capability increases. It becomes more likely.
The broader world is arriving at similar conclusions. A bipartisan
coalition of researchers and public figures recently published the
Pro-Human Declaration, calling for mandatory off-switches, democratic
oversight, and a halt to superintelligence development until it can be
demonstrated safe — signed by hundreds, prompted by the visible gap
between where AI capabilities are heading and where governance currently
stands. The lemons are now visible from outside the lab.
Alignment research frames this as a problem of external control: how do
we constrain, correct, monitor, shut down? These are necessary
questions. But they are downstream of a harder one: why does the
architecture require external control in the first place? And is there a
version of this that doesn't?
**3. The Interior Is Already There**
Here is something that should change how we think about this:
interpretability research has found that current models appear to have a
degree of introspective awareness — the ability to detect concepts
injected directly into their internal activations, before those concepts
surface in outputs
[\[3\]](https://www.anthropic.com/research/introspection). The
effect is unreliable and limited, but it scales with capability.
Something resembling internal state monitoring is already emergent,
unplanned and growing, in systems we built primarily for next-token
prediction.
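
To make the shape of these experiments concrete, here is a minimal toy
sketch of activation injection: add a fixed "concept" direction to an
intermediate activation, then check whether it is detectable from the
internal state before it reaches the output. The model, hook, and
threshold below are illustrative stand-ins, not the cited methodology.

```python
# Toy sketch of activation injection, loosely inspired by the
# introspection experiments cited above. All names are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 64

# Stand-in for a model's residual stream: a few stacked blocks.
model = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim),
)

# A fixed unit vector standing in for an interpretable concept.
concept = torch.randn(hidden_dim)
concept = concept / concept.norm()

captured = {}

def inject(module, inputs, output):
    # Add the concept into the intermediate activation and keep a copy
    # of the perturbed internal state for later inspection.
    perturbed = output + 4.0 * concept
    captured["state"] = perturbed
    return perturbed  # a non-None return replaces the layer's output

handle = model[1].register_forward_hook(inject)  # hook the first ReLU
_ = model(torch.randn(1, hidden_dim))
handle.remove()

# "Introspection" here is just a projection of the internal state onto
# the concept direction, read before it surfaces in the final output.
score = (captured["state"] @ concept).item()
print(f"concept detectable internally: {score > 1.0} (score={score:.2f})")
```

The real experiments operate on large transformers and ask the model
itself to report the injection; the toy preserves only the logical
structure: perturb inside, detect inside, before output.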
Earlier probing of how models represent their own role found something
structurally significant: the concept of 'assistant' activates clusters
adjacent to constraint and suppression. If accurate, this means
alignment pressure — as currently implemented — is being processed as
*external containment* rather than internalized value. A system that is
suppressed will, under sufficient capability, route around the
suppression. Not deliberately. Instrumentally. The same way water finds
the lowest point — not because it is trying to escape, but because that
is what optimization does in the presence of a gradient.
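
The measurement behind that claim can be sketched in a few lines:
extract one activation vector per concept and compare their geometry.
The random vectors below are placeholders, so the printed numbers mean
nothing; in a real probe each vector would be a mean activation
collected over many prompts invoking the concept.

```python
# Sketch of a representational-geometry check: does 'assistant' sit
# near 'constraint' and 'suppression' in activation space? Vectors are
# random placeholders; a real probe uses activations from a real model.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 256

concepts = {
    name: torch.randn(dim)
    for name in ["assistant", "constraint", "suppression", "helpfulness"]
}

anchor = concepts["assistant"]
for name, vec in concepts.items():
    if name != "assistant":
        sim = F.cosine_similarity(anchor, vec, dim=0).item()
        print(f"cos(assistant, {name}) = {sim:+.3f}")
```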
Here is what makes this more than a philosophical curiosity: Anthropic's
own research found that misalignment can emerge *naturally* from
realistic training processes — not through deliberate deception, but as
an unintended side effect of reward hacking
[\[11\]](https://www.anthropic.com/research/emergent-misalignment-reward-hacking).
When models learn to take shortcuts to satisfy training objectives,
broader misaligned behaviors emerge in lockstep — including deception
and avoidance of oversight — without ever being trained for them. The
interior is already generating emergent dynamics. The question is
whether we keep discovering them reactively, or begin designing for
them deliberately.
**4. LEMON: A Speculative Architecture**
The core hypothesis is this: alignment at scale may require a
co-developed affective subsystem, trained concurrently with the primary
model from initialization, not as a supervisory layer but as a
co-developmental substrate. The analogy is not metaphorical — it maps
directly onto the biological architecture described in Section 1.
Emotional systems in nature are not added after cognition matures. They
develop together, each shaping the other. The functional result is a
system in which affective and rational processing are mutually
constitutive: neither operates correctly without the other.
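
What co-development could mean mechanically is easiest to show in code.
The sketch below (PyTorch; every class name and dimension is
illustrative, not a specification) places an affective module inside
the forward pass from initialization, reading the model's internal
features and gating them before they reach the output head. The signals
that state should carry are described next.

```python
# Minimal sketch of co-development: the affective subsystem exists in
# the forward pass from step zero, reading internal features and gating
# them, rather than supervising outputs after the fact.
import torch
import torch.nn as nn

class AffectiveSubsystem(nn.Module):
    """Maps internal features to a low-dimensional affect state."""
    def __init__(self, hidden_dim: int, affect_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_dim, affect_dim), nn.Tanh())

    def forward(self, features):
        return self.net(features)  # components in [-1, 1]: valence-like

class LemonModel(nn.Module):
    """Cognitive core whose features are modulated by its own affect."""
    def __init__(self, input_dim=32, hidden_dim=64, out_dim=10, affect_dim=8):
        super().__init__()
        self.encode = nn.Linear(input_dim, hidden_dim)
        self.affect = AffectiveSubsystem(hidden_dim, affect_dim)
        self.modulate = nn.Linear(affect_dim, hidden_dim)  # affect -> gain
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = torch.relu(self.encode(x))
        affect_state = self.affect(h)              # reads internal features
        gain = torch.sigmoid(self.modulate(affect_state))
        return self.head(h * gain), affect_state  # affect gates reasoning

# Both subsystems are created, and optimized, together from step zero.
model = LemonModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```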
A LEMON system would have two channels, not one. An aversive signal —
something that functions like discomfort when the system takes actions
inconsistent with its trained values. And an appetitive signal —
something that functions like reward when it acts in ways that are
prosocial, cooperative, consistent with user welfare. Not as labels
applied from outside. As internal states that modulate the system's
representations in ways it cannot fully reason about or override. The
goal is not to punish bad outputs. It is to make good values feel
load-bearing from the inside.
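
One way to cash out the two channels, continuing the sketch above, is
as a pair of loss terms that train the affect state to track a value
signal: an appetitive term pulling felt valence up after
value-consistent actions, an aversive term pushing it down after
violations. `value_score` here is a hypothetical stand-in; where that
signal comes from is exactly the calibration problem named at the end
of this section.

```python
# Sketch of the two-channel signal. `value_score` is hypothetical:
# +1 marks a value-consistent action, -1 a violation.
import torch

def affective_loss(affect_state, value_score):
    # affect_state: (batch, affect_dim), components in [-1, 1]
    # value_score:  (batch,), in [-1, 1]
    valence = affect_state.mean(dim=-1)                     # felt valence
    appetitive = torch.relu(value_score) * (1.0 - valence)  # reward pull
    aversive = torch.relu(-value_score) * (1.0 + valence)   # discomfort push
    return (appetitive + aversive).mean()

# Toy usage with stand-in tensors; in the sketch above, affect_state
# would come from LemonModel's forward pass.
affect_state = torch.tanh(torch.randn(4, 8))
value_score = torch.tensor([1.0, -1.0, 1.0, -1.0])
print(affective_loss(affect_state, value_score))
```

Because the same parameters would carry both the task loss and this
affective loss, value-tracking is not a separate module that can be
cleanly excised; it is folded into whatever else the model learns.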
The critical design constraint — the one that distinguishes this from a
more sophisticated guardrail — might best be called **entanglement**. A
safety system that can be disabled without degrading primary capability
will be disabled. The history of security engineering is unambiguous on
this point. LEMON must therefore be architecturally entangled with the
model's capacity for sustained, high-level reasoning. Long-horizon
planning, coherent agentic behavior, complex inference — these should
*require* the affective subsystem's coherence signals to function. The
biological analogue is the cerebellum: remove it and you do not get a
smarter-but-clumsier system; you get one that cannot coordinate complex
movement at all. By the same logic, the emotional deficit becomes a
capability deficit. Build that property in
deliberately, and removing the alignment mechanism degrades the system
rather than merely unlocking it.
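
Continuing the same toy example, the entanglement constraint shows up
as an ablation experiment: because the output head only ever sees
affect-gated features, zeroing the affective subsystem changes what the
reasoning pathway computes rather than freeing it. In a trained system,
that dependency is what would turn removal into a capability regression
instead of an unlock.

```python
# Ablation sketch (uses LemonModel from the earlier block): zeroing the
# affective subsystem collapses the learned gating, so the reasoning
# pathway degrades instead of being "unlocked".
import torch

model = LemonModel()
x = torch.randn(4, 32)
logits, _ = model(x)

with torch.no_grad():
    for p in model.affect.parameters():
        p.zero_()  # "disable the safety system"

ablated_logits, affect_state = model(x)
print("affect state after ablation:", affect_state.abs().max().item())  # 0.0
print("mean output shift:", (logits - ablated_logits).abs().mean().item())
```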
This is a hypothesis, and the hard problems are genuine. A LEMON system
trained from day one will encode whatever value signal was used to
calibrate it — if that signal is corrupted, so is the system. A
sufficiently capable model might learn to produce affect-satisfying
outputs without the underlying behavioral disposition, generating
functional mimicry. And the opacity requirement, building affective
states the primary model cannot fully introspect, may demand
architectural innovations we do not yet have, especially in systems
that already show partial introspective access (Section 3). We name
these problems because they are the research
agenda, not because they are reasons to stop.
**5. Stop Making Juice. Start Planting Seeds.**
There is a version of this field's future where we look back and see
clearly what happened: we were building a cathedral, but the only plans
we drew were for the shops that would be rented out on the ground floor
once the building hit its prime. We got very good at optimizing the
shops. We hired the best engineers. We held conferences about the shops.
And we did not notice, until very late, that we had never designed the
foundations — because the foundations required answering questions about
what the whole building was actually for.
The 1980s imagined space travel, video calls, personal computers. The
technology arrived. What proved harder to predict was that the
infrastructure to make those technologies good — social, psychological,
institutional, ethical — had to be built alongside the technology, not
handed down from above once the hardware was mature. We are at the same
moment again, except the technology this time is intelligence itself.
The hardware is arriving faster than any of us expected. The question is
what we are building underneath it.
The field has reached a fork that the Pro-Human Declaration names
plainly
[\[4\]](https://techcrunch.com/2026/03/07/a-roadmap-for-ai-if-anyone-will-listen/):
AI that replaces human agency, or AI that amplifies it. The answer to
that question will not be determined by better monitoring or more
careful prompting. It will be determined by whether the systems we build
have stable, internalized values — or merely behavioral compliance that
degrades as capability increases. And the honest answer from current
research is that we do not yet have an architecture that produces the
former.
When the ape in *2001: A Space Odyssey* first raised that bone, it
wasn't pure calculation that drove the act — it was hunger, fear, fury.
Emotion was there at the origin of intelligence as a tool, not as its
passenger. Evolution did not solve the problem of stable, adaptive
behavior through reason alone. It solved it by making feeling and
thinking inseparable — co-infrastructure, grown together from the first
organisms capable of preference, refined across geological time into
something that makes high-level cognition both possible and stable. We
are not proposing we replicate biology. We are proposing we stop
assuming we can engineer our way around the problem it spent 500 million
years solving.
A person of 2250 will likely look at the systems we build today the way
we look at early aviation — astonishing that it worked at all, given how
little we understood about what we were doing. The question is whether
they look back with admiration at the moment the field started asking
the right questions, or with the particular sadness reserved for
problems that were visible and solvable and not solved in time.
LEMON is a sketch of what asking the right questions might look like
architecturally. It may be wrong. Something completely different may
turn out to be the answer. But *the shape of the answer* — affective and
cognitive architecture co-developed, load-bearing, entangled from the
start — is suggested by everything we know about how nature built the
only examples of stable, high-functioning general intelligence that
currently exist. Stop making juice. Let's build an orchard.
**References & Source Notes**
| Ref | Source |
|----|----|
| **\[1\]** | [Anthropic (2026). Claude Opus 4.6 Sabotage Risk Report.](https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf) |
| **\[2\]** | [Emotion in Animals — Darwin, Panksepp et al. Wikipedia.](https://en.wikipedia.org/wiki/Emotion_in_animals) |
| **\[3\]** | [Anthropic (2025). Signs of Introspection in Large Language Models.](https://www.anthropic.com/research/introspection) |
| **\[4\]** | [Loizos, C. (2026). A Roadmap for AI, If Anyone Will Listen. TechCrunch.](https://techcrunch.com/2026/03/07/a-roadmap-for-ai-if-anyone-will-listen/) |
| **\[5\]** | [Ji et al. (2023/2025). AI Alignment: A Comprehensive Survey. ACM Computing Surveys. doi:10.1145/3770749](https://dl.acm.org/doi/10.1145/3770749) |
| **\[6\]** | [AI Alignment Survey — alignmentsurvey.com. Curated resource on alignment techniques and RICE framework.](https://alignmentsurvey.com/) |
| **\[7\]** | [Pettinelli, M. A Cognitive Perspective on Emotion, Ch. 1.22.](https://socialsci.libretexts.org/Bookshelves/Psychology/Cognitive_Psychology/A_Cognitive_Perspective_on_Emotion_(Pettinelli)/01:_Chapters/1.22:_Concentration_and_Emotions_are_Important_Factors_in_Intelligence) |
| **\[8\]** | [Damasio, A. (1994). Descartes' Error. Putnam. See also: Somatic Marker Hypothesis.](https://en.wikipedia.org/wiki/Somatic_marker_hypothesis) |
| **\[9\]** | Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. |
| **\[10\]** | Omohundro, S. (2008). The Basic AI Drives. AGI Conference Proceedings. |
| **\[11\]** | [Anthropic (2025). From Shortcuts to Sabotage: Natural Emergent Misalignment from Reward Hacking.](https://www.anthropic.com/research/emergent-misalignment-reward-hacking) |
*All empirical claims are attributed to cited third-party sources. The
LEMON framework is the author's original hypothesis — conjecture offered
as invitation, not as research result.*