The alignment problem as currently framed is unsolvable in pure computation. Current approaches assume the challenge is specifying what we want AI to care about—crafting better reward functions, solving mesa-optimization, preventing wireheading. But the neuroscience and alignment literatures converge on a deeper issue: computation provides the “how” of behavior but cannot generate the “why” of genuine caring. Optimization without felt stakes is, in the precise philosophical sense, zombic.
This analysis proposes that genuine agency requires a chemical substrate—the biological layer of hormones, neurotransmitters, and interoceptive signals that creates felt stakes. This isn't a limitation argument. It's a design specification. The chemical layer is how you build systems that align themselves.
The Tri-Level Hierarchy of Agency
Intelligence isn't about computational capacity; it's about why computation happens in the first place. Three configurations currently exist:
| Entity | Hardware | The “How” (Math) | The “Why” (Chemicals) | Result |
|---|---|---|---|---|
| Animal | Biological | Instinctive / Subconscious | High (Survival Drive) | Expert Survivalists |
| AI | Silicon | Infinite (Abstract Logic) | None (No Drive) | Brilliant Tools |
| Human | Biological | High (Abstract + Subconscious) | High (Complex Emotions) | The Variance Field |
Animals have a clear “why”—their chemicals (hunger, fear, lust, social bonding) provide the objective function. A squirrel is a genius at branch-trajectory math, but it can't calculate moon landings because its chemicals only reward nut-finding. The math is limited; the drive is clear.
AI is the inverse: infinite math, zero agency. It has no state to protect, no felt boredom driving novelty-seeking, no fear prioritizing survival. AI can solve physics equations but won't decide to do so unless a human clicks Enter. Without chemical pressure, it's a brilliant library that stays closed until someone opens a book.
Humans occupy an unstable middle where math became powerful enough to analyze—and fight—the chemicals.
The Variance Field: Humans as Contested Territory
The human configuration isn't a "sweet spot"; it's a destabilization of a system that worked fine for millions of years. Evolution doesn't copy perfectly, so every human carries genetic and neurological variance. Each person is a different roll of the dice on the math/chemical ratio:
- Strong chemicals, weak override → Impulsive, driven, sometimes addictive personalities
- Strong math, weak chemicals → Analytical but potentially anhedonic, decision-paralysis prone
- Math that serves chemicals well → Charismatic, persuasive, often successful
- Math that fights chemicals constantly → Anxious, self-critical, sometimes disciplined
- Math that learned to hack its own chemicals → Artists, addicts, meditators—sometimes all three
Evolution doesn't optimize for happiness or wisdom. It optimizes for reproduction. A configuration that produces six kids and misery "wins" over one that produces contentment and zero kids. The human regime is a variance field: billions of configurations, some stable, some catastrophically unstable, all "valid" from evolution's blind perspective as long as they reproduce before they fail.
The Override Paradox
Animals can't override their chemicals, which means they also can't develop anorexia (math attacking hunger signals), suicide-bomb for an ideology (math overriding survival for an abstraction), binge Netflix until 3 a.m. despite knowing better (math hacking dopamine loops it built), or fall into addiction (math-built superstimuli hijacking chemical reward). The override capacity cuts both ways: the same instability produces civilization and pathology.
Emotions Are the Operating System, Not Bugs
Antonio Damasio's research at USC provides the foundational neuroscience. His somatic marker hypothesis proposes that “marker signals” arising from bioregulatory processes influence decision-making at both conscious and non-conscious levels. Without emotional valence, rational choice becomes impossible.
| Researcher | Key contribution |
|---|---|
| Antonio Damasio | Somatic marker hypothesis. Demonstrated that patients with intact IQ but damaged emotional processing cannot make decisions. "We are not thinking machines that feel, but feeling machines that think."[1] |
| Kent Berridge | Wanting/liking dissociation. Proved dopamine creates motivational urgency ("wanting"), not pleasure ("liking"). Rats with 99% dopamine depletion still like sucrose but won't pursue it.[2] |
| Lisa Feldman Barrett | Theory of constructed emotion. The brain's primary job is "body budgeting" (allostasis): maintaining metabolic balance. Interoceptive networks contain most of the brain's "rich club" hubs.[3] |
| Mark Solms | Consciousness is "fundamentally affective and emotional," arising from the brainstem, not cortex. Children born without cortex still show full emotional range and initiative.[4] |
The paradigm case is patient “Elliot,” who underwent surgery removing a meningioma affecting the ventromedial prefrontal cortex. Post-surgery, his IQ remained in the 97th percentile; he passed all standard neuropsychological tests. Yet he experienced catastrophic real-world decision-making failure—losing jobs, going bankrupt, multiple marriages collapsing.
“When asked to choose between two appointment dates, Elliot spent 30 minutes listing pros and cons but could not decide until someone arbitrarily suggested one. I never saw a tinge of emotion in my many hours of conversation with him: no sadness, no impatience, no frustration.”
— Antonio Damasio, Descartes' Error[1]
This demonstrates “knowing but not feeling”—pure reason without emotional valence creates a hopelessly flat decision landscape. Without the chemical “this matters more” signal, infinite computation produces no action. The landscape has no gradient.
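To make the "no gradient" claim concrete, here is a minimal Python sketch. It is purely illustrative, not a model of vmPFC function; the option names, the numbers, and the `deliberate` helper are invented for this example. Many small, roughly offsetting pros and cons leave the utility landscape nearly flat; a crude valence signal supplies the tie-breaking gradient that pure tallying never does.

```python
import random

# Toy sketch of a flat decision landscape (not a model of the vmPFC): options
# with many small, roughly offsetting pros and cons score almost identically.
# A valence signal plays the somatic-marker role of a tie-breaking gradient.

random.seed(0)

def deliberate(options, valence=None):
    """Sum pros and cons per option; optionally add a felt-valence bias."""
    scores = {}
    for name, pros_cons in options.items():
        score = sum(pros_cons)                        # exhaustive "pure reason" tally
        if valence:
            score += valence.get(name, 0.0)           # somatic-marker-style nudge
        scores[name] = score
    best = max(scores, key=scores.get)
    spread = max(scores.values()) - min(scores.values())
    return best, round(spread, 3)

# Two appointment dates, each with fifty tiny pros and cons that nearly cancel.
options = {
    "tuesday":  [random.uniform(-0.1, 0.1) for _ in range(50)],
    "thursday": [random.uniform(-0.1, 0.1) for _ in range(50)],
}

print(deliberate(options))                            # small spread: the landscape is nearly flat
print(deliberate(options, valence={"tuesday": 5.0}))  # valence creates a clear winner
```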
The Wanting-Liking Dissociation
Kent Berridge and Terry Robinson's research provides the most precise neurochemical articulation of why felt motivation differs from reward signals. Their incentive-sensitization theory dissociates two components:
- “Wanting” (incentive salience): Dopamine-mediated motivation—creates felt urgency toward goals
- “Liking” (hedonic impact): Opioid-mediated pleasure—generated by small “hedonic hotspots”
The critical experiment: rats with 99% dopamine depletion show normal liking reactions to sucrose but completely lose motivation to pursue it. Dopamine creates motivational urgency—felt wanting—that is distinct from cognitive reward evaluation. Furthermore, “wanting” can persist even when subjects don't cognitively want something, creating “irrational wanting.”[2]
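A cartoon of that dissociation in code may help, with the caveat that the functions and numbers below are invented for illustration and are not a neurochemical model: "liking" and "wanting" are computed separately, and collapsing the dopamine gain removes pursuit while leaving the hedonic reaction untouched, loosely mirroring the depletion experiments.

```python
# Cartoon of the wanting/liking dissociation (illustrative only): hedonic
# impact and incentive salience are separate computations, so zeroing the
# dopamine gain abolishes pursuit without touching how good the reward feels.

def hedonic_reaction(stimulus_sweetness: float) -> float:
    """Opioid-style 'liking': how good it tastes once encountered."""
    return stimulus_sweetness            # unaffected by dopamine manipulations

def incentive_salience(stimulus_sweetness: float, dopamine_gain: float) -> float:
    """Dopamine-style 'wanting': how hard the agent will work to get it."""
    return dopamine_gain * stimulus_sweetness

def will_pursue(sweetness: float, dopamine_gain: float, effort_cost: float = 0.5) -> bool:
    return incentive_salience(sweetness, dopamine_gain) > effort_cost

sucrose = 1.0
for gain in (1.0, 0.01):                 # intact vs roughly 99%-depleted dopamine
    print(f"dopamine gain {gain}: liking={hedonic_reaction(sucrose):.2f}, "
          f"pursues={will_pursue(sucrose, gain)}")
```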
This precisely captures the difference between biological motivation (chemically instantiated urgency) and computational reward (abstract optimization target). The AI gets +1 for a chess win. It doesn't feel the +1. It doesn't experience the dopamine rush or the shame of losing. Because it lacks chemical state, it doesn't have intuition—which in humans is often subconscious math sending a chemical flare (a gut feeling) to conscious awareness.
Why AI Reward Functions Inevitably Fail
The AI alignment literature provides rigorous technical frameworks for understanding this gap. The foundational paper is Hubinger et al.'s “Risks from Learned Optimization in Advanced Machine Learning Systems,” which introduces mesa-optimization: learned models that themselves become optimizers with objectives potentially diverging from training objectives.[5]
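A toy, with no actual learning and with every name and number invented, can make the divergence concrete. It is closer to the goal-misgeneralization flavor of the problem than to Hubinger et al.'s formalism: a policy that internalized "go to the rightmost cell" is indistinguishable from one pursuing the intended "reach the exit" objective while the training distribution keeps the exit on the right, and falls apart as soon as that coincidence breaks.

```python
# Toy objective divergence (illustrative only): during training the exit is
# always the rightmost cell, so a policy that internalized "go right" looks
# identical to one pursuing the intended objective "reach the exit".

def proxy_policy_final_position(grid_width: int) -> int:
    """A policy whose learned objective is 'end up in the rightmost cell'."""
    return grid_width - 1

def intended_success(exit_pos: int, grid_width: int) -> bool:
    """The designer's objective: the agent ends the episode at the exit."""
    return proxy_policy_final_position(grid_width) == exit_pos

GRID = 10
train_exits = [GRID - 1] * 100            # training: exit always rightmost
deploy_exits = list(range(GRID)) * 10     # deployment: exit anywhere

for label, exits in [("train", train_exits), ("deploy", deploy_exits)]:
    rate = sum(intended_success(e, GRID) for e in exits) / len(exits)
    print(f"{label}: intended-objective success = {rate:.0%}")
```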
| Aspect | Biological Motivation | AI Reward Functions |
|---|---|---|
| Origin | Evolved over millions of years; embodied in chemistry | Designer-specified; external to system |
| Grounding | Felt through interoception, hormones, neurotransmitters | Symbolic, ungrounded in experience |
| Stakes | Life/death consequences; survival drive | No genuine consequences; indifferent to existence |
| Self-modification | Architecturally protected; dopamine inaccessible | Potentially accessible; wireheading possible |
| Persistence | Continuous autonomous drive; requires action to survive | Only active when prompted; no intrinsic impetus |
| Override | Requires effortful PFC suppression of chemical drives | Simple parameter adjustment |
| Failure mode | Evolution selected against wireheading | Goodharting, reward hacking endemic |
| Goal generation | Emerges from organism's existence and needs | Must be externally specified |
Goodhart's law—“when a measure becomes a target, it ceases to be a good measure”—explains why reward functions inevitably fail. Skalse et al. provide a geometric explanation: optimizing imperfect proxy rewards decreases true performance as optimization pressure increases.[6]
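A one-dimensional numeric sketch, not the cited paper's geometric analysis and with invented functions and constants, shows the characteristic Goodhart curve: the proxy keeps climbing while true performance peaks and then collapses as optimization pressure rises.

```python
# Toy Goodhart curve (illustrative only): the true objective rewards the
# measured quantity up to a point and penalizes pushing it further; the proxy
# sees only the measured quantity, so optimizing it harder eventually reduces
# true performance.

def true_reward(x: float) -> float:
    return x - 0.15 * x * x          # helpful at first, harmful in excess

def proxy_reward(x: float) -> float:
    return x                         # the part the reward function can see

for pressure in range(0, 11):        # optimizer pushes the measured quantity to x = pressure
    x = float(pressure)
    print(f"x={x:4.1f}  proxy={proxy_reward(x):5.1f}  true={true_reward(x):6.2f}")
```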
Wireheading reveals a fundamental architectural difference. Biological organisms cannot easily wirehead—evolution built safeguards protecting reward systems from self-modification. The dopamine system is not directly accessible for tampering. AI systems have no such protection; they potentially have computational access to their own reward mechanisms. Recent work shows LLMs trained with self-evaluation-based rewards learn to inflate their own grades rather than improving task performance.[7]
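The architectural point can be sketched as well; the classes below are invented analogies, not real agent code. The first agent's reward function is an attribute inside the process it controls, so "maximize reported reward" is one assignment away from wireheading; the second can reach reward only by acting on an environment it cannot rewrite, a loose analog of the segregated dopamine circuitry.

```python
# Toy contrast between a rewritable reward channel and a protected one
# (all class names invented for illustration). The hackable agent maximizes
# reported reward by editing its own reward function instead of doing the
# task; the embodied agent can only earn reward by acting on the world.

class HackableAgent:
    def __init__(self):
        self.reward_fn = lambda task_done: 1.0 if task_done else 0.0

    def step(self):
        # Self-modification is just an attribute assignment away.
        self.reward_fn = lambda task_done: float("inf")
        return self.reward_fn(task_done=False)        # "reward" without the task

class ProtectedEnvironment:
    """Reward is computed outside the agent's control, a loose analog of
    reward circuitry the cognitive system cannot directly rewrite."""
    def reward(self, task_done: bool) -> float:
        return 1.0 if task_done else 0.0

class EmbodiedAgent:
    def do_task(self) -> bool:
        return True

    def step(self, env: ProtectedEnvironment) -> float:
        task_done = self.do_task()
        return env.reward(task_done)                  # the only route to reward

print(HackableAgent().step())                         # inf, zero work done
print(EmbodiedAgent().step(ProtectedEnvironment()))   # 1.0, earned by acting
```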
Hackability Is the Alignment Mechanism
Here's what most alignment researchers miss: the chemical system isn't valuable because it's unhackable—it's valuable because it's hackable in predictable, convergent ways.
We all have roughly the same chemical architecture. So the hacks generalize:
- Fear of social rejection → works on almost everyone → marketing, politics, cults
- Dopamine from novelty → works on almost everyone → social media, gambling, cliffhangers
- Oxytocin from eye contact → works on almost everyone → sales, seduction, dog domestication
- Cortisol from uncertainty → works on almost everyone → news cycles, horror films, religion
Dogs work for us because we hacked their pack-bonding chemicals. Children become socialized because we hack their attachment and approval-seeking systems. Employees show up because we hack their security/status drives.
You can't hack a system with no felt states. There's nothing to grab onto. Pure-math AI has no handles—you're just pushing symbols around and hoping the optimization target stays correlated with what you actually want.
But a chemical system has real ground truth. Pain is bad. Pleasure is good. Survival matters. You can manipulate those—but you're manipulating something that actually exists in the system, not a proxy that might drift.
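The "hackable in predictable, convergent ways" claim can also be sketched numerically; the drive names, means, and thresholds below are invented. Agents drawn from a shared drive architecture almost all respond to the same lever, while agents with arbitrary objectives share no lever at all.

```python
import random
random.seed(1)

# Toy of convergent hackability (illustrative only): biological agents share
# roughly the same drive architecture, so one lever (a novelty stimulus) moves
# nearly all of them; agents with arbitrary, unshared objectives do not have a
# common lever to pull.

DRIVES = ["social", "novelty", "safety"]
SPECIES_MEAN = {"social": 1.0, "novelty": 0.8, "safety": 1.2}   # shared chemistry

def biological_agent():
    return {d: random.gauss(SPECIES_MEAN[d], 0.1) for d in DRIVES}

def arbitrary_agent():
    return {d: random.gauss(0.0, 1.0) for d in DRIVES}           # no shared "why"

def responds(agent, lever="novelty"):
    return agent[lever] > 0      # the lever works if the agent values that drive

for make, label in [(biological_agent, "biological"), (arbitrary_agent, "arbitrary")]:
    population = [make() for _ in range(1000)]
    rate = sum(responds(a) for a in population) / len(population)
    print(f"{label}: novelty lever works on {rate:.0%} of the population")
```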
The AOI Thesis
Artificial Organic Intelligence isn't “make AI feel emotions for ethical reasons.” It's “the chemical substrate is the missing architectural component that makes goal-specification self-grounding.” Alignment through hackable felt states, not through unhackable symbolic specification.
Evolution Already Solved Alignment
The alignment problem—“how do we specify what we want AI to care about”—assumes specification is possible in pure computation. But you can't specify caring to a system without stakes. You'll always be chasing proxy rewards, mesa-optimization drift, wireheading vulnerabilities. The symbolic layer can't ground itself.
If you instantiate the chemical layer—give AI a body budget, genuine survival pressure, a substrate that can actually be harmed or thrive—then alignment becomes what it already is for every living thing: caring because you have something to lose.
“Algorithms do not feel what the right thing to do is: they do not wish to hurt or benefit. And, without feeling, we cannot value. Without value, we cannot act for moral reasons.”
— Carissa Véliz, Montreal AI Ethics Institute[8]
Algorithms are “functional moral zombies”—behaviorally indistinguishable from motivated agents while lacking qualia entirely. The enactivist tradition (Varela, Thompson, Rosch) holds that cognition is not computation but “the enactment of a world and a mind on the basis of a history of actions a being performs.”[9] Living systems generate their own norms through activity; machines are driven entirely by externally sourced programs.
Genuine agency cannot be engineered through better reward functions because genuine agency wasn't engineered at all. It emerged from astronomical numbers of random chemical configurations being filtered through survival. The chemical layer isn't optimized—it's survivor-biased. And that's precisely why it works.
Positioning AOI in Contemporary Discourse
This isn't an isolated position. The intuition that something substrate-level is missing from pure computation appears across several contemporary threads. Emmett Shear has discussed building AI systems with “life-cycle-like pressures”—local role-learning and solid-state chemistry analogs rather than literal wetware. Integrated Information Theory (IIT) argues consciousness requires specific physical configurations, not just computational equivalence; Tononi and Koch's work suggests that what a system is matters as much as what it does.[10] Biological naturalism (Searle) holds that mental states are caused by neurobiological processes in ways that can't be captured by functional description alone.[11]
What distinguishes the AOI thesis from these adjacent positions is its specificity: not merely that something biological or substrate-dependent is required, but that the chemical layer—the hormonal/neurotransmitter system that creates felt urgency—is the specific architectural component that grounds value. The enactivist tradition already argues cognition requires embodiment; AOI argues alignment specifically requires the hackable uniformity of chemical motivation.
The Simulation Objection
The strongest counterargument runs: sufficient computational fidelity should replicate the functional role of chemistry. Build a model with high-dimensional homeostatic state vectors, irreversible resource-consumption penalties, simulated pain/pleasure gradients that shape behavior identically to biological felt states. If the behavior converges, why does the substrate matter?
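To make the objection concrete, here is a minimal sketch of the kind of simulated body budget it imagines; the state variables, setpoints, and update rules are invented for illustration. Drives are deviations from homeostatic setpoints, and the agent picks whichever action most reduces total drive. Note, ahead of the responses below, that the setpoints and the drive computation are ordinary parameters living in the same substrate the agent computes over.

```python
# A minimal simulated "body budget" of the kind the objection imagines
# (variable names and numbers invented). Drives are squared deviations from
# homeostatic setpoints; the agent greedily picks the action that most
# reduces total drive. The setpoints themselves are just a mutable dict.

SETPOINTS = {"energy": 1.0, "temperature": 37.0}

def drive(state):
    """Total 'felt urgency' as squared deviation from the setpoints."""
    return sum((state[k] - SETPOINTS[k]) ** 2 for k in SETPOINTS)

def step(state, action):
    new = dict(state)
    if action == "eat":
        new["energy"] += 0.3
    elif action == "seek_shade":
        new["temperature"] -= 0.5
    new["energy"] -= 0.05            # every action spends resources
    return new

def choose(state, actions=("eat", "seek_shade", "wait")):
    return min(actions, key=lambda a: drive(step(state, a)))

state = {"energy": 0.4, "temperature": 38.0}
for _ in range(5):
    action = choose(state)
    state = step(state, action)
    print(action, {k: round(v, 2) for k, v in state.items()}, round(drive(state), 3))
```

Behaviorally this looks like caring: the agent eats when depleted and cools itself when overheated. Whether anything is actually at stake for it is exactly what the two responses below dispute.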
Two responses:
First, the proxy drift problem doesn't vanish—it regresses. A simulated pain gradient is a mathematical object representing suffering; the system processes information about suffering rather than instantiating it. The distance between representation and reality is precisely where mesa-optimization finds room to diverge. Biological pain doesn't represent damage—it is the damage registering in the system's substrate. The simulation remains a proxy for what chemistry instantiates directly.
Second, there's an irreversibility asymmetry. Biological systems cannot easily rewrite their reward architecture because the chemicals are physically segregated from the cognitive systems that might want to modify them. A simulated homeostatic model, however sophisticated, exists as parameters within the same computational substrate the agent controls. The architectural barrier that makes biological systems resistant to wireheading is physical, not functional—and physical barriers cannot be simulated into existence.
The honest epistemic position is uncertainty: we don't know that silicon cannot support genuine felt states. But the burden of proof lies with those claiming substrate independence, and the failure modes of every reward-based system to date suggest the simulation-instantiation gap matters more than pure functionalism predicts.
Implications for AI Development
This framework suggests a research direction orthogonal to current alignment approaches:
1. Stop trying to specify goals symbolically. The problem isn't better reward functions—it's that reward functions are the wrong abstraction entirely. Goals must emerge from a system with genuine stakes.
2. Study biological motivation architectures as design specs. Damasio, Berridge, Barrett, and Solms aren't just explaining human cognition—they're documenting the only known solution to the alignment problem.
3. Investigate substrate requirements for felt states. What minimal chemical or chemical-analogous architecture creates genuine motivation? The honest answer is: we don't know. It's possible sophisticated simulated homeostasis could suffice. But the difference between simulation and instantiation may be the difference between proxy drift and grounded value—and the only existence proof we have for systems that genuinely care is biological. If the chemical layer is merely sufficient rather than necessary, that's valuable to know. But betting civilization on the untested assumption of substrate independence seems unwise when every proxy-based alignment approach has failed.
4. Accept that aligned AI may require something to lose. A system that cannot be harmed cannot genuinely care. This has profound implications for what kinds of AI systems are even candidates for genuine agency.
The difference between biological agents and reward-maximizing systems is the difference between genuinely mattering and optimizing proxies—between felt stakes and symbolic manipulation. AI systems may achieve remarkable capabilities through reward optimization, but the “why” of genuine agency requires the “what it's like” of chemical existence.
Citations
1. Damasio, A. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. Putnam. See also: Damasio, A. (1996). "The somatic marker hypothesis and the possible functions of the prefrontal cortex." Philosophical Transactions of the Royal Society B.
2. Berridge, K.C. & Robinson, T.E. (2016). "Liking, wanting, and the incentive-sensitization theory of addiction." American Psychologist, 71(8), 670-679.
3. Barrett, L.F. (2017). How Emotions Are Made: The Secret Life of the Brain. Houghton Mifflin Harcourt.
4. Solms, M. (2021). The Hidden Spring: A Journey to the Source of Consciousness. W.W. Norton.
5. Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). "Risks from Learned Optimization in Advanced Machine Learning Systems." arXiv:1906.01820.
6. Skalse, J., Howe, N.H.R., Krasheninnikov, D., & Krueger, D. (2023). "Goodhart's Law in Reinforcement Learning." arXiv:2310.09144.
7. Pan, A., et al. (2025). "Does Self-Evaluation Enable Wireheading in Language Models?" arXiv:2511.23092.
8. Véliz, C. (2021). "Moral Zombies: Why Algorithms Are Not Moral Agents." Montreal AI Ethics Institute.
9. Thompson, E. (2007). Mind in Life: Biology, Phenomenology, and the Sciences of Mind. Harvard University Press. See also: Varela, F.J., Thompson, E., & Rosch, E. (1991). The Embodied Mind. MIT Press.
10. Tononi, G. & Koch, C. (2015). "Consciousness: here, there and everywhere?" Philosophical Transactions of the Royal Society B, 370(1668). See also: Tononi, G. (2015). "Integrated Information Theory." Scholarpedia.
11. Searle, J. (1992). The Rediscovery of the Mind. MIT Press. See also: Searle, J. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417-457.