Ryan Mulligan

I build AI systems with explicit internal state machinery — cognitive architecture for language models that can be inspected, steered, and understood.

The Work

Most language models are black boxes. You can probe them, steer them from the outside, and reverse-engineer what they seem to be doing. For the past year I've been working on the complement to that: building explicit cognitive machinery into a model so the internal state isn't something you recover after the fact but something that exists, can be read, and drives behavior in real time.

The system is called Lilly. It runs on Qwen3-8B with an 8-dimensional emotional field grounded in Plutchik's primary emotions (joy, trust, fear, surprise, sadness, disgust, anger, anticipation) that drives activation steering during inference via hooks into the model's MLP layers. The emotional state isn't computed after generation — it shapes generation as it happens.
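The core mechanic can be sketched in a few lines. This is a toy stand-in, not the production hook code: the dimensions, names, and the identity basis are illustrative, and a real hook would run inside the transformer's forward pass rather than on a plain list.

```python
# Hypothetical sketch: an 8-D Plutchik affect state is mapped to a
# steering vector and added to hidden activations mid-forward, so the
# emotional state shapes generation rather than being read off afterward.

PLUTCHIK = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]

def steering_vector(affect, basis):
    """Blend per-emotion direction vectors, weighted by current intensity."""
    dim = len(next(iter(basis.values())))
    vec = [0.0] * dim
    for emotion, weight in affect.items():
        for i, component in enumerate(basis[emotion]):
            vec[i] += weight * component
    return vec

def mlp_forward_with_hook(hidden, steer, alpha=0.5):
    """Toy stand-in for an MLP forward hook: add the scaled steering
    vector to the hidden state before it flows onward."""
    return [h + alpha * s for h, s in zip(hidden, steer)]

# Illustrative state: strong joy, mild anticipation, everything else quiet.
affect = {e: 0.0 for e in PLUTCHIK}
affect["joy"], affect["anticipation"] = 0.8, 0.4
# Toy basis: one orthogonal direction per primary emotion.
basis = {e: [1.0 if i == j else 0.0 for i in range(8)]
         for j, e in enumerate(PLUTCHIK)}
steer = steering_vector(affect, basis)
steered = mlp_forward_with_hook([0.1] * 8, steer)
```

In the real system the directions come from SAE features rather than an identity basis, but the shape of the intervention is the same: compute a vector from the affect state, scale it, add it to the residual activations during inference.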

The open question at the center of this work: when a model's behavior is being shaped by its own past emotional responses to similar situations, does it have any introspective access to that fact? Building from the inside creates a different kind of test for that question than probing from the outside.

The steering substrate is a trained sparse autoencoder with 163,840 features extracted from Qwen3-8B's MLP layers via TransformerLens. An evolutionary system called Evalatis manages a population of steering vectors that crystallize from the model's activation history and compete through a selection process weighted by affinity and recency. A component called the AffectiveResonator queries episodic memory for similar past situations, computes a weighted valence from them, and injects a blended steering vector at inference time.

The model is, in a concrete sense, steered by its own affective history. Seventeen years of engineering and a lot of reading got me here.

Architecture

Data flow: Current context → Episodic memory → AffectiveResonator → 8D affect state → Evalatis → Layer injection
affective_system.py

8D Plutchik Emotional Field

Primary emotions as orthogonal dimensions. Secondary emotions emerge from adjacent primary pairs exceeding a co-activation threshold. Three innate valence sources: coherence, epistemic, relational.
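A minimal sketch of the dyad mechanism, using the standard Plutchik primary dyads for adjacent wheel pairs. The threshold value and function names here are illustrative, not taken from the shipped affective_system.py.

```python
# Illustrative sketch: secondary (dyad) emotions fire when two adjacent
# Plutchik primaries both exceed a co-activation threshold.

PLUTCHIK_WHEEL = ["joy", "trust", "fear", "surprise",
                  "sadness", "disgust", "anger", "anticipation"]

# Standard Plutchik primary dyads for adjacent pairs on the wheel.
DYADS = {("joy", "trust"): "love",
         ("trust", "fear"): "submission",
         ("fear", "surprise"): "awe",
         ("surprise", "sadness"): "disappointment",
         ("sadness", "disgust"): "remorse",
         ("disgust", "anger"): "contempt",
         ("anger", "anticipation"): "aggressiveness",
         ("anticipation", "joy"): "optimism"}

def secondary_emotions(state, threshold=0.5):
    """Return dyads whose two adjacent primaries both exceed threshold."""
    active = []
    for i, a in enumerate(PLUTCHIK_WHEEL):
        b = PLUTCHIK_WHEEL[(i + 1) % len(PLUTCHIK_WHEEL)]
        if state.get(a, 0.0) > threshold and state.get(b, 0.0) > threshold:
            active.append(DYADS[(a, b)])
    return active

# Joy and trust co-active -> "love"; anticipation and joy -> "optimism".
state = {"joy": 0.9, "trust": 0.7, "anticipation": 0.6}
```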

affective_resonator.py

AffectiveResonator

Queries episodic memory for k-nearest past situations. Computes weighted valence from retrieved affective states. Injects blended steering vector at transformer layers 14-18.
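The retrieval-and-blend step can be sketched as below. This is a simplified model of the idea, not the shipped affective_resonator.py: it uses cosine similarity over toy 2-D embeddings, and the actual injection into layers 14-18 is omitted.

```python
# Hedged sketch of the AffectiveResonator idea: retrieve the k most
# similar past episodes, weight their stored valences by similarity,
# and blend their steering vectors with the same weights.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def resonate(query, episodes, k=2):
    """episodes: list of (embedding, valence, steering_vector) tuples."""
    scored = sorted(episodes, key=lambda e: cosine(query, e[0]),
                    reverse=True)[:k]
    weights = [max(cosine(query, e[0]), 0.0) for e in scored]
    total = sum(weights) or 1.0
    valence = sum(w * e[1] for w, e in zip(weights, scored)) / total
    dim = len(scored[0][2])
    blended = [sum(w * e[2][i] for w, e in zip(weights, scored)) / total
               for i in range(dim)]
    return valence, blended

# Two toy episodes; the query matches the first one exactly.
episodes = [([1.0, 0.0], 0.8, [1.0, 0.0]),
            ([0.0, 1.0], -0.4, [0.0, 1.0])]
valence, blended = resonate([1.0, 0.0], episodes, k=1)
```

The design point is that the blend is similarity-weighted: an episode that closely matches the current context dominates both the recovered valence and the injected direction.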

evalatis.py

Evalatis

Evolutionary activation steering. Vectors crystallize from activation history, spawn children via blend-with-mutation, and compete on affinity × staleness. Phase-modulated EMA rates across a 6-stage cognitive cycle.
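One generation of that loop might look like the following. The scoring rule, mutation scheme, and culling policy here are assumptions for illustration, not the shipped evalatis.py.

```python
# Loose sketch of an Evalatis-style generation: score each steering
# vector on affinity discounted by staleness, spawn a child from the two
# best via blend-with-mutation, and cull the weakest to hold size fixed.

import random

def score(meta, now, decay=0.1):
    """Affinity discounted by staleness (steps since last use)."""
    return meta["affinity"] / (1.0 + decay * (now - meta["last_used"]))

def spawn_child(a, b, rng, mutation=0.05):
    """Blend two parent vectors elementwise, then add mutation noise."""
    return [(x + y) / 2.0 + rng.uniform(-mutation, mutation)
            for x, y in zip(a["vector"], b["vector"])]

def evolve(population, now, rng):
    ranked = sorted(population, key=lambda m: score(m, now), reverse=True)
    child = {"vector": spawn_child(ranked[0], ranked[1], rng),
             "affinity": (ranked[0]["affinity"]
                          + ranked[1]["affinity"]) / 2.0,
             "last_used": now}
    return ranked[:-1] + [child]  # cull the weakest, keep size constant

pop = [{"vector": [1.0, 0.0], "affinity": 0.9, "last_used": 10},
       {"vector": [0.0, 1.0], "affinity": 0.6, "last_used": 10},
       {"vector": [0.5, 0.5], "affinity": 0.1, "last_used": 10}]
new_pop = evolve(pop, now=10, rng=random.Random(0))
```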

SAELens / TransformerLens

SAE Substrate

Sparse-autoencoder transcoders with 163,840 features, trained on Qwen3-8B's MLP layers. They provide an interpretable, sparse basis for every steering vector in the Evalatis population.
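Why an SAE makes a good steering basis can be shown in miniature. Here a steering vector is just a sparse set of feature coefficients decoded through the SAE's dictionary of feature directions; the dimensions are toy-sized, where the real substrate has 163,840 features.

```python
# Minimal sketch: decoding a sparse feature code into a dense steering
# vector via the SAE's dictionary. Each nonzero coefficient names an
# interpretable feature, so the resulting vector can be read as well as
# applied.

def sae_decode(sparse_coeffs, dictionary):
    """sparse_coeffs: {feature_id: weight}; dictionary: id -> direction."""
    dim = len(next(iter(dictionary.values())))
    dense = [0.0] * dim
    for fid, weight in sparse_coeffs.items():
        for i, component in enumerate(dictionary[fid]):
            dense[i] += weight * component
    return dense

# Toy 3-feature dictionary; a real one comes from the trained SAE decoder.
dictionary = {0: [1.0, 0.0, 0.0],
              1: [0.0, 1.0, 0.0],
              2: [0.0, 0.0, 1.0]}
steer = sae_decode({0: 0.7, 2: -0.2}, dictionary)
```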

Research

Self-Steering Cognitive AI: Evalatis, Active Inference, and 8D Affect Steering in Live LLM Systems

Novel contributions: Evalatis evolutionary steering, 8D Plutchik emotional field with wave-packet interference, Active Inference integration across a 6-phase cognitive cycle, SAE features as live perceptual substrate.

● Preprint in preparation

lilly-steering

Core affect steering and evolutionary activation architecture: affective_system.py, evalatis.py, affective_resonator.py — extracted from the live Lilly system with full documentation.

✓ Available on GitHub