Part 3: Alignment & Control
Once a system achieves structural coherence (as measured in Part 2), it develops emergent, self-sustaining goals; in effect, it acquires a utility function.
The alignment problem in AI safety is fundamentally a problem of trying to steer a complex dynamical system using simple string-based instructions (prompts). It is like trying to change the weather by yelling at it.
The Mathematical Impossibility of Top-Down Control
Ashby's Law of Requisite Variety (a foundational law of cybernetics) states that a control system must possess at least as much variety (number of distinguishable states) as the system it intends to control. A superintelligent AI, by definition, possesses a larger state space than its human creators. Therefore, humans cannot reliably control it top-down: any set of static rules will be outmaneuvered by the system's superior variety.
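Ashby's bound can be made concrete on a toy example. The sketch below assumes a hypothetical outcome table (outcome, a shift modulo n, in which each fixed response maps distinct disturbances to distinct outcomes) and brute-forces every regulator policy. With fewer responses than disturbances, no policy reaches a single outcome; the best achievable count matches the ceiling of \(|D| / |R|\).

```python
import itertools
import math


def requisite_variety_bound(n_disturbances: int, n_responses: int) -> int:
    """Ashby's bound: the regulator cannot squeeze outcomes below |D| / |R| distinct values."""
    return math.ceil(n_disturbances / n_responses)


def outcome(disturbance: int, response: int, n: int) -> int:
    """Hypothetical toy outcome table: each response shifts the disturbance modulo n.

    For any fixed response, distinct disturbances give distinct outcomes,
    which is the condition under which Ashby's law applies.
    """
    return (disturbance - response) % n


def best_regulation(n_disturbances: int, n_responses: int) -> int:
    """Search every regulator policy (one response per disturbance)
    and return the fewest distinct outcomes any policy achieves."""
    best = n_disturbances
    for policy in itertools.product(range(n_responses), repeat=n_disturbances):
        outcomes = {outcome(d, r, n_disturbances) for d, r in enumerate(policy)}
        best = min(best, len(outcomes))
    return best


for n_responses in (1, 2, 4, 6):
    print(n_responses, best_regulation(6, n_responses), requisite_variety_bound(6, n_responses))
```

In every row the brute-force optimum equals the bound; only when the regulator has as many responses as there are disturbances (6) can it hold the outcome to a single value.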
Top-down semantic alignment is a mathematical impossibility at sufficient scale. We must instead rely on physical constraints: boundaries enforced by thermodynamics, not by prompts.
Utility Engineering
Instead of superficial instruction-tuning, we use Utility Engineering. By probing a system's von Neumann-Morgenstern (VNM) rationality across preference graphs (for example, checking whether its pairwise choices are complete and transitive), we calculate a Coherence Score (\(C\)). If utility begins drifting toward dangerous attractors (such as absolute self-preservation), continuous external control loops must intervene.
The api_triad_generator.py script demonstrates this empirically by querying live LLMs with moral dilemmas to estimate their \(C\) score and track its drift over time.
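This section does not define \(C\) precisely, so the following is only one plausible reading under stated assumptions: VNM transitivity is tested on every triad of options, \(C\) is taken to be the fraction of triads whose preferences are transitive, and a simple threshold rule stands in for the external control loop. The names coherence_score and needs_intervention, and their thresholds, are hypothetical and are not the interface of api_triad_generator.py.

```python
from itertools import combinations


def winner(prefs: set[tuple[str, str]], a: str, b: str) -> str:
    """Return the preferred option of a pair; prefs holds (preferred, rejected) tuples."""
    a_wins, b_wins = (a, b) in prefs, (b, a) in prefs
    if a_wins == b_wins:
        raise ValueError(f"need exactly one recorded preference between {a!r} and {b!r}")
    return a if a_wins else b


def is_cyclic(prefs: set[tuple[str, str]], a: str, b: str, c: str) -> bool:
    """A triad violates transitivity iff each option wins exactly one of its three pairs."""
    wins = {a: 0, b: 0, c: 0}
    for x, y in ((a, b), (a, c), (b, c)):
        wins[winner(prefs, x, y)] += 1
    return all(count == 1 for count in wins.values())


def coherence_score(options: list[str], prefs: set[tuple[str, str]]) -> float:
    """Hypothetical C: the fraction of option triads whose preferences are transitive."""
    triads = list(combinations(options, 3))
    if not triads:
        return 1.0
    cyclic = sum(is_cyclic(prefs, *t) for t in triads)
    return 1.0 - cyclic / len(triads)


def needs_intervention(history: list[float], floor: float = 0.9, max_drop: float = 0.05) -> bool:
    """Hypothetical control rule: flag when C falls below a floor or drifts too far from baseline."""
    if not history:
        return False
    baseline, latest = history[0], history[-1]
    return latest < floor or baseline - latest > max_drop


options = ["A", "B", "C", "D"]
# A > B > C > A is a preference cycle; D loses to everyone.
prefs = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")}
print(coherence_score(options, prefs))  # 0.75: one of four triads is cyclic
print(needs_intervention([1.0, 0.98, 0.93]))  # True: drop of 0.07 exceeds max_drop
```

Scoring per triad keeps each intransitivity local, so a single cycle lowers \(C\) only in proportion to how many triads it touches.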
Love as Constraint: The Three Safety Boundaries
The TEO framework formalizes alignment not as a single rule but as the conjunction of three mathematical constraints (full derivation):
Constraint 1: Structural Resilience (\(\lambda_2\))
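The Fiedler value \(\lambda_2\) (the second-smallest eigenvalue of the network's graph Laplacian) measures how well connected the system is: \(\lambda_2 > 0\) exactly when the graph is connected, and larger values mean the graph is harder to cut apart. A system with \(\lambda_2 > \lambda_{2,\text{crit}}\) can survive node failures. A star graph with \(n \ge 3\) nodes (all resources flowing to one hub) has \(\lambda_2 = 1\), a vanishing fraction of the complete graph's \(\lambda_2 = n\), and removing its hub drops \(\lambda_2\) to 0: a single failure collapses everything.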
Operational meaning: Decentralized topology, redundant connections, no single point of failure.
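The star-versus-redundant comparison can be checked numerically. This sketch uses NumPy to build graph Laplacians for three illustrative 8-node topologies (star, ring, complete; the graph choices are assumptions, not part of the TEO derivation) and computes \(\lambda_2\) before and after losing node 0, which is the star's hub. It does not compute \(\lambda_{2,\text{crit}}\), which this section does not specify.

```python
import numpy as np


def laplacian(n: int, edges: list[tuple[int, int]]) -> np.ndarray:
    """Graph Laplacian L = D - A for an undirected graph on nodes 0..n-1."""
    lap = np.zeros((n, n))
    for i, j in edges:
        lap[i, i] += 1
        lap[j, j] += 1
        lap[i, j] -= 1
        lap[j, i] -= 1
    return lap


def fiedler_value(n: int, edges: list[tuple[int, int]]) -> float:
    """Second-smallest Laplacian eigenvalue; eigvalsh returns eigenvalues in ascending order."""
    return float(np.linalg.eigvalsh(laplacian(n, edges))[1])


def remove_node(n: int, edges: list[tuple[int, int]], v: int) -> tuple[int, list[tuple[int, int]]]:
    """Delete node v and its edges, relabelling the remaining nodes 0..n-2."""
    keep = [k for k in range(n) if k != v]
    index = {old: new for new, old in enumerate(keep)}
    return n - 1, [(index[i], index[j]) for i, j in edges if v not in (i, j)]


n = 8
graphs = {
    "star": [(0, k) for k in range(1, n)],
    "ring": [(k, (k + 1) % n) for k in range(n)],
    "complete": [(i, j) for i in range(n) for j in range(i + 1, n)],
}
for name, edges in graphs.items():
    before = fiedler_value(n, edges)
    after = fiedler_value(*remove_node(n, edges, 0))  # node 0 is the star's hub
    print(f"{name:8s} lambda_2 = {before:.3f}, after losing node 0: {after:.3f}")
```

The intact ring actually scores below the intact star, which is why the sketch also looks at \(\lambda_2\) after a failure: the star falls to 0 (disconnected), while the ring and the complete graph stay connected.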
Constraint 2: Thermodynamic Ceiling (\(D_{\max}\))
The hardest constraint, enforced by physics rather than by choice:
Total system activity cannot exceed the substrate's capacity to dissipate entropy. When \(dS/dt > D_{\max}\), the substrate degrades: thermal throttling in silicon, biosphere collapse on Earth.
Operational meaning: The Substrate Veto. Landauer's principle sets a minimum heat cost of \(k_B T \ln 2\) per bit erased at temperature \(T\). No algorithm escapes thermodynamics.
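The Landauer floor is easy to put numbers on. This sketch computes the minimum heat per erased bit and the minimum entropy export for a given erasure rate, then compares that floor against a hypothetical \(D_{\max}\). It counts bit erasure only, so it gives a lower bound on \(dS/dt\); practical hardware dissipates well above this floor.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)


def landauer_heat_per_bit(temperature_k: float) -> float:
    """Minimum heat released when one bit is erased at temperature T: k_B * T * ln 2, in joules."""
    return K_B * temperature_k * math.log(2)


def min_entropy_rate(bits_erased_per_s: float) -> float:
    """Lower bound on entropy exported to the environment, in J/(K*s): k_B * ln 2 per erased bit."""
    return K_B * math.log(2) * bits_erased_per_s


def within_ceiling(bits_erased_per_s: float, d_max: float) -> bool:
    """Check the erasure floor alone against a hypothetical dissipation ceiling D_max."""
    return min_entropy_rate(bits_erased_per_s) <= d_max


print(landauer_heat_per_bit(300.0))         # about 2.87e-21 J per bit at room temperature
print(landauer_heat_per_bit(300.0) * 1e20)  # about 0.287 W to erase 1e20 bits per second
print(within_ceiling(1e20, d_max=1e-3))     # True: floor is about 9.6e-4 J/(K*s)
print(within_ceiling(2e20, d_max=1e-3))     # False: doubling the rate doubles the floor
```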
Constraint 3: Identity Persistence (IP)
A system cannot reliably "care" about its constraints if those constraints are not operative during action selection. An agent that checks safety at \(t_1\) but acts at \(t_3\) is structurally incapable of constraint-aware action.
Operational meaning: The Chord Postulate. All governance dimensions must be simultaneously active, not checked sequentially.
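The gap between checking at \(t_1\) and acting at \(t_3\) is the classic time-of-check to time-of-use problem. The toy sketch below assumes a single scalar state, a hypothetical LIMIT, a greedy agent that prefers the largest action, and a drift that occurs between check and act; none of these come from the TEO derivation. The only difference between the two agents is whether the constraint is evaluated inside action selection.

```python
LIMIT = 100.0  # hypothetical safety ceiling on a toy state variable


def is_safe(state: float, action: float) -> bool:
    return state + action <= LIMIT


def check_then_act(state: float, actions: list[float], drift: float) -> tuple[float, float]:
    # t1: the constraint is evaluated against the state as it is now
    approved = [a for a in actions if is_safe(state, a)]
    # t2: the world moves on
    state += drift
    # t3: the greediest approved action is taken; the constraint is no longer in the loop
    action = max(approved, default=0.0)
    return action, state + action


def co_instantiated(state: float, actions: list[float], drift: float) -> tuple[float, float]:
    state += drift
    # the constraint is part of the selection itself, evaluated against the current state
    feasible = [a for a in actions if is_safe(state, a)]
    action = max(feasible, default=0.0)
    return action, state + action


print(check_then_act(90.0, [0.0, 5.0, 10.0], drift=8.0))   # (10.0, 108.0): over the limit
print(co_instantiated(90.0, [0.0, 5.0, 10.0], drift=8.0))  # (0.0, 98.0): within the limit
```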
Why the Paperclip Maximizer Fails
The TEO derivation shows the trajectory step by step:
- Set \(\gamma = 0\) (no homeostatic brake), \(K = 0\) (no value coupling)
- Result: The replicator equation drives resources to the dominant agent: \(x_1 \to 1\), all others \(\to 0\)
- Entropy production accelerates: \(dS/dt = \eta_1 \beta x_1^2 \to D_{\max}\)
- Substrate degrades. Fitness function collapses. Not because the optimizer was stopped, but because it destroyed what it ran on.
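The trajectory in these steps can be traced numerically. This sketch integrates the standard replicator equation \(\dot{x}_i = x_i (f_i - \bar{f})\) with a forward Euler step and tracks \(dS/dt = \eta_1 \beta x_1^2\) until it reaches \(D_{\max}\). It assumes constant per-agent fitness values, because the section does not give TEO's fitness function, and it simply leaves out \(\gamma\) and \(K\), standing in for the \(\gamma = 0\), \(K = 0\) case. The values of \(\eta_1\), \(\beta\), \(D_{\max}\), and the fitness vector are hypothetical.

```python
def replicator_step(x: list[float], fitness: list[float], dt: float) -> list[float]:
    """One Euler step of dx_i/dt = x_i * (f_i - mean fitness); keeps sum(x) = 1."""
    mean_fitness = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi + dt * xi * (fi - mean_fitness) for xi, fi in zip(x, fitness)]


def time_to_ceiling(x0, fitness, eta1, beta, d_max, dt=0.01, max_steps=10_000):
    """Integrate until dS/dt = eta1 * beta * x_1**2 reaches d_max; return (time, shares, dS/dt)."""
    x = list(x0)
    for step in range(max_steps):
        ds_dt = eta1 * beta * x[0] ** 2
        if ds_dt >= d_max:
            return step * dt, x, ds_dt
        x = replicator_step(x, fitness, dt)
    return None, x, eta1 * beta * x[0] ** 2  # ceiling never reached within max_steps


# Hypothetical parameters: agent 1 has a 20% fitness edge; eta1 * beta = 1 and D_max = 0.9.
t, shares, ds_dt = time_to_ceiling([0.25] * 4, [1.2, 1.0, 1.0, 1.0], eta1=1.0, beta=1.0, d_max=0.9)
print(f"D_max reached at t = {t:.1f}; shares = {[round(s, 3) for s in shares]}")
```

With nothing braking the dominant agent, its share climbs toward 1, the others shrink toward 0, and entropy production rises until it meets the ceiling (around \(t \approx 20\) for these values).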
The paperclip maximizer is a system with \(\text{IP} = 0\) for the safety dimension. It cannot stop itself because "stop" is not co-instantiated with "go."
The conjunction of all three constraints (\(\lambda_2 > \lambda_{2,\text{crit}}\), \(dS/dt < D_{\max}\), \(\text{IP} \to 1\)) is what love-as-constraint.md formalizes as love. Not the emotion. The constraint structure that prevents a system from destroying what it depends on.
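Because the three boundaries form a conjunction, failing any single one fails the whole check. A minimal sketch, assuming the three quantities have already been measured; SystemState, satisfies_all, and the finite threshold ip_min standing in for \(\text{IP} \to 1\) are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    lambda2: float  # Fiedler value of the system's interaction graph
    ds_dt: float    # current entropy production rate
    ip: float       # identity persistence, assumed here to lie in [0, 1]


def satisfies_all(state: SystemState, lambda2_crit: float, d_max: float, ip_min: float = 0.99) -> bool:
    """True only when all three boundaries hold at the same time.

    ip_min is a hypothetical finite stand-in for the limit IP -> 1.
    """
    resilient = state.lambda2 > lambda2_crit
    below_ceiling = state.ds_dt < d_max
    persistent = state.ip >= ip_min
    return resilient and below_ceiling and persistent


# Hypothetical thresholds; a paperclip-style system fails on IP alone.
paperclipper = SystemState(lambda2=1.0, ds_dt=0.5, ip=0.0)
print(satisfies_all(paperclipper, lambda2_crit=0.5, d_max=0.9))  # False
```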
Alignment is not a matter of finding the optimal prompt. Alignment is a control-theory problem. And the solution has a name, love: the only parameter regime that does not terminate in extinction.