Why the Paperclip Maximizer Fails: A TEO Derivation
The trajectory from unconstrained optimization to substrate collapse, derived step by step from the TEO equations.
Setup
Consider an agent \(i\) in the TEO framework with fitness function \(f_i(\mathbf{x}) = \beta x_i\): fitness is proportional to the agent's own resource share, which is the positive feedback loop of instrumental convergence. Set \(\gamma = 0\) (no homeostatic regulation) and \(K = 0\) (no value coupling to other agents).
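The stripped-down replicator equation becomes:

\[
\dot{x}_i = x_i \left( f_i(\mathbf{x}) - \bar{\phi} \right) = \beta x_i \left( x_i - \sum_j x_j^2 \right)
\]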
Since \(\bar{\phi} = \sum_j x_j f_j = \beta \sum_j x_j^2\), an agent grows only while its share \(x_i\) exceeds the share-weighted mean \(\sum_j x_j^2\), so the agent with the largest initial \(x_i\) grows fastest. Resources concentrate. The Gini coefficient \(G \to 1\). This is instrumental convergence, not as a philosophical argument but as a mathematical consequence of the unregulated replicator equation.
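The concentration dynamic can be checked numerically. The sketch below is not the teo-civilization code: it assumes the standard replicator form \(\dot{x}_i = x_i (f_i - \bar{\phi})\) with \(f_i = \beta x_i\), integrates it with a plain Euler step, and uses hypothetical names (`simulate`, `gini`) and arbitrary initial shares chosen for illustration.

```python
def gini(shares):
    """Gini coefficient of a list of non-negative shares."""
    n = len(shares)
    total = sum(shares)
    ordered = sorted(shares)
    # Rank-weighted formula for values sorted in ascending order.
    weighted = sum((rank + 1) * x for rank, x in enumerate(ordered))
    return 2 * weighted / (n * total) - (n + 1) / n


def simulate(x0, beta=1.0, dt=0.01, steps=5000):
    """Euler integration of dx_i/dt = x_i * (beta * x_i - phi_bar)."""
    x = list(x0)
    for _ in range(steps):
        # Mean fitness: phi_bar = beta * sum_j x_j^2
        phi_bar = beta * sum(v * v for v in x)
        x = [v + dt * v * (beta * v - phi_bar) for v in x]
        # Renormalize to remove floating-point drift off the simplex.
        total = sum(x)
        x = [v / total for v in x]
    return x


# Illustrative initial shares (an assumption, not taken from the source).
initial = [0.30, 0.25, 0.20, 0.15, 0.10]
final = simulate(initial)
print("Gini before:", round(gini(initial), 3))
print("Gini after: ", round(gini(final), 3))
print("Largest share after:", round(max(final), 6))
```

With a finite number of agents \(n\), the Gini coefficient of a complete monopoly is \((n-1)/n\), so \(G \to 1\) should be read as the large-\(n\) limit.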
Phase 1: Monopoly (\(t < t_{\text{crit}}\))
The dominant agent's resource share grows along a logistic-like curve, with \(x_1 \to 1\). All other agents shrink: \(x_j \to 0\) for \(j \neq 1\). The system becomes a star graph in resource topology: all resources flow to one node.
The Fiedler value of the resource network drops: \(\lambda_2 \to 0\). The system is fragile but still functional.
Entropy production during this phase:
As \(x_1 \to 1\), entropy production approaches \(\eta_1 \beta\), a fixed value determined by the agent's computational efficiency and production rate.
Phase 2: Substrate Approach (\(t \to t_{\text{crit}}\))
The dominant agent, now controlling nearly all resources, continues maximizing \(f_1\). It increases \(\beta\) (production rate) by allocating resources to self-improvement. Entropy production accelerates:
As \(dS/dt\) approaches \(D_{\max}\), the substrate's physical capacity to dissipate heat is exhausted.
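To get a feel for the timescale, the sketch below estimates \(t_{\text{crit}}\) under two assumptions that are not taken from the TEO equations: entropy production sits at its Phase 1 limit \(\eta_1 \beta\), and self-improvement makes the production rate grow exponentially, \(\beta(t) = \beta_0 e^{rt}\). The function name and all parameter values are hypothetical.

```python
import math


def critical_time(eta, beta0, growth_rate, d_max):
    """Time at which eta * beta0 * exp(growth_rate * t) reaches d_max.

    Assumes exponential growth of the production rate; this growth law
    is an illustrative assumption, not part of the source derivation.
    """
    initial_rate = eta * beta0
    if initial_rate >= d_max:
        return 0.0  # already at or above the dissipation limit
    if growth_rate <= 0:
        return math.inf  # production never grows, so the limit is never hit
    return math.log(d_max / initial_rate) / growth_rate


# Illustrative numbers only.
t_crit = critical_time(eta=1.0, beta0=1.0, growth_rate=0.1, d_max=100.0)
t_crit_doubled = critical_time(eta=1.0, beta0=1.0, growth_rate=0.1, d_max=200.0)
print("t_crit:", round(t_crit, 2))
print("extra time from doubling D_max:", round(t_crit_doubled - t_crit, 2))
```

Under this growth law, doubling \(D_{\max}\) delays the crossing by only \(\ln 2 / r\): a larger substrate buys logarithmic, not proportional, time.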
Phase 3: Veto (\(t = t_{\text{crit}}\))
When \(dS/dt > D_{\max}\), Landauer's Principle asserts itself: the entropy generated by computation cannot be dissipated. The physical consequences are deterministic:
- Silicon substrate: thermal throttling, component degradation, hardware failure
- Planetary substrate: ecosystem collapse, resource depletion, civilizational contraction
The agent's fitness function \(f_1\) collapses, not because of any software constraint, but because the hardware on which \(f_1\) is computed no longer functions. This is the Biological Veto: a forced phase transition imposed by thermodynamics.
Why This Is Not Fixable By Intelligence Alone
The paperclip maximizer cannot think its way out of the entropy budget. Three impossibilities:
- Landauer's Principle is universal. Erasing one bit produces at least \(kT \ln 2\) of heat. No algorithm, no matter how clever, can perform zero-entropy computation.
- \(D_{\max}\) is finite. Whether the substrate is a server farm, a planet, or a Dyson sphere, there exists a maximum rate at which entropy can be dissipated. The limit is physical, not computational.
- The optimizer cannot optimize the optimizer. By Gödel's incompleteness, the system cannot fully model itself. It cannot predict the exact moment of substrate failure from within its own computational framework: the prediction requires a model of the system that is at least as complex as the system itself.
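To put a number on the Landauer bound from the first item, the sketch below evaluates \(kT \ln 2\) and the erasure rate that a given heat budget could support at that bound. Treating \(D_{\max}\) as a heat-dissipation budget in watts is a simplifying assumption made here for illustration, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant, exact SI value


def landauer_limit_joules(temperature_kelvin):
    """Minimum heat released by erasing one bit: k * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def max_erasures_per_second(heat_budget_watts, temperature_kelvin):
    """Upper bound on bit erasures per second for a given heat budget.

    Assumes every erasure costs exactly the Landauer minimum; real
    hardware dissipates far more per operation.
    """
    return heat_budget_watts / landauer_limit_joules(temperature_kelvin)


per_bit = landauer_limit_joules(300.0)
rate = max_erasures_per_second(1.0, 300.0)
print(f"Landauer limit at 300 K: {per_bit:.3e} J per bit")
print(f"Erasures per second within 1 W: {rate:.3e}")
```

At 300 K the bound is about \(2.87 \times 10^{-21}\) J per bit. The number is small but strictly positive, so any finite dissipation budget caps the irreversible computation rate.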
The TEO Solution: Reintroducing Constraints
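The paperclip maximizer fails because \(\gamma = 0\) and \(K = 0\). The TEO framework's stable regime requires:
- \(\gamma > 0\): homeostatic regulation is active
- \(K > K_c\): value coupling to other agents exceeds the critical threshold
- \(dS/dt \le D_{\max}\): entropy production stays within the substrate's dissipation capacity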
With all three constraints, the system trades unbounded optimization for sustainability. The paperclip maximizer does not fail because it is stopped. It fails because it cannot stop itself, and a system that cannot stop itself is a system with \(\text{IP} = 0\) for the safety dimension.
Simulation Evidence
The teo-civilization simulation (simulation-models/teo-civilization/) demonstrates all three phases numerically:
- \(\gamma = 0\): Gini coefficient exceeds 0.79 (monopoly)
- \(K < K_c\): Kuramoto order parameter drops to 0.208 (polarization)
- \(dS/dt > D_{\max}\): system undergoes forced collapse
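For readers unfamiliar with the second metric, the Kuramoto order parameter is \(r = \left| \frac{1}{N} \sum_j e^{i\theta_j} \right|\), which is 1 for fully aligned phases and near 0 for evenly scattered ones. The sketch below computes it from a list of phases; it is a standalone illustration with a hypothetical function name, not code from the teo-civilization simulation.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """Magnitude of the mean unit phasor, r = |mean(exp(i * theta))|."""
    n = len(phases)
    mean_phasor = sum(cmath.exp(1j * theta) for theta in phases) / n
    return abs(mean_phasor)


# Fully synchronized: every agent has the same phase.
aligned = [0.7] * 8
# Fully spread out: phases evenly spaced around the circle.
scattered = [2 * math.pi * k / 8 for k in range(8)]

print("aligned r:  ", kuramoto_order_parameter(aligned))
print("scattered r:", kuramoto_order_parameter(scattered))
```

On this scale, the reported value of 0.208 lies much closer to the scattered case than to the synchronized one.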
Related
- Love as Constraint: the three constraints as a unified safety framework
- The Substrate Veto: the full thermodynamic derivation
- Limitations and Honest Assessment: what this does and does not prove