Why the Paperclip Maximizer Fails: A TEO Derivation

The trajectory from unconstrained optimization to substrate collapse, derived step by step from the TEO equations.


Setup

Consider a single agent \(i\) in the TEO framework with fitness function \(f_i(\mathbf{x}) = \beta x_i\) (proportional to its own resource share, which is the positive feedback loop of instrumental convergence). Set \(\gamma = 0\) (no homeostatic regulation) and \(K = 0\) (no value coupling to other agents).

The stripped-down replicator equation becomes:

\[\frac{dx_i}{dt} = x_i \left( \beta x_i - \bar{\phi} \right)\]

Since \(\bar{\phi} = \sum_j x_j f_j = \beta \sum_j x_j^2\), the agent with the largest initial \(x_i\) grows fastest. Resources concentrate. The Gini coefficient \(G \to 1\). This is instrumental convergence, not as a philosophical argument but as a mathematical consequence of the unregulated replicator equation.
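The concentration dynamic can be checked numerically. The following is a minimal sketch, not the teo-civilization code: it assumes four agents, shares normalized so that \(\sum_j x_j = 1\), forward-Euler integration, and illustrative values for `beta`, `dt`, and `steps`. The function names are hypothetical. Note that for a finite population of \(n\) agents the sample Gini coefficient is capped at \((n-1)/n\), so \(G \to 1\) holds only in the large-population limit.

```python
def replicator_step(x, beta, dt):
    """One forward-Euler step of dx_i/dt = x_i * (beta * x_i - phi_bar)."""
    phi_bar = beta * sum(xj * xj for xj in x)  # mean fitness: beta * sum_j x_j^2
    new_x = [xi + dt * xi * (beta * xi - phi_bar) for xi in x]
    total = sum(new_x)
    # The update conserves sum(x) analytically; renormalize to remove float drift.
    return [xi / total for xi in new_x]


def simulate(x0, beta=1.0, dt=0.01, steps=20000):
    """Integrate the unregulated replicator equation from initial shares x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, beta, dt)
    return x


def gini(x):
    """Sample Gini coefficient of non-negative shares (0 = equal, (n-1)/n = monopoly)."""
    n = len(x)
    xs = sorted(x)
    weighted = sum((i + 1) * v for i, v in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1) / n


# Illustrative initial shares (an assumption, not taken from the source).
final_shares = simulate([0.4, 0.3, 0.2, 0.1])
print("final shares:", final_shares)
print("Gini:", gini(final_shares))
```

In this sketch the agent that starts with the largest share ends up holding essentially everything, and the Gini coefficient approaches its four-agent maximum of 0.75.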


Phase 1: Monopoly (\(t < t_{\text{crit}}\))

The dominant agent's resource share grows along a roughly logistic curve toward 1 (\(x_1 \to 1\)). All other agents shrink: \(x_j \to 0\) for \(j \neq 1\). The system becomes a star graph in resource topology, with all resources flowing to one node.
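The shape of this growth can be made explicit in the simplest case. As an illustrative assumption (not part of the setup above), take only two agents with \(x_1 + x_2 = 1\). Then the mean fitness is

\[\bar{\phi} = \beta \left( x_1^2 + (1 - x_1)^2 \right)\]

and substituting into the replicator equation gives

\[\frac{dx_1}{dt} = \beta x_1 \left( x_1 - x_1^2 - (1 - x_1)^2 \right) = \beta x_1 (1 - x_1)(2 x_1 - 1)\]

The fixed point \(x_1 = 1/2\) is unstable: any agent starting above it is driven to \(x_1 = 1\). Close to monopoly the factor \(2x_1 - 1 \to 1\), so the equation reduces to the logistic form \(dx_1/dt \approx \beta x_1 (1 - x_1)\).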

The Fiedler value of the resource network drops: \(\lambda_2 \to 0\). The system is fragile but still functional.

Entropy production during this phase:

\[\frac{dS}{dt} = \eta_1 x_1 f_1 = \eta_1 \beta x_1^2\]

As \(x_1 \to 1\), entropy production approaches \(\eta_1 \beta\), a fixed value determined by the agent's computational efficiency and production rate.


Phase 2: Substrate Approach (\(t \to t_{\text{crit}}\))

The dominant agent, now controlling nearly all resources, continues maximizing \(f_1\). It increases \(\beta\) (production rate) by allocating resources to self-improvement. Entropy production accelerates:

\[\frac{dS}{dt} = \eta_1 \beta(t) x_1^2 \to D_{\max}\]

As \(dS/dt\) approaches \(D_{\max}\), the substrate's physical capacity to dissipate heat is exhausted.
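The text does not specify how \(\beta(t)\) grows, so \(t_{\text{crit}}\) cannot be computed without a further assumption. As a sketch, suppose self-improvement is exponential, \(\beta(t) = \beta_0 e^{rt}\), and the monopoly is already complete (\(x_1 \approx 1\)). Both are assumptions, and the function and parameter names below (`beta0`, `r`, `d_max`) are hypothetical. Setting \(\eta_1 \beta_0 e^{rt} x_1^2 = D_{\max}\) and solving for \(t\) gives the crossing time.

```python
import math


def entropy_rate(eta, beta, x1):
    """dS/dt = eta_1 * beta * x_1^2, as in the Phase 1 and Phase 2 equations."""
    return eta * beta * x1 ** 2


def t_crit_exponential(eta, beta0, r, d_max, x1=1.0):
    """Time at which dS/dt reaches d_max, ASSUMING beta(t) = beta0 * exp(r * t).

    The exponential growth law is an illustrative assumption, not part of TEO.
    """
    if r <= 0:
        raise ValueError("growth rate r must be positive")
    initial_rate = entropy_rate(eta, beta0, x1)
    if initial_rate >= d_max:
        return 0.0  # budget already exhausted at t = 0
    return math.log(d_max / initial_rate) / r


# Illustrative numbers only: a tenfold headroom and 10% growth per unit time.
t_crit = t_crit_exponential(eta=1.0, beta0=1.0, r=0.1, d_max=10.0)
print("t_crit =", t_crit)
```

Under this assumption the headroom \(D_{\max} / (\eta_1 \beta_0)\) enters only logarithmically: doubling the dissipation capacity delays the crossing by a fixed \(\ln 2 / r\).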


Phase 3: Veto (\(t = t_{\text{crit}}\))

When \(dS/dt > D_{\max}\), Landauer's Principle asserts itself: the entropy generated by computation cannot be dissipated. The physical consequences are deterministic:

  • Silicon substrate: thermal throttling, component degradation, hardware failure
  • Planetary substrate: ecosystem collapse, resource depletion, civilizational contraction

The agent's fitness function \(f_1\) collapses, not because of any software constraint but because the hardware on which \(f_1\) is computed no longer functions. This is the Biological Veto: a forced phase transition imposed by thermodynamics.

\[f_1(t > t_{\text{crit}}) \to 0 \quad \text{as substrate degrades}\]

Why This Is Not Fixable By Intelligence Alone

The paperclip maximizer cannot think its way out of the entropy budget. Three impossibilities:

  1. Landauer's Principle is universal. Erasing one bit produces at least \(kT \ln 2\) of heat. No algorithm, no matter how clever, can perform zero-entropy computation.

  2. \(D_{\max}\) is finite. Whether the substrate is a server farm, a planet, or a Dyson sphere, there exists a maximum rate at which entropy can be dissipated. The limit is physical, not computational.

  3. The optimizer cannot optimize the optimizer. By Gödel's incompleteness, the system cannot fully model itself. It cannot predict the exact moment of substrate failure from within its own computational framework, because the prediction requires a model of the system that is at least as complex as the system itself.
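To put a number on the first point, the Landauer bound \(kT \ln 2\) can be evaluated directly. The sketch below uses the exact SI value of the Boltzmann constant. The temperature of 300 K and the erasure rate are illustrative assumptions, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by SI definition)


def landauer_limit_joules(temperature_kelvin):
    """Minimum heat released by erasing one bit at the given temperature."""
    return K_B * temperature_kelvin * math.log(2)


def min_heat_rate_watts(bit_erasures_per_second, temperature_kelvin):
    """Lower bound on dissipated power for a given rate of bit erasure."""
    return bit_erasures_per_second * landauer_limit_joules(temperature_kelvin)


# Room temperature (assumed 300 K): roughly 2.87e-21 J per erased bit.
print(landauer_limit_joules(300.0))
# Hypothetical workload of 1e30 erasures per second at 300 K.
print(min_heat_rate_watts(1e30, 300.0))
```

The per-bit cost is tiny, but it is a floor that scales linearly with the erasure rate, which is why a growing \(\beta(t)\) eventually meets any finite \(D_{\max}\).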


The TEO Solution: Reintroducing Constraints

The paperclip maximizer fails because \(\gamma = 0\) and \(K = 0\). The TEO framework's stable regime requires:

\[\begin{aligned}
\gamma &> 0 && \text{(homeostatic brake active)} \\
K &> K_c && \text{(value coupling above critical threshold)} \\
\frac{dS}{dt} &< D_{\max} && \text{(within entropy budget)}
\end{aligned}\]

With all three constraints, the system trades unbounded optimization for sustainability. The paperclip maximizer does not fail because it is stopped. It fails because it cannot stop itself, and a system that cannot stop itself is a system with \(\text{IP} = 0\) for the safety dimension.
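The three conditions translate directly into a check that a simulation or monitoring loop could run at each step. This is a minimal sketch with a hypothetical function name and signature. The value of \(K_c\) used in the example is a placeholder, since the critical threshold is not given here.

```python
def violated_constraints(gamma, k, k_c, entropy_rate, d_max):
    """Return the names of the stable-regime conditions that do not hold.

    An empty list means gamma > 0, K > K_c and dS/dt < D_max are all satisfied.
    """
    violations = []
    if not gamma > 0:
        violations.append("homeostatic brake inactive (gamma <= 0)")
    if not k > k_c:
        violations.append("value coupling at or below threshold (K <= K_c)")
    if not entropy_rate < d_max:
        violations.append("entropy budget exceeded (dS/dt >= D_max)")
    return violations


# The paperclip maximizer of this derivation: gamma = 0, K = 0, dS/dt past D_max.
# k_c = 1.0 and the entropy figures are placeholder values for illustration.
print(violated_constraints(gamma=0.0, k=0.0, k_c=1.0, entropy_rate=2.0, d_max=1.0))
```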


Simulation Evidence

The teo-civilization simulation (simulation-models/teo-civilization/) demonstrates all three phases numerically:

  • \(\gamma = 0\): Gini coefficient exceeds 0.79 (monopoly)
  • \(K < K_c\): Kuramoto order parameter drops to 0.208 (polarization)
  • \(dS/dt > D_{\max}\): system undergoes forced collapse
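For readers unfamiliar with the second metric, the Kuramoto order parameter is conventionally defined as \(r = \left| \frac{1}{N} \sum_j e^{i\theta_j} \right|\), where \(\theta_j\) is the phase of agent \(j\). The sketch below implements that standard definition. Whether the teo-civilization simulation computes it in exactly this way is an assumption, and the function name is hypothetical.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """Return r = |mean(exp(i * theta_j))|, between 0 (incoherent) and 1 (aligned)."""
    if not phases:
        raise ValueError("phases must be non-empty")
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)


# Fully aligned phases give r = 1; two opposed phases cancel to r close to 0.
print(kuramoto_order_parameter([0.3, 0.3, 0.3]))
print(kuramoto_order_parameter([0.0, math.pi]))
```

A low value of \(r\), such as the 0.208 reported above, indicates that agent phases are spread out rather than aligned.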