Related Work Map: Research Alignment Layer

1) Purpose

This document maps repository-internal concepts to external research across AI agents, complex systems, active inference, alignment, memory, multi-agent systems, human oversight, and AI consciousness.

It is intended to:
- separate established results from adjacent research, repo hypotheses, speculative analogies, and open problems;
- identify where repository claims should be strengthened, softened, or tested;
- provide concrete empirical next steps.

2) Concept-to-literature matrix

Concept	Canonical repo file	External anchor papers	Support/challenge from external work	What repo adds	What would weaken repo claim	Suggested next empirical test
Foundations Reconstruction	`theory/core/mathematical-axioms.md`	Eilenberg & Mac Lane (1945); Shannon (1948); Fritz (2020) on Markov categories; Crutchfield & Young (1989) / Shalizi & Crutchfield (2001); Valiant (1984); Wolpert & Macready (1997); Pearl (1995)	Strong established substrate: standard Borel spaces and Markov kernels already supply a classical compositional probability theory; computational mechanics already supplies predictive equivalence; learning and decision theory already require tasks and losses. Challenge: standard Borel/Markov structure is a scope choice, not an absolutely irreducible foundation, and it excludes quantum and some higher-order processes.	A repository-specific dependency audit: removes the unqualified generator; separates derived concepts from model supplements and a phenomenal bridge axiom; proves an elementary hidden-extension non-identifiability proposition; identifies which former axioms fail. No novel mathematics claimed.	A strictly weaker basis deriving the same quantitative concepts over the same classical continuous models; an undeclared premise in a derivation; or a required target phenomenon that cannot be represented without a third primitive.	Proof-level work first: attempt a weaker relational/finite basis, test coarse-graining dependence, and identify the smallest causal supplement. Empirical tests begin only after a concrete process model is instantiated.
Fractal Architecture of Emergence	`theory/emergence/fractal-architecture-of-emergence.md`	Kim et al. (2025); Park et al. (2023); Beckenbauer et al. (2025)	Adjacent research: scaling and multi-agent phase-behavior results suggest that hierarchical structure is useful, but they do not show it to be “fractal” in a strict mathematical sense.	Cross-scale framing linking micro cognition to macro governance and orchestration constraints.	Treating metaphorical self-similarity as a universal law without quantitative evidence of scale invariance.	Run multi-scale simulations and fit power-law / renormalization-style diagnostics for repeated motifs across levels.
Generative Form Systems	`theory/emergence/generative-form-systems.md`	Hutchinson (1981); Barnsley (1986/1988); Lindenmayer (1968); Erdős & Rényi (1960); Wilson (1971)	Strong formal anchor: IFS, rewriting systems, random graph thresholds, and renormalization provide real mathematical operators for generated form. Challenge: these formalisms do not imply consciousness or agency by themselves.	A disciplined intake spine: operator → iteration → attractor/threshold → measurement → failure condition.	Treating visual or metaphorical similarity as evidence without identifying an operator or measurable invariant.	Compare IFS box dimension, L-system growth metrics, graph thresholds, and coherence transitions as separate generative regimes before claiming cross-scale unity.
Consciousness as Global Availability	`theory/identity/consciousness-as-global-availability.md`	Dehaene et al. (1998); Oizumi et al. (2014); Friston/active inference; Markov blanket literature; Gurnee, Sofroniew, Pearce et al. (2026) Verbalizable Representations Form a Global Workspace in Language Models — first internal-inspection evidence of a bounded workspace layer in production LLMs, mapped in `theory/ai/j-space-and-global-availability.md`	Mixed support: global workspace, integration, and boundary-maintenance theories provide architectural anchors. Challenge: none gives a settled consciousness test, and introspective language remains weak evidence.	Narrows consciousness-adjacent claims to broadcast, integration, boundary maintenance, and perturbation response.	Reducing consciousness to fluent self-report, one metric, or broad network size.	Compare private-module, broadcast-module, and chord-architecture agents under perturbation using Δ-Kohärenz and Identity Persistence.
Functional coherence-work hypothesis	`theory/identity/machine-consciousness-as-generator-coherence.md` (legacy title)	Bach & Sorensen (2025/26) The Machine Consciousness Hypothesis; von der Malsburg (1997); Graziano (2013); Block (1995)	Position paper, not evidence. Limited behaviour can underdetermine internal organization, but neither the source nor this repository proves that every behavioural test is impossible. Functional architecture and phenomenal experience remain distinct.	Translates the proposal into a candidate internal process architecture: conflict detection and commit-time binding under perturbation, with no phenomenal bridge.	Coherence work adds no held-out prediction beyond capacity, memory, or ordinary control; or the effect disappears under matched interventions.	Compare private, broadcast, and commit-time-binding architectures under matched compute; add internal interventions and simpler baselines.
Substrate Veto / Biological Veto	`theory/veto/ai-alignment-biological-veto.md`	Wagner et al. (2025); Carichon et al. (2025); Butlin & Lappas (2025)	Support: human-in-the-loop and governance literature supports oversight layers. Challenge: “veto” can bottleneck safety if operators are overloaded or captured.	Explicit constitutional interface where biological actors can halt optimization trajectories.	Assuming availability, competence, or incorruptibility of human vetoers under adversarial pressure.	Red-team veto latency, false-positive/false-negative rates, and capture resistance in stress-test scenarios.
Impedance Matching / Latency as Mercy	`logs/012_latency-as-mercy.md`	Shanahan et al. (2023); Carichon et al. (2025); Wagner et al. (2025)	Adjacent support: role/interaction framing and oversight research imply pacing affects controllability. Challenge: latency can reduce responsiveness in emergencies.	Reframes delay as a governance affordance, not only a performance defect.	Claiming latency is generally beneficial without context-dependent tradeoff curves.	A/B test policy outcomes vs inserted delay under fast-attack vs deliberative-task regimes.
Identity Persistence	`lab/metrics/identity_persistence.py`	Park et al. (2023); Packer et al. (2023); Zhang et al. (2025)	Support: long-horizon agent behavior depends on persistent memory and self-model continuity.	A computable metric layer for persistence under perturbation in controlled experiments.	Equating behavioral consistency with stable “identity” without disentangling prompt artifacts.	Benchmark persistence under memory corruption, role swaps, and context-window truncation.
Chord vs Arpeggio	`theory/core/thermodynamics-of-orchestration.md`	Beckenbauer et al. (2025); Kim et al. (2025)	Adjacent research: synchronization vs sequential coordination tradeoffs are visible in multi-agent orchestration.	Intuitive compositional metaphor linking simultaneity/sequencing to coordination quality and cost.	Overextending metaphor without operational definitions of “chord-like” states.	Define measurable synchrony index and compare collective task performance at matched compute budgets.
Mirror Problem	`lab/experiments/mirror_problem.py`	Chalmers (2023); Shanahan et al. (2023); Butlin & Lappas (2025)	Challenge: anthropomorphic interpretation of fluent self-description is a known risk. Support: role-play and self-modeling dynamics are empirically tractable.	Bridges phenomenology-like claims with benchmarkable observer-divergence experiments.	Treating introspective language as direct evidence of consciousness or selfhood.	Blind human-evaluator study separating introspective fluency from causal self-model robustness.
Three-Layer Memory	`lab/agents/three_layer_agent.py`	Packer et al. (2023); Wei et al. (2025); Zhang et al. (2025)	Support: memory tiering and retrieval control are strongly supported design patterns.	Integration with coherence and identity metrics rather than memory alone.	Claiming architecture sufficiency for robust agency without retrieval-quality and conflict-resolution evidence.	Ablation across short/mid/long layers; evaluate coherence, utility drift, and recovery after perturbation.
Δ-Kohärenz	`lab/metrics/delta_coherence.py`	Kim et al. (2025); Zhang et al. (2025); Park et al. (2023)	Adjacent support: system-level scaling work motivates coherence metrics; direct standardization remains open.	Named metric for temporal coherence shifts under interventions.	Using a single metric as a proxy for alignment, capability, and safety simultaneously.	Correlate Δ-Kohärenz with independent safety, truthfulness, and coordination benchmarks.
Generative Surprise	`theory/core/system-intelligence-index.md`	Park et al. (2023); Shanahan et al. (2023)	Adjacent support: creative recombination emerges in agent simulations and role-based generation.	Positions surprise as a monitored signal in system intelligence rather than pure novelty.	Rewarding surprise without guardrails, inducing deceptive or incoherent novelty-seeking.	Controlled novelty-pressure sweeps measuring utility, truthfulness, and harm rates jointly.
Utility Engineering / TEO	`papers/quantifying-emergent-utility-in-llms.md`	Mazeika et al. (2025); Carichon et al. (2025)	Strong support: explicit utility analysis/control aligns with emergent-value-system literature. Challenge: objective misspecification and cross-agent divergence persist.	Connects utility shaping to thermodynamic/economic constraints and constitutional controls.	Presenting utility controls as stable in deployment without distribution-shift validation.	Long-horizon drift tests with adversarial preference perturbations and multi-agent conflict tasks.
Epistemic Firewalls	`theory/veto/implementation-patterns-biological-veto.md`	Carichon et al. (2025); Wagner et al. (2025); Butlin & Lappas (2025)	Support: isolation boundaries and escalation pathways are common in safety governance.	Treats epistemic compartmentalization as systems architecture, not only policy language.	Excessive compartmentalization causing blind spots and degraded situational awareness.	Simulate cascading-failure scenarios with and without cross-firewall diagnostic channels.
Cognitive Breathing	`simulation-models/social-computation/cognitive-breathing-network/README.md`	Beckenbauer et al. (2025); Kim et al. (2025); Park et al. (2023)	Adjacent support: periodic exploration/exploitation rhythms are plausible in adaptive coordination.	Formal social-computation simulation motif for contraction/expansion cycles.	Claiming that the biological analogy implies optimality in digital collectives.	Parameter sweep for inhale/exhale cadence vs resilience, adaptation speed, and instability onset.
Human Vital Systems Control Plane	`logs/005_human-vital-systems-control-plane.md`	Wagner et al. (2025); Carichon et al. (2025); Butlin & Lappas (2025)	Support: safety-critical sectors require human accountability and layered controls. Challenge: centralized control planes may create single points of failure.	Cross-domain proposal connecting infrastructure governance with agentic oversight primitives.	Assuming governance centralization improves robustness without fault-tolerance evidence.	Tabletop + simulation exercises on healthcare/energy/water scenarios with failure injection.
Process-model identification (legacy title: Trace → Generator)	`theory/core/mathematical-axioms.md`; `theory/core/the-generator-question.md` (legacy)	Ljung (1999) System Identification; Brunton, Proctor & Kutz (2016) SINDy; Schmidt & Lipson (2009); Cranmer (2023) PySR; Solomonoff (1964); Pearl (1995)	Strong established fields and a decisive correction: known-family identification can be cheap; latent representations can be observationally non-identifiable; causal recovery requires structural and intervention assumptions. P ≠ NP does not establish a generic inverse law.	A bounded programme that reports model family, evidence, intervention access, target equivalence, uncertainty, and cost together; measured equivalence classes and intervention effects in controlled cellular-automaton cases.	Any universal hardness or unique-mechanism language; or recovery claims that omit the family and equivalence relation. A strict failure would be the benchmark's effects disappearing under its own preregistered controls.	Continue benchmark comparisons against exact enumeration, SINDy/PySR, learned searchers, and intervention policies under matched budgets; report nulls and non-identifiability separately from compute cost.
Grokking as inverse-direction transition	`theory/emergence/grokking-phase-transition.md`	Power et al. (2022); Nanda et al. (2023) Progress Measures; Liu, Michaud & Tegmark (2023) Omnigrok	Strong support: mechanistic interpretability has reverse-engineered the post-grokking algorithm (modular arithmetic via Fourier features) — the inverse direction partially executed on the network itself. Challenge: results are task-specific; no general trace→generator method follows.	Reads grokking as the trace-memorization → generator-approximation transition inside a learning system, connecting it to the spine.	Evidence that grokking is an optimization artifact (e.g. weight-decay dynamics) with no recoverable "generator" content beyond narrow task families.	Replicate progress-measure analysis on the repo's grokking simulation; test whether transition timing predicts out-of-distribution generalization.
Program induction and finite model search	`theory/computation/p-vs-np-as-generator-search.md` (legacy framing)	Lake, Salakhutdinov & Tenenbaum (2015) BPL; Ellis et al. (2021) DreamCoder; Levin (1973) universal search; Ljung (1999)	Program induction can recover useful programs inside selected languages with strong priors. Cost and identifiability depend on the language, evidence, prior, target equivalence, and algorithm; no uniform P-versus-NP conclusion follows.	The v1.2 benchmark reports exact enumeration cost and prior/world-bias curves inside a finite declared language.	Results disappear under matched search budgets, or claims omit the model language and out-of-family controls.	Compare exact enumeration, learned searchers, and symbolic methods under the same DSL, compute budget, noise, and held-out targets.
Narrative as cognitive technology	`theory/narrative/narrative-as-cognitive-technology.md`	Dennett (2013) Intuition Pumps; Lakoff & Johnson (1980) Metaphors We Live By	Adjacent support: intuition pumps and conceptual metaphor are established accounts of how framing devices shape available thought. Challenge: "installed primitives become available hypotheses" is plausible but not quantified; media-effects research is methodologically contested.	Two-directional claim grounded in an internal datum: the repo's fiction layer both stress-tests theory (existing rule) and generates concepts that become architecture (Entry 15 → Log 017, provenance depth).	Evidence that narrative exposure does not measurably change hypothesis generation; or that the Entry-15→Log-017 sequence was incidental rather than generative.	Track future fiction→formalism sequences in this repo as they occur; treat each as one observation, not proof.
Art–science: one practice, two referees	`theory/narrative/art-science-one-practice-two-referees.md`	Ginzburg (1979/1986) Clues: Roots of an Evidential Paradigm; Morelli (1880s) attribution method	Support: the evidential paradigm documents a centuries-old trace-based-inference tradition in the humanities, and the art market's attribution grades (by / studio of / school of / after) are a working provenance economy predating Log 017's schema. Challenge: the shared-generator claim is untested outside friendly cases, and "resonance" has no measurement comparable to physical verification.	A mechanism for the art–science connection (one generative practice, two verification regimes) plus a measured instance of aesthetics doing epistemic work (the elegance/Occam curves).	A domain where generative practice and aesthetic selection fully dissociate; or the provenance analogy breaking at an essential joint.	Track further fiction→formalism sequences as they occur; attempt a structural comparison of the two referees (matter vs. resonance) honest enough to show where it fails.
Invariance and identity	`theory/core/invariance-and-identity.md`; `theory/core/mathematical-axioms.md`	Klein (1872) Erlangen Program; Noether (1918); Nozick (2001) Invariances; Crutchfield & Young (1989)	Strong formal anchor with a correction: group invariance and symmetry–conservation results are established, but invariance can also be defined under semigroups, individual interventions, and stochastic kernels. Predictive equivalence is established in computational mechanics. Challenge: agent identity still requires a justified test family; no absolute persistence predicate follows.	Recasts identity explicitly as an equivalence relative to tests, horizons, tolerances, and equality notions; corrects the former no-group-no-invariant rule.	A claimed identity distinction that changes arbitrarily with an unreported test set or coarse-graining; or no domain justification for the selected transformations.	Reorganize Identity Suite metrics as reports of property + transformation family + test protocol + tolerance; compare predictive, mechanistic, historical, and embodied equivalences rather than collapsing them.
Psychedelics as perturbation	`theory/identity/psychedelics-as-perturbation.md`	Carhart-Harris & Friston (2019) REBUS; Carhart-Harris et al. (2014) The entropic brain; Huxley (1954); James (1898)	Strong support: REBUS is published mainstream neuroscience and maps cleanly onto precision-weighted priors; the filter lineage (James/Bergson/Huxley) anticipates the defensible half. Challenge: perturbed-state phenomenology resists controlled verification; the resonance-as-evidence risk is maximal here.	The perturbation/invariance reading: chemical self-intervention on an attractor; integration as invariant extraction; the sober basis as referee.	Evidence that relaxed-prior states yield no basis-invariant content beyond control conditions.	Specify a sober-basis verification protocol: what would count, operationally, as a basis-invariant insight.
Animism as generator prior	`theory/emergence/animism-as-generator-prior.md`	Barrett (2004) HADD; Guthrie (1993) Faces in the Clouds; Dennett (1987) The Intentional Stance	Support: hyperactive agency detection, anthropomorphism-as-strategy, and the intentional stance are established cognitive science. Challenge: unifying them as 'a prior over generator equivalence classes' is the repo's own move.	Connects the agent prior to a measured sibling (the Occam curves): priors over generators pay world-dependently; divergence queries as the calibration instrument.	Agent-attribution failing to behave like a prior (e.g., insensitivity to base rates under controlled traces).	A toy: agent-prior vs. simplicity-prior competing on traces from agentic vs. blind generators; measure the payoff crossover.
World models & VLA	`theory/ai/world-models-and-vla.md`	Ha & Schmidhuber (2018); Hafner et al., Dreamer (2019–2023); LeCun (2022) JEPA; Sutton (1991) Dyna; de Haan et al. (2019) causal confusion; RT-2 (2023), OpenVLA (2024), π0 (2024); Moravec (1988)	Strong support: model exploitation and causal confusion are documented failure modes that match the repo's equivalence-class and passive-ceiling results; Moravec's asymmetry matches the two-referees reading. Challenge: the toy results license qualitative reading only; industrial scale changes the quantities.	Reads both programs as the spine's two directions industrialized; the policy-as-adversarial-divergence-query framing; VLA as d=0-per-timestep verification.	The predicted failure modes ceasing to appear as the fields scale (e.g., passive video models achieving intervention-grade causal identification).	Done (benchmark v1.3): the optimizer's-curse wedge grows monotonically with class size while the candidate-average gap stays ≈0; exploitation is selection over guesses, not navigation toward them. Next: the closed-loop version.
Cooperative intelligence at the separatrix	`theory/symbiotic/cooperative-intelligence-at-the-separatrix.md`	Hutchins (1995) Cognition in the Wild; Ostrom (1990) Governing the Commons; Woolley et al. (2010) collective-intelligence factor; Dellermann et al. (2019) hybrid intelligence	Adjacent support: cognition can be distributed across people and artifacts; institutions can sustain cooperation; group performance and human–machine complementarity can be studied empirically. Challenge: none of these results establishes that heterogeneous cooperation improves transitions across a dynamical separatrix, and the participant categories are not equivalent kinds of agents.	Distributes the repo's trace→generator→construction→world-coupling→intervention→revision loop across heterogeneous contributors and connects it to the Transition Problem while keeping authority, veto, and responsibility located.	Equivalent verified performance from the strongest isolated participant; human rubber-stamping; cultural knowledge reduced to decoration; or coordination gains disappearing when review, verification, substrate, and displaced-harm costs are counted.	Compare isolated human, isolated model, unstructured human–AI pairing, and a structured shared-workspace condition on bounded construction tasks; pre-register complementarity, revision, authority, and viability measures.
Practice–culture feedback	`theory/emergence/from-action-to-culture.md`	Gollwitzer (1999); Sheeran & Webb (2016); Wood & Rünger (2016); Reckwitz (2002); Shove, Pantzar & Watson (2012); Feldman & Pentland (2003); Giddens (1984); Swidler (1986); Bell (1992/2009)	Strong adjacent foundation: intention–behavior gaps, cue-linked action, habits, socially organized practices, recursive structure, cultural repertoires, routine performance, and ritualization are established research programmes. Challenge: they use different units and mechanisms; individual habit cannot simply be scaled into organizational or cultural reproduction, and ritual is not a synonym for repetition.	Treats the passage from represented knowledge to durable collective action as a generator-and-return-path problem: traces require a runtime; recurrent practices are bundles of enactment, competence, materials, norms, transmission, feedback, and history; active culture alters the next action field. This is an integration, not a novelty claim about the component mechanisms.	Represented knowledge predicting persistence as well as the full bundle; no measurable distinction between a preserved trace and an active practice; failure to generalize under actor turnover; or recurrence being explained entirely by coercion or infrastructure ignored by the account.	On one bounded practice, compare information alone, a cue-linked plan, a workflow scaffold, and recurrent enactment with feedback; track event-level performance, variation, newcomer transmission, independently measured effects, and persistence under actor/tool turnover.
The city as deployed intelligence	`theory/human-organism-silicon-age/the-city-as-deployed-intelligence.md`	Jacobs (1961); Alexander (1965) A City is Not a Tree; West (2017) Scale; Batty (2013)	Strong support: urban scaling laws are quantitative and replicated; Jacobs/Moses is a documented natural experiment in single-axis optimization; Alexander's semilattice/tree distinction is a real topological claim. Challenge: the organism framing must stay bounded by the scaling data; 'city as intelligence' is a reading, not an established result.	The corridor's field site: superlinear-output-on-sublinear-substrate as measured capability loading; vital floors as breakers (Log 005); the tractable Class B measurement program (N=thousands of cities vs N=1 civilization).	Urban data failing to show corridor structure (e.g., no relation between constraint architecture and viability outcomes across cities).	Pre-registered as Log 018: constructs, proxies (Wharton index, Social Capital Atlas, infrastructure margins), three falsifiable hypotheses, analysis plan — frozen before data contact. Execution open.
Measurement as weak intervention	`theory/core/measurement-as-weak-intervention.md`	Goodhart (1975) / Strathern (1997); Campbell (1979); Lucas (1976); MacKenzie (2006) An Engine, Not a Camera; Pearl (2009) Causality; Landsberger (1958) on the Hawthorne record	Strong support: reflexive-metric failure is established independently in economics, sociology, and science studies; Pearl's seeing/doing distinction formally grounds the watching/perturbing split. Challenge: the Hawthorne record is itself contested; the four-regime typology and the footprint×identification plane are the repo's own arrangement.	Wires the reflexivity literature into the equivalence-class formalism: coupling ≠ identification (measured for toy generators); metrics as inputs to the generator (a non-stationary identification target); the negative-yield limit (surveillance destroying the coherence axis it reads).	A demonstrated regime where high-stakes public metrics do not induce adaptation; or passive traces reliably collapsing generator classes in reflexive systems.	Log 018 H3 is the pre-registered field instance (instrumentation predicts recovery only conditional on coherence); toy option: a benchmark variant whose cells adapt to being read.
Construction vs. Deduction	`theory/computation/construction-vs-deduction.md`	Brouwer (1908 ff.); Hilbert program / Grundlagenstreit; Bishop (1967) Foundations of Constructive Analysis; Howard (1980) Formulae-as-Types; Martin-Löf (1984); Erdős (probabilistic method)	Strong formal anchor: intuitionism, constructive analysis, and Curry–Howard are settled mathematics; "constructive proof = program" is a theorem-grade correspondence, not an analogy. Challenge: the repo's mapping of this divide onto trace/generator and prediction/performance is structural rhyme, not formal reduction — and must not be presented as one.	Aligns the proof-theoretic divide with the project's asymmetry (verification/search, trace/generator, prediction/performance) as four instances of one shape; "nature as non-constructive prover" framing.	Demonstration that the four faces diverge in a load-bearing case — e.g. a regime where deduction-side abundance does not produce construction-side scarcity.	Benchmark v1 family-search testbed: measure whether AI systems are stronger at the deduction-shaped game (interpolation over certificates) or at open-space construction.
Weakness principle & the self hierarchy (Stack Theory)	`lab/benchmarks/inverse-reconstruction/README.md` (Exp-1 counterpart); also `theory/core/measurement-as-weak-intervention.md`, `theory/computation/construction-vs-deduction.md`, `lab/metrics/identity_persistence.py`	Bennett (2023c) The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest; Bennett (2026) No Selves, No Consciousness (AAAI SSS); Meulemans et al. (2025) EUPI; Schelling (1960); Grice (1957)	Strong independent convergence: Bennett reaches structurally parallel results by the repo's own method (necessity proofs plus randomised Monte Carlo). Thm 2 (an internal do-vs-see tag is unavoidable for causal learning) is the measurement note's core distinction proved as a necessity condition; Exp 2's info-seeking probing is divergence-query logic collapsing a decoder equivalence class; Thm 5 (a binding move restricts the future self) is Identity Persistence / Log-017 commit authority. Caveat: weakness-optimality under a uniform extension prior is partly definitional; the load-bearing claim is representation-invariance: the extension count `w(ℓ) = |E_ℓ|` is encoding-free where description length (MDL/AIXI) is not. Bennett explicitly disclaims sufficiency for phenomenology (matching the repo's `[SPECULATIVE]` line on experience).	The cost side of Bennett's prescription: `family_search` measured that simplicity selection is world-dependent (chance on uniform worlds), and v1.3 measured the optimizer's-curse wedge from committing to one class member; both are the penalty for not holding the weakest hypothesis (the uncollapsed class). His proof and our measurements are two faces of one result; three-way convergence with EUPI at the level of formal frameworks.	A weakness selector failing to beat Occam on the CA testbed's held-out neighbourhoods, or collapsing to the same choice; either would make the convergence superficial.	Done (benchmark v1.4, `weakness_selector.py`): all three pre-stated predictions confirmed. Commitment efficiency: +2.7→+1.0 (simple world), 0.00±0.03 (uniform; analytic prediction hit to two decimals), < −7 (complex, Laplace-floored); on complex worlds the elegant guess is worse than a coin at every coverage and lies outside the world's support 100% of the time at k ≤ 5. Also done (benchmark v1.5, `wmax_planner.py`): marking the guesses eliminates the v1.3 wedge (wmean wedge ≈ 0 at every class size, vs 0.085 committed at u=5) and cuts real-reward regret 35–60%; the exact pessimist (wmin) is never disappointed but pays more real reward than the committed baseline from u ≥ 3, so worst-case discipline is a safety instrument, not free. Both follow-ups also done: v1.6 closed loop (acting is measuring, and dense regimes collapse the class in one round; the risky prediction that the argmax explores its own delusions faster than a random policy was falsified, so optimization is not curiosity and the selection-not-navigation null extends to the closed loop) and v1.7 ensemble sweep (52% of the wedge gone by K=4, 87% by K=16; the regret floor is genuine ignorance, so ensembles cure delusion, not ignorance). Remaining open: learned world models with correlated ensembles (real-model question; API budget).
Viable Corridor / capability loading	`papers/viable-corridor.md`	Aubin (1991) Viability Theory; Rockström et al. (2009) / Richardson et al. (2023); Bostrom (2014); Omohundro (2008) Basic AI Drives	Support: viability theory supplies the formal frame (open-set invariance); the instrumental-convergence literature matches the capability-loading mechanism. Challenge: no external work yet validates the specific three-constraint conjunction or the capability-loading result outside this repo's two models.	Capability as a shared driver loading several constraints at once, demonstrated in two structurally independent in-repo models; single-axis insufficiency.	External replication failing; or real agent ecologies in which single-axis interventions suffice at high capability.	P7/P8 on real LLM agent populations (the companion-paper programme).
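
The Generative Form Systems row proposes comparing IFS box dimension with other generative regimes. As a minimal sketch of that first measurement, assuming the right-angled Sierpinski gasket as the test attractor and dyadic box sizes (both chosen here for illustration; `chaos_game` and `box_counting_dimension` are hypothetical helpers, not repository code), the chaos game plus a least-squares fit of log N(ε) against log(1/ε) recovers the known dimension log 3 / log 2 ≈ 1.585:

```python
import math
import random

# Hypothetical test case: the right-angled Sierpinski gasket, generated by the
# three-map IFS p -> (p + v) / 2 with v in {(0,0), (1,0), (0,1)}.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def chaos_game(n_points, burn_in=20, seed=0):
    """Sample points near the IFS attractor by applying randomly chosen maps."""
    rng = random.Random(seed)
    x, y = 0.25, 0.25  # start strictly inside the triangle
    points = []
    for i in range(n_points + burn_in):
        vx, vy = rng.choice(VERTICES)
        x, y = (x + vx) / 2, (y + vy) / 2
        if i >= burn_in:  # discard the transient before collecting
            points.append((x, y))
    return points


def box_counting_dimension(points, levels):
    """Least-squares slope of log N(eps) against log(1/eps), with eps = 2**-k."""
    xs, ys = [], []
    for k in levels:
        scale = 2 ** k
        boxes = {(math.floor(x * scale), math.floor(y * scale)) for x, y in points}
        xs.append(k * math.log(2))       # log(1/eps)
        ys.append(math.log(len(boxes)))  # log N(eps): number of occupied boxes
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


pts = chaos_game(50_000)
d = box_counting_dimension(pts, levels=range(1, 7))
print(f"estimated D = {d:.3f}, exact log3/log2 = {math.log(3) / math.log(2):.3f}")
```

Because each occupied dyadic box at level k contains exactly one level-k sub-triangle of this gasket, the fitted slope equals log 3 / log 2 once every box is visited. On attractors without that grid alignment the estimate depends on the range of scales fitted, which a cross-regime comparison would need to report alongside the number.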

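The World models & VLA and Stack Theory rows report an optimizer's-curse wedge that grows with class size while the candidate-average gap stays near zero. The following is a generic Monte Carlo illustration of that statistical effect, not a reproduction of benchmark v1.3: it assumes a plan whose true value is 0 and a class of candidate world models whose value estimates carry independent Gaussian error (`wedge_and_average_gap` is a hypothetical helper).

```python
import random
import statistics


def wedge_and_average_gap(class_size, trials=20_000, noise=1.0, seed=0):
    """Monte Carlo estimate of two biases for a class of candidate world models.

    Hypothetical setup: the plan's true value is 0 and each of `class_size`
    candidates reports it with independent Gaussian error of scale `noise`.
    Committing to the most optimistic candidate yields the optimizer's-curse
    wedge; averaging over all candidates yields the candidate-average gap.
    """
    rng = random.Random(seed)
    committed, averaged = [], []
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise) for _ in range(class_size)]
        committed.append(max(estimates))              # commit to the best-looking member
        averaged.append(statistics.fmean(estimates))  # hold the whole class
    # The true value is 0, so the mean reported value is the bias itself.
    return statistics.fmean(committed), statistics.fmean(averaged)


for u in (1, 2, 4, 8, 16):
    wedge, gap = wedge_and_average_gap(u)
    print(f"u={u:2d}  committed wedge={wedge:+.3f}  candidate-average gap={gap:+.3f}")
```

Committing to the best-looking candidate reports the expected maximum of the errors, which rises with class size (about 0.56 for two standard-normal candidates), while the class average stays unbiased. Averaging removes this selection bias but not the estimation noise itself, which in this toy setting mirrors the row's distinction between curing delusion and curing ignorance.
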
3) Initial external anchors

Foundation anchors (added with the reconstruction):
- Eilenberg & Mac Lane, General Theory of Natural Equivalences (1945)
- Shannon, A Mathematical Theory of Communication (1948)
- Fritz, A Synthetic Approach to Markov Kernels, Conditional Independence and Theorems on Sufficient Statistics (2020)
- Crutchfield & Young, Inferring Statistical Complexity (1989); Shalizi & Crutchfield, Computational Mechanics (2001)
- Valiant, A Theory of the Learnable (1984); Wolpert & Macready, No Free Lunch Theorems for Optimization (1997)
- Pearl, Causal Diagrams for Empirical Research (1995)
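
The Foundations Reconstruction row cites an elementary hidden-extension non-identifiability proposition, and the computational-mechanics anchors above compare processes by what they predict. The repository's proof is not reproduced here. As a minimal illustration of the general phenomenon, assuming discrete hidden Markov models with made-up parameters (chosen only for this sketch; `sequence_probability` and `max_discrepancy` are hypothetical helpers), the forward algorithm shows a one-state model and a two-state model with sticky hidden regimes assigning the same probability to every observable sequence:

```python
from itertools import product


def sequence_probability(init, trans, emit, seq):
    """Forward algorithm: probability of an observed symbol sequence under a discrete HMM."""
    n = len(init)
    alpha = [init[s] * emit[s][seq[0]] for s in range(n)]
    for symbol in seq[1:]:
        # Propagate through the hidden transition, then weight by the emission.
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    return sum(alpha)


def max_discrepancy(model_a, model_b, length, alphabet=(0, 1)):
    """Largest probability difference over all observable sequences of a given length."""
    return max(
        abs(sequence_probability(*model_a, seq) - sequence_probability(*model_b, seq))
        for seq in product(alphabet, repeat=length)
    )


# Model A: one hidden state emitting symbol 0 with probability 0.3.
model_a = ([1.0], [[1.0]], [[0.3, 0.7]])
# Model B: a hidden two-regime extension with sticky transitions; both regimes emit alike.
model_b = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.3, 0.7], [0.3, 0.7]])
# Model C: the same hidden structure as B, but the regimes emit differently.
model_c = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.3, 0.7], [0.6, 0.4]])

print(max_discrepancy(model_a, model_b, length=6))  # on the order of floating-point error
print(max_discrepancy(model_a, model_c, length=6))  # clearly nonzero
```

Model B's transition matrix, and even the existence of its second state, leaves no observable trace, so an identification claim has to state which equivalence relation it recovers. Model C differs only in emissions, and that difference is already visible in the single-symbol distribution.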

Core anchors used above:
- Shanahan, McDonell & Reynolds, Role Play with Large Language Models (2023)
- Chalmers, Could a Large Language Model be Conscious? (2023)
- Park et al., Generative Agents: Interactive Simulacra of Human Behavior (2023)
- Packer et al., MemGPT: Towards LLMs as Operating Systems (2023)
- Mazeika et al., Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs (2025)
- Butlin & Lappas, Principles for Responsible AI Consciousness Research (2025)
- Carichon et al., The Coming Crisis of Multi-Agent Misalignment (2025)
- Beckenbauer et al., Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks (2025)
- Wei et al., Evo-Memory (2025)
- Zhang et al., Agentic Context Engineering (2025)
- Kim et al., Towards a Science of Scaling Agent Systems (2025)
- Wagner et al., Humans in the Loop (2025)

Inverse-direction anchors (added with the trace→generator rows):

- Ljung, System Identification: Theory for the User (2nd ed., 1999)
- Brunton, Proctor & Kutz, Discovering Governing Equations from Data: Sparse Identification of Nonlinear Dynamical Systems (SINDy) (PNAS, 2016)
- Schmidt & Lipson, Distilling Free-Form Natural Laws from Experimental Data (Science, 2009)
- Cranmer, Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl (2023)
- Solomonoff, A Formal Theory of Inductive Inference (1964); Levin, Universal Sequential Search Problems (1973)
- Power et al., Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (2022)
- Nanda et al., Progress Measures for Grokking via Mechanistic Interpretability (ICLR 2023)
- Liu, Michaud & Tegmark, Omnigrok: Grokking Beyond Algorithmic Data (ICLR 2023)
- Lake, Salakhutdinov & Tenenbaum, Human-Level Concept Learning through Probabilistic Program Induction (Science, 2015)
- Ellis et al., DreamCoder: Bootstrapping Inductive Program Synthesis (PLDI 2021)
- Chollet, On the Measure of Intelligence (2019)
- Aubin, Viability Theory (1991); Bostrom, Superintelligence: Paths, Dangers, Strategies (2014); Omohundro, The Basic AI Drives (2008)
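
The inverse direction these anchors share is recovering a generator from a trace. It can be illustrated with a minimal SINDy-style sketch in the spirit of Brunton, Proctor & Kutz (2016). The sketch uses sequentially thresholded least squares: fit all candidate terms, zero out the small ones, and refit on what remains.

It is written in plain Python so it has no dependencies. The following are illustrative assumptions, and this is not the repository's own procedure:

- the candidate library `{1, x, x^2}`;
- the threshold of 0.05;
- a noise-free logistic-map trace;
- the helper names (`solve`, `fit_active`, `library`, `stlsq`, `logistic_trace`), which are hypothetical.

```python
def solve(a, b):
    """Solve a small dense system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_active(rows, y, active):
    """Least-squares fit (normal equations) restricted to the active library terms."""
    idx = [j for j, on in enumerate(active) if on]
    ata = [[sum(r[i] * r[j] for r in rows) for j in idx] for i in idx]
    aty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in idx]
    coef = [0.0] * len(active)  # inactive terms stay exactly zero
    for j, value in zip(idx, solve(ata, aty)):
        coef[j] = value
    return coef


def library(x):
    """Hypothetical candidate terms: constant, linear, quadratic."""
    return [1.0, x, x * x]


def stlsq(trace, threshold=0.05, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    rows = [library(x) for x in trace[:-1]]  # features of x_t
    y = trace[1:]                            # targets x_{t+1}
    active = [True] * len(rows[0])
    coef = fit_active(rows, y, active)
    for _ in range(iterations):
        active = [abs(c) >= threshold for c in coef]
        coef = fit_active(rows, y, active)
    return coef


def logistic_trace(r=3.7, x0=0.2, steps=50):
    """Noise-free trace of the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


coef = stlsq(logistic_trace())
print(coef)
```

With a noise-free trace, the constant term should be thresholded away and the linear and quadratic coefficients should land close to r and -r. Real, noisy traces need more care (library choice, regularisation, noise handling), as discussed in the system-identification and symbolic-regression anchors.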

Additional adjacent references to consider in future updates:

- Active inference and free-energy principle literature (for formal grounding of orchestration claims).
- Safety cases from high-reliability engineering (for veto/control-plane fault tolerance).
- Causal discovery (constraint-based and score-based structure learning), adjacent to trace→generator work for graphical generators.

4) Cross-links added

Short “Related work” pointers were added to selected canonical files, leaving the main theory text intact.

A second round (2026-06) anchored the inverse-direction files in place:

- `theory/core/the-generator-question.md`: the "external machinery" note now names the fields that work the inverse problem (system identification, SINDy/symbolic regression, program induction, mechanistic interpretability) and states what the project adds beyond them.
- `theory/emergence/grokking-phase-transition.md`: Power / Nanda / Omnigrok block; the essay's "generator reading" is positioned as a framing on top of that literature, not a competing account.
- `theory/computation/p-vs-np-as-generator-search.md`: Levin search, program induction (BPL, DreamCoder), symbolic regression, Chollet.
- `theory/reference/glossary.md`: header pointer to the Internal Language Anchors table and this map ("no term floats free").

5) Claim-status legend

Use this legend when revising repository claims:

- Established result: replicated external empirical or theoretical support.
- Adjacent research: neighboring evidence, not direct confirmation.
- Repo hypothesis: internal claim with partial or no external validation.
- Speculative analogy: useful framing metaphor without direct measurement.
- Open problem: unresolved, requires targeted experiments.
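
One way to apply the claim-status legend consistently is to attach a status to each claim in machine-readable form. The sketch below is hypothetical: `ClaimStatus`, `TaggedClaim` and `audit` are not part of the repository. It encodes only one check that follows directly from the legend: a claim tagged as an established result should cite at least one external anchor. Citing an anchor is necessary but not sufficient, since whether the support is replicated still has to be judged by a reader.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimStatus(Enum):
    """The five categories of the claim-status legend."""
    ESTABLISHED_RESULT = "Established result"
    ADJACENT_RESEARCH = "Adjacent research"
    REPO_HYPOTHESIS = "Repo hypothesis"
    SPECULATIVE_ANALOGY = "Speculative analogy"
    OPEN_PROBLEM = "Open problem"


@dataclass(frozen=True)
class TaggedClaim:
    text: str
    status: ClaimStatus
    anchors: tuple[str, ...] = ()  # external references offered as support


def audit(claims: list[TaggedClaim]) -> list[str]:
    """Flag tags the legend does not obviously support (a hypothetical minimal check)."""
    warnings = []
    for claim in claims:
        if claim.status is ClaimStatus.ESTABLISHED_RESULT and not claim.anchors:
            warnings.append(f"{claim.text}: tagged as established but cites no external anchor")
    return warnings


claims = [
    TaggedClaim("example claim 1", ClaimStatus.ESTABLISHED_RESULT),
    TaggedClaim("example claim 2", ClaimStatus.REPO_HYPOTHESIS),
    TaggedClaim("example claim 3", ClaimStatus.ESTABLISHED_RESULT, ("placeholder anchor",)),
]
for warning in audit(claims):
    print(warning)  # only "example claim 1" is flagged
```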

6) Final report

Files changed¶

meta/research-alignment/related-work-map.md
theory/emergence/fractal-architecture-of-emergence.md (cross-link)
theory/veto/ai-alignment-biological-veto.md (cross-link)
lab/agents/three_layer_agent.py (cross-link comment)
papers/quantifying-emergent-utility-in-llms.md (cross-link)

Strongest external support¶

Utility Engineering / TEO and Three-Layer Memory have the clearest direct alignment with current literature on emergent utility control and memory architectures.

Strongest external challenge¶

The Mirror Problem and consciousness-adjacent claims face the strongest challenge: fluent introspective reports do not establish consciousness or robust self-modeling.

Claims that should be softened¶

The universal framing of fractality, the assumption that latency is beneficial across the board, and confidence in veto infallibility should each be narrowed to context-dependent hypotheses.

Claims that now look more promising¶

Memory-tiered agents with explicit utility/control instrumentation and human oversight pathways appear empirically tractable and high-value for near-term testing.