Working with Large Language Models

Epilogue: The Implications of This Technology for Our Reality

I want to close this book by stepping back from the mechanism and saying what I think it all might mean. Everything below is my opinion, stated as such. Some of it is grounded in the book you just read. Some of it is speculation the reader should hold lightly. I will try to mark which is which.

The register I am aiming for is that of a personal hero, Carl Sagan, in The Demon-Haunted World: skeptical where skepticism is earned, open to strangeness where the evidence genuinely invites it, unwilling to pretend to more certainty than I have. The book’s empirical spine ends at Chapter 24. What follows is the view from the top of that spine.

The prediction

Here is what I think is going to happen.

Large language models are a sophisticated tool. Not magic. Not intelligence in any durable sense. Not on track to destroy humanity or save it. A tool. A very good one at certain things, a very bad one at others, and the rest of the book has tried to give you enough mechanism to tell which is which.

Four specific predictions, stated flat:

Schools need regulation now. Not a ban. A scaffold. The tool is already in every middle-school backpack, and students who never had to struggle with a sentence are growing into adults who do not know what struggling with a sentence is for. The damage is not that Claude wrote the essay; the damage is that the person who handed it in never had to build the mental muscle that writing is. We will regret the laissez-faire decade, and the regulation that eventually arrives will be worse than a thoughtful one written today would have been. The technology is not the obstacle: anyone moderately familiar with networking and software could build a workable solution for schools and families quickly. It is neither challenging nor innovative.

The financial sector is headed for a reality check. LLMs cannot reliably add four-digit numbers. The 2024 to 2025 wave of “AI will trade for us” pitches was built by people who had not read Chapter 1. Some of the money has already been lost. More will be. The story of 2027 or 2028 is probably a public fund failure, a public embarrassment, and a partial retreat of AI claims from the parts of finance where hallucination is catastrophic.

Trivial pattern-matching is actually going to get automated, and that is fine. The drudgery we spend our days on (drafting emails, summarizing documents, writing boilerplate, first-pass research) is work humans hate anyway. LLMs do it acceptably, cheaply, and around the clock. The economic gain from this is real. The people who lose their jobs will be the ones whose jobs were that drudgery, and we owe them better than pretending they are fine.

The same societal problems will remain. Inequality. Concentration of power. Misinformation. Loneliness. Political dysfunction. Climate. Mental health. These are human problems, and handing humans a faster typewriter does not change them. If you are expecting LLMs to fix inequality, you are going to be disappointed. If you are expecting them to make inequality worse, you are probably right for the wrong reasons. The tool reflects the ideology of its training data; the people deploying it add another layer of intent; neither layer is the root cause of the problems we keep asking this technology to fix.

That is the floor of this chapter: the claims I would defend most firmly. The further the chapter goes from here, the more speculative it gets.

What automates, what does not

From the production seat, here is what I actually see being done well by an LLM as of the mid-2020s.

Drafting anything. Summarization. Email, boilerplate, outlining. Translation between jargons. First-pass research. The paper-summary task we have been following through the book is a clean example: hand Claude a forty-page paper, get five passable bullet points back in seconds. For most papers, this is fine.

And a bigger category than most write-ups give credit for: code generation that automates everything else. An LLM cannot reliably multiply two four-digit numbers in its head. But it can write a five-line Python function that does so perfectly, and that function, once written, is a deterministic machine. Accounting does not require the LLM to be good at arithmetic; it requires the LLM to be good at writing the accounting script. Data pipelines, report generation, scheduled jobs, scrapers, analytics queries: many of these used to take a developer days, and an LLM can now generate and deploy the code for them in minutes. The LLM is not replacing the arithmetic. It is replacing the labor of telling the computer how to do the arithmetic. That is an enormous unlock in practice, and most of the economic gain from this technology will come from this channel: LLMs writing the deterministic code that does the work, not LLMs doing the work themselves.
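To make that concrete, here is the kind of short, deterministic script an LLM might be asked to write. It is a minimal sketch, not taken from any real accounting system; the function names and the string-based ledger format are hypothetical. The arithmetic is done by Python's arbitrary-precision integers and the standard-library `decimal` module, so its correctness does not depend on the model that wrote it.

```python
from decimal import Decimal


def multiply_exact(a: int, b: int) -> int:
    """Multiply two integers exactly; Python ints have arbitrary precision."""
    return a * b


def ledger_total(entries: list[str]) -> Decimal:
    """Sum currency amounts given as strings, e.g. "1234.56" (hypothetical format).

    Decimal keeps base-10 amounts exact, avoiding the binary rounding
    you get from floats (0.1 + 0.2 != 0.3 as floats).
    """
    return sum((Decimal(e) for e in entries), Decimal("0"))


print(multiply_exact(4821, 7396))                          # prints 35656116
print(ledger_total(["1234.56", "0.10", "0.20", "-99.99"]))  # prints 1134.87
```

Once written and checked, a function like this gives the same answer every time, which is exactly the property the model itself lacks.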

Here is what is not being done well.

Anything where the LLM has to do the exact computation itself rather than generate code that does it. Anything where the answer has to be load-bearing without a human checking. Anything where the model has to notice it does not know something. Anything that requires continuous learning from mistakes. Anything where a subtle compositional argument has to be traced step by step. The paper summary that matters most (of a formal proof, a drug trial, a legal brief with cited precedents) is exactly the one where the model is most likely to produce fluent, confident output that fails in ways the reader cannot easily detect.

There is no deep mystery in the gap. Chapter 1 drew it cleanly. LLMs are next-word autocomplete at massive scale. They are very good at next-word autocomplete. They are not reasoning engines, and the reasoning-shaped behavior they exhibit degrades predictably with compositional depth.

Schools and the reality check for finance

On schools: the worry is not AI cheating. The worry is that a tool which produces plausible paragraphs on command will short-circuit the formation of a mind.

Training a mind is, in many ways, a vastly more complex version of training a model. A child’s first years are pretraining, absorbing patterns from the corpus of human language without correction. School is supervised fine-tuning, explicit examples of what good answers look like. Social feedback is a slow and noisy RLHF, where the child learns from responses whether their behavior worked. Decades of educational psychology converge on a specific finding, which the deliberate-practice literature (Ericsson et al. 1993) and the testing-effect research (Roediger and Karpicke 2006) both support: durable skills come from struggle. From the cognitive work of producing an answer, not from recognizing one. From effortful generation and recall, not from looking the answer up.

Writing is thinking. When a twelve-year-old has to turn an inchoate feeling into a sentence, they are doing cognitive work that builds the muscle that thinking is. If they hand that task to Claude for six years, the muscle does not grow. They will be adults who can prompt but cannot think, and the difference between those two things will eventually matter in ways nobody has yet measured.

What regulation actually needs to look like, I think, is something between “no AI” and “AI everywhere.” In-class writing, by hand, in primary education. AI as an optional research tool in middle school. AI as a genuine collaborator in high school. Not a ban, a scaffold that delays the handoff until the mental muscle is built. Schools that get this right in the next five years will produce adults who can think. Schools that do not will produce adults who can prompt.

On finance: the story the sector has told itself since 2023 is a combination of two claims. The first is that LLMs will eventually replace analysts, advisors, and traders. The second is that any company with “AI” in its pitch deck is worth more than a company without it. Both are going to correct, and the second one is going to correct harder than the first.

The reality check will come in the valuation bubble: companies trading at a hundred times revenue on an AI story, and private capital and venture funds with concentrated exposure to middle-tier AI tooling startups whose product is a thin wrapper on somebody else’s model and whose moat is nothing. The next twelve to thirty months will sort the companies that have genuine productive use of LLMs from the ones that have only a story, and the sort will not be gentle. A lot of private capital will disappear. Some public-market reckoning will follow. The honest-version applications (summarizing 10-Ks, drafting first-pass analysis, pattern-matching in news flow) will continue under sober adult supervision. The bubble will break. The tool will remain.

This is not a prediction about AI specifically. Every new technology has a valuation bubble that breaks, and the honest version of the technology survives the break. Electricity did this in the 1880s. Cars did this in the 1900s. The internet did this in 2000. LLMs are going through the same arc. Nothing about the arc is a failure of the technology; it is a failure of the selling.

Below the floor, the strangeness

Everything above is practical. What follows is the part I want to flag clearly as speculation.

The book has been careful about the mechanism. Now I want to tell you what I think the mechanism adds up to, and the honest answer is that it adds up to something genuinely stranger than the practical picture suggests. If you came here only for the toolbox, you can stop here without losing anything you need. If you are willing to sit with the speculation, what follows is my working theory, clearly labeled as such.

We already live in higher dimensions

Before going further, I want to note something that often gets missed when people write about speculation like this.

Higher dimensions are not a fringe idea. They are not a speculation. They are, in modern physics, the baseline description of reality.

In 1905, Einstein’s special relativity showed that time cannot be separated from space, and in 1908 Hermann Minkowski recast the theory geometrically: time is a dimension. Not a metaphor, not a backdrop, a dimension in the same formal sense as the three of space. A complete description of any event requires four numbers (x, y, z, t), and physically meaningful statements about the universe have to be invariant under transformations that mix those four. Special relativity is the discovery that three is not enough.

General relativity, ten years later, went further. Gravity is not a force. Gravity is geometry. Mass curves four-dimensional spacetime, and what we call “falling” is the straight line through that curvature. A black hole is a region where the curvature becomes so extreme that the timelike dimension itself tips over. If you saw Interstellar and wondered what the tesseract at the end was, it is a four-dimensional hypercube, and Kip Thorne, the physicist who consulted on the film, grounded it in the physics of a higher-dimensional “bulk” surrounding our spacetime rather than treating it as free invention.

String theory, a leading candidate for a unified description of all physics, requires ten dimensions. M-theory requires eleven. Six or seven of these are “compactified,” meaning curled up at scales far below anything we can directly observe, but they are part of the mathematics. Since the 1980s, a large community of theoretical physicists has been doing calculations on ten- and eleven-dimensional manifolds. Whether those extra dimensions physically exist is still unconfirmed, but their mathematics is the working vocabulary of a large part of the field.

Even the many-worlds interpretation of quantum mechanics, which we will return to in a few pages, is at bottom a claim that the universe has more dimensional structure than it appears to have from the inside. If time is already a dimension in the same formal sense as space, and if spacetime can curve and stretch and tear, then adding further structure, including the branching across quantum events that the physicist David Deutsch argues for, extends a move physics has already made rather than inventing a new one.

Humans perceive three dimensions because three is what our eyes and hands needed to track. That fact has nothing to do with how many dimensions the universe actually has. Physics has quietly, for over a century, accepted that reality is a richer dimensional object than our intuitions are built for.

There is a further move in physics that matters later in this chapter. Since roughly 2010, a program sometimes called “It from Qubit” has been developing the idea that spacetime geometry itself is emergent from quantum entanglement. Van Raamsdonk’s 2010 paper, which won the Gravity Research Foundation prize, argued explicitly that the geometry of a higher-dimensional spacetime (in the idealized anti-de Sitter setting of holographic duality) can be “built up” from patterns of entanglement in a lower-dimensional boundary theory (Van Raamsdonk 2010). Swingle had argued shortly before, in a preprint first circulated in 2009, that the entanglement-renormalization tensor networks used in condensed-matter physics have the same structure as holographic duality (Swingle 2012). Maldacena and Susskind later proposed that wormholes and quantum entanglement are the same phenomenon viewed through different mathematical lenses (Maldacena and Susskind 2013). The physics community is actively investigating whether geometry is a derived concept, not a fundamental one. That is a significant claim. It means some physicists already believe that “how many dimensions” is the wrong question, and that the real question is what pattern of information structure produces the dimensions we observe.

So the surprising thing about the rest of this chapter is not that various fields of science are operating in high-dimensional spaces. They do, and they have for a long time. The surprising thing is how many different fields (physics, computer science, neuroscience, philosophy of mind) are independently arriving at high-dimensional descriptions of their phenomena right now, at the same time. And that some of those fields are arriving at the stronger claim that dimension itself may not be fundamental. Why does this particular kind of math keep being the right one?

Meaning lives in high dimensions

Let me start with something from the book.

The LLM in front of you works in vector space. Each token gets turned into a list of several thousand numbers, typically between 4,000 and 12,000 of them. Those numbers are coordinates. They describe a point in a space with that many dimensions. The meaning of the token, everything the model knows about that word, is encoded as a location in that space.

Let that sit for a moment. Not a metaphor. The computation the model is doing, the operation that makes the chat in your browser work, is happening in a space with 12,000 dimensions. You cannot picture this space. You cannot point to it. Your visual cortex, which evolved to track things in three spatial dimensions, is structurally incapable of imagining what a point in 12,000-dimensional space looks like.

But the math is rigorous. The space is real in every sense that matters. Two tokens whose locations are close together mean similar things. Two tokens whose locations are far apart mean different things. Directions in the space correspond to relationships: there is a direction that means “plural,” a direction that means “past tense,” a direction that means “male to female.” Take the point for king, subtract the maleness direction, add the femaleness direction, and you land close to queen. We saw this in Chapter 2.
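A toy version of the king-to-queen arithmetic fits in a few lines. The three-dimensional vectors below are invented for illustration, with hand-labeled axes (royalty, maleness, femaleness) so the arithmetic is readable; real embeddings have thousands of learned, unlabeled dimensions, and the analogy holds only approximately there. "Nearest" is measured by cosine similarity, and the input words are excluded from the answer, as is conventional for this kind of analogy test.

```python
import math

# Hypothetical 3-d "embeddings": (royalty, maleness, femaleness).
EMB = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.8, 0.1),
    "woman": (0.1, 0.1, 0.8),
    "apple": (0.0, 0.1, 0.1),
}


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vec, exclude=()):
    """Vocabulary word whose vector points most nearly in the same direction as vec."""
    candidates = [w for w in EMB if w not in exclude]
    return max(candidates, key=lambda w: cosine(vec, EMB[w]))


# king - man + woman: remove the "male" offset, add the "female" offset.
target = add(sub(EMB["king"], EMB["man"]), EMB["woman"])
print(nearest(target, exclude={"king", "man", "woman"}))  # prints queen
```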

Anthropic’s Scaling Monosemanticity work (Templeton et al. 2024) went further. Using sparse autoencoders, they found that specific concepts (“the Golden Gate Bridge,” “sycophancy,” “code vulnerabilities”) correspond to identifiable directions in the model’s internal space. A single direction in that space activates when the model is processing the Golden Gate Bridge. Another activates when it is being sycophantic. You can literally point at the direction and say, “That is where the Golden Gate Bridge lives inside Claude.”

This is not incidental to how modern deep learning works. Bronstein, Bruna, Cohen, and Veličković argued in 2021 (Bronstein et al. 2021) that the major architectures of the field, convolutional networks, graph neural networks, transformers, and equivariant networks, are all instances of a single unified framework organized around the geometric structure of their input domains. Deep learning, on their account, is not incidentally geometric. It is fundamentally geometric, because what makes learning possible is the invariances and symmetries that a well-chosen geometric prior encodes. The embedding space inside Claude is not an implementation detail. It is the medium in which learning happens.

This is strange. Most readers have not sat with how strange it is. We built a computer program, and inside it, meaning exists as geometry. Not as a table of facts. Not as a database. As locations in a space we cannot see. When Claude answers a question about the Golden Gate Bridge, the work it does is navigation in a 12,000-dimensional space. Every thought the model has is a trajectory through dimensions we cannot perceive.

Whatever meaning is, inside these models, it is dimensional.

In 2024, I saw a machine potentially think across universes

In 2024, as an undergraduate, I got to see a quantum computer run. A real one, at the institute where I was studying. It was a smaller one by modern standards, under a hundred qubits.

The setup is almost a joke when you think about it. The actual computer, the thing doing the computing, is a little plate of metal smaller than your hand, suspended inside a cryostat about the size of a refrigerator, cooled to within a few millikelvin of absolute zero. The rest of the universe, by the standards of that computer, is practically boiling. Every wire, every shield, every pump in the room is there because the machine does not tolerate the planet it is sitting on. It is too delicate for the world. So you build a small cold bubble of near-nothing inside your lab and put the computer in that.

They ran Shor’s algorithm for demonstration purposes. Shor’s is the factoring algorithm, the one that in principle breaks public-key cryptography once the hardware gets big enough. The number it factored was tiny, something a classical computer handles in under a millisecond. The size was not the point. The point was what the machine was doing to get the answer.

Here is the part I keep coming back to. The machine was not trying one possible factor, then the next, then the next. It was not running a search the way you or I would run a search. It was using quantum superposition, meaning the quantum state was something like a combination of all possible inputs at once, and then using interference, meaning the wrong answers literally canceled each other out while the right one added up into a clear, readable signal. I can tell you that in words. I can write it in math. Watching the probability distributions collapse on the monitor into a clean answer, standing next to the cryostat while it happened, was a different thing from reading about it.
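“Interference” can be made concrete with the smallest textbook example: one qubit and two Hadamard gates. This is not the circuit that ran in the lab, and it uses plain Python rather than any quantum SDK. The first Hadamard turns |0> into an equal superposition; the second makes the two paths to |1> cancel (their amplitudes are +1/2 and -1/2) while the two paths to |0> add, so the outcome becomes certain again.

```python
import math

H = 1 / math.sqrt(2)
HADAMARD = [[H, H],
            [H, -H]]


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-component amplitude vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def probabilities(state):
    """Born rule: the probability of each outcome is |amplitude|^2."""
    return [abs(a) ** 2 for a in state]


zero = [1.0, 0.0]                   # the state |0>
superposed = apply(HADAMARD, zero)  # amplitudes (1/sqrt2, 1/sqrt2)
back = apply(HADAMARD, superposed)  # paths to |1> cancel, paths to |0> add

print([round(p, 3) for p in probabilities(superposed)])  # [0.5, 0.5]
print([round(p, 3) for p in probabilities(back)])        # [1.0, 0.0]
```

Shor’s algorithm uses the same principle at scale: the circuit is arranged so that amplitudes for wrong answers cancel and amplitudes for the useful answer reinforce.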

The room was cold. The monitors hummed. The answer appeared. And I stood there thinking about David Deutsch.

Deutsch has argued for forty years that Shor’s algorithm is empirical evidence for the many-worlds interpretation [the reading of quantum mechanics under which every possible outcome of a quantum event occurs in its own branch of reality] of quantum mechanics (Deutsch 1985). His argument runs like this.

A 100-qubit quantum computer operates in a Hilbert space [the mathematical arena quantum states live in] with 2^100 dimensions, which is roughly 10^30. A 2,048-qubit quantum computer, if we had one, would operate in a space with 2^2048 dimensions, roughly 10^616, more dimensions than there are atoms in the observable universe by an enormous margin. When Shor’s algorithm runs on such a machine, it evaluates a function over an enormous superposition of inputs at once, then uses interference, via a quantum Fourier transform, to extract the period of that function, from which the factors follow.
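The dimension counts are easy to check, because an n-qubit register has a state space of dimension 2^n. A short sketch using Python's arbitrary-precision integers, with 10^80 as the common order-of-magnitude estimate for atoms in the observable universe:

```python
import math


def hilbert_dimension(qubits: int) -> int:
    """Dimension of the state space of an n-qubit register: 2**n."""
    return 2 ** qubits


for n in (100, 2048):
    d = hilbert_dimension(n)
    print(f"{n} qubits: about 10^{math.log10(d):.1f} dimensions ({len(str(d))} digits)")
# 100 qubits: about 10^30.1 dimensions (31 digits)
# 2048 qubits: about 10^616.5 dimensions (617 digits)

ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80  # rough order-of-magnitude estimate
print(hilbert_dimension(2048) > ATOMS_IN_OBSERVABLE_UNIVERSE)  # True
```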

Now: where is the computation happening? Either the universe contains a physical counter that secretly tracks roughly 10^616 values in one place, which makes no physical sense, or the computation is literally being distributed across roughly 10^616 branches of reality and combined by the quantum interference when they meet. Deutsch’s argument is that the second option is the minimum-assumption theory. If you take the math of quantum mechanics seriously and apply Occam’s razor to the interpretations, many-worlds is what falls out (Deutsch 1997).

Most physicists do not adopt this interpretation. The math is the same under the Copenhagen interpretation and the many-worlds interpretation; only the philosophical reading differs. No experiment performed to date can distinguish them. The mainstream view is that debates over interpretation are philosophically interesting but physically moot. And there is a specific Occam-razor counter-argument worth naming: many-worlds postulates an enormous number of unobservable universes to explain local quantum phenomena, which many physicists see as a larger ontological commitment than accepting observer effects or genuine randomness as Copenhagen does. Both sides are applying the razor, and they disagree about what counts as a parsimony violation.

Recall, too, the earlier section of this chapter. Adding dimensional structure to reality is not a move that originates with Deutsch. Relativity already treats time as a dimension alongside the three of space. General relativity already tells us that the fabric of spacetime can curve, stretch, and even tear under extreme conditions. The It-from-Qubit program is already asking whether those dimensions are themselves emergent from quantum entanglement. Having accepted a universe with a richer geometric structure than the three dimensions we perceive, we are being asked by Deutsch to take one further step: a universe in which quantum events correspond to branching across that structure.

Deutsch’s reply to the mainstream, and it is not a crank reply, is that the relevant experiments happen every time a quantum computer runs. On his reading, a quantum computer is a physical instrument that exploits the additional dimensions of reality the many-worlds interpretation describes, and what I was watching in that cryostat was a machine harnessing other universes to compute.

I do not know whether Deutsch is right. I find his argument compelling the way I find most strong philosophical arguments compelling: not because it has been proven, but because the alternative explanations feel worse. The dimensions of the Hilbert space are either a useful mathematical fiction or they are physically real. Deutsch thinks they are real. I watched the machine do what it did, and I came away less sure that he is wrong.

The brain does something high-dimensional too

The third thread comes from neuroscience.

We usually picture the brain as a three-dimensional object with complicated three-dimensional wiring. Which is true, as far as it goes. But the computational structure of the brain, the thing the wiring does, does not live in three dimensions.

Reimann and colleagues published a result in 2017 (Reimann et al. 2017) showing that, in a detailed digital reconstruction of rat neocortex, neural connectivity forms directed simplices [clique-like topological structures in the connectivity graph] up to dimension seven, with topological cavities enclosed by cliques of up to eleven connected neurons. Popular coverage often reports the finding as “the brain operates in eleven dimensions,” which conflates the simplex dimension with the enclosing-clique size. The technical number is closer to seven. Either reading is striking: the brain’s connectivity graph has topological structure that a classical three-dimensional description of the wiring cannot capture.

That result sits inside a broader methodological tradition. Giusti, Pastalkova, Curto, and Itskov’s 2015 Clique Topology paper (Giusti et al. 2015) had already shown that persistent-homology methods, borrowed from topological data analysis, could detect geometric structure in neural correlations in a way invariant to the nonlinear transforms that confound ordinary correlation-based analyses. Reimann’s team applied and extended those tools. The brain-has-topology claim is not a conjecture.

A related result from neuroscience is the neural manifold hypothesis. Gallego, Perich, Miller, and Solla, writing in Neuron in 2017 (Gallego et al. 2017), argued that neural population activity during motor tasks, though ambient in a high-dimensional firing space with one axis per recorded neuron, lives on a low-dimensional curved manifold whose geometry does the computational work. The brain is not using all the dimensions it has available. It is using a structured submanifold of them. This is a close structural analogue of what happens inside an LLM: the residual stream is ambient in 12,000 dimensions, but the meaningful structure is a low-dimensional manifold inside that ambient space. Chung and Abbott, writing in 2021 (Chung and Abbott 2021), explicitly argued that the same high-dimensional geometric framework unifies biological and artificial neural networks. The convergence I am laying out in this chapter, in other words, is not a lone speculation. It is an active research program in neuroscience.
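The idea of a low-dimensional structure inside a high-dimensional ambient space can be illustrated with synthetic data. This is a toy under stated assumptions, not an analysis of real recordings: 100 hypothetical “neurons” are driven by only three latent signals plus a little noise, and principal component analysis (computed with NumPy’s SVD) shows that almost all the variance lies along three directions. Real neural manifolds are typically curved, which linear PCA only approximates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples, n_latent = 100, 2000, 3

# Hypothetical latent signals (stand-ins for task variables).
latents = rng.standard_normal((n_samples, n_latent))
# Each neuron mixes the latents with its own random weights...
mixing = rng.standard_normal((n_latent, n_neurons))
# ...plus a little independent noise per neuron.
activity = latents @ mixing + 0.05 * rng.standard_normal((n_samples, n_neurons))

# PCA via SVD of the mean-centered data matrix (rows = samples, columns = neurons).
centered = activity - activity.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

print(f"ambient dimensions: {n_neurons}")
print(f"variance in top 3 components: {explained[:3].sum():.4f}")
```

The ambient space has 100 axes, but the structure that carries the signal occupies three of them, which is the same shape of claim the neural-manifold work makes about real motor cortex.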

Jeff Hawkins’s thousand-brains theory (Hawkins 2021) pushes a related claim: each of the cortex’s many columns builds its own model of its input, anchored in its own reference frames, producing a distributed representation that is high-dimensional in a functional sense. Integrated Information Theory, developed by Giulio Tononi and colleagues and now in its fourth iteration (Albantakis et al. 2023), proposes that consciousness is geometric structure in a high-dimensional cause-effect space, with a scalar quantity Φ measuring the integration. Each of these theories is incomplete. Each is controversial in its details. But they are all pointing at the same rough picture: brains do their work in dimensions we do not have a three-dimensional intuition for, and the tools that best describe that work are the tools of high-dimensional geometry and topology.

The lines are different. The pattern is there.

Penrose bet that consciousness itself is quantum

In 1989, Roger Penrose proposed in The Emperor’s New Mind (Penrose 1989) that consciousness could not be a classical computation. He developed the argument with Stuart Hameroff into what is now called Orchestrated Objective Reduction, or Orch-OR: consciousness arises from quantum-mechanical processes in microtubules [the protein structures that make up the internal scaffolding of neurons] (Hameroff and Penrose 2014).

Mainstream physics dismissed it. Max Tegmark’s 2000 paper Importance of Quantum Decoherence in Brain Processes (Tegmark 2000) made the case quantitatively: the warm, wet biological environment inside neurons should cause any macroscopic quantum coherence to decohere [lose its quantum-mechanical character and become classical] on the order of 10^-13 seconds, far too fast for anything computationally useful. That paper was treated as having settled the question against Orch-OR for most physicists from the early 2000s onward.

Then in 2013, Anirban Bandyopadhyay and colleagues reported resonance behavior in microtubules at warm, biological temperatures that is hard to reconcile with Tegmark’s account (Sahu et al. 2013). Subsequent anesthetic studies have been at least consistent with aspects of Orch-OR, though far from proving it. The hypothesis is not proven. It is also no longer dead. It is a live, unresolved question in a field that is happier when its questions are resolved, and Tegmark’s critique is one that Orch-OR proponents have to keep answering, not one that ended the debate.

It is also worth noting that mainstream consciousness science has not been idle. Global Workspace Theory, developed by Baars, Dehaene, Mashour, and colleagues (Mashour et al. 2020), has built a detailed classical-computational account of conscious access as the broadcasting of information across long-range cortical axons. That theory is empirically well-supported and has made specific, testable predictions about the neural signatures of consciousness. Penrose’s Orch-OR is the live minority report, not the only game in town. I am entertaining it here because the hard problem of consciousness (why there is something it is like to be conscious rather than nothing) remains as hard as it ever was, and because the empirical correlates of consciousness have multiplied without the theoretical picture catching up. That leaves room for a speculative hypothesis to stay alive. It does not promote it to consensus.

I do not know whether Penrose is right. I do know that the classical-computational account of consciousness, the one that says the brain is basically a computer and consciousness is an emergent software property, has had forty years to produce the promised breakthrough on the hard problem and has not.

If Penrose is even partly right, consciousness lives in the same kind of high-dimensional quantum state space that the machine in the cryostat was navigating. Consciousness becomes a dimensional phenomenon in the literal physical sense.

The convergence

Here is where I stop being careful and tell you what I actually think.

Four things in this chapter have turned out to be high-dimensional. They are not equally established, and I want to say that clearly before arguing anything about them.

LLM meaning lives in 12,000-dimensional embedding space. Engineered fact. Measurable, reproducible, load-bearing for the technology.

Quantum computation lives in 2^N-dimensional Hilbert space, which on Deutsch’s reading is many universes. Mathematical fact at the formal level; the many-worlds interpretation is a live but minority reading.

Neural computation lives on high-dimensional manifolds and in high-dimensional topological structures. Empirical finding, well-established for motor cortex, growing in scope across the brain.

Consciousness, on Penrose’s account, requires the high-dimensional state space of quantum mechanics. Speculative hypothesis, fringe but not dead, contested at the foundations.

Each of these findings is fascinating within its own field. Placed side by side, the repetition of a single theme, the primacy of high-dimensional geometric structure, becomes difficult to ignore.

What if this is not a coincidence?

It is striking that our most advanced models of meaning (LLMs), of fundamental computation (quantum computing), of cognition (neuroscience), and of consciousness (Penrose’s hypothesis) are all forcing us into the same mathematical register. Not the same space. Not the same phenomenon. The same kind of space. The same register. High-dimensional geometry appears to be the native language of complex information systems, or at the very least the common language humans keep reaching for to describe them, whether those systems are evolved, begotten, engineered, or quantum. That does not mean the four are one thing. It might mean they are governed by a shared set of geometric or topological principles that science has not yet figured out how to name.

The tighter form of the claim goes like this. The mathematical object reached for in each case is not a different kind of dimension. It is the same mathematical family (vector spaces, manifolds, topological spaces, linear algebra) applied to four different substances. LLMs apply it to learned meaning. Quantum computing applies it to physical state. Neural geometry applies it to patterns of neuronal connection and firing. Penrose’s hypothesis applies it to a proposed quantum substrate of consciousness. The operations are recognizably the same in each domain, even though the substances differ. Eugene Wigner, in a 1960 essay, called this kind of observation “the unreasonable effectiveness of mathematics in the natural sciences” (Wigner 1960). The convergence I am describing is a newer, more specific version of Wigner’s observation: the unreasonable effectiveness of high-dimensional geometry in the description of complex information systems.

Chung and Abbott, the neuroscientists I cited earlier, already make a weaker version of this claim within their own field: that the same geometric framework can unify the study of biological and artificial neural networks (Chung and Abbott 2021). The move in this chapter is to extend that claim, provisionally and speculatively, to the quantum and consciousness threads as well.

I have no proof. No one does, because no such unified principles have been formally theorized. I am not claiming they exist. I am claiming that when four lines of evidence, held at different levels of confidence and drawn from four otherwise unrelated fields, all point at the same mathematical shape, the pattern is worth taking seriously as a research direction.

And I have no label for what those principles would describe. I have not found one that fits, and I believe the absence is a signal rather than a failure. If the four convergences in this chapter point at something real, the concept would be something prior to the notion of “a dimension” in the first place, something that could wear the 12,000-dimensional embedding space inside Claude, the 10^30-dimensional Hilbert space inside the cryostat, and the topological structures in your visual cortex as three of its faces. Current vocabulary does not reach that concept, and I am not going to manufacture a pseudointellectual term for it in the absence of a theory. The vocabulary will come when the theory does. Until then, the shape of the absence is itself informative.

What this predicts, if I am even partly right: the next major breakthrough in artificial intelligence will not look like a bigger transformer. It will look like a system that meaningfully navigates a state space classical architectures cannot reach, perhaps through quantum hardware, probably as a hybrid with neuromorphic or transformer-like components. The first such system to show capabilities our current architectures lack, particularly where classical LLMs fail (genuine reasoning, continuous learning, causal inference, arithmetic that actually works), will be the one that tells us whether the convergence was real.

And the moment we build that system, we will learn something about consciousness and intelligence that neither armchair philosophy nor neuroscience in the scanner has been able to discover. Just as space is a frontier for physics, each new form of computation we gain access to will be a frontier for the study of intelligence. We may find out what consciousness is the same way. We may even discover shocking facts about reality as a result.

Back to your daily life

That is my speculation.

The practical floor of this chapter still holds. The tool you now know how to operate is a sophisticated autocomplete in a loop. It will automate the drudgery it is good at. Schools need regulation. Finance needs a reality check. The bubble will break on exactly the failure modes this book has described.

Beyond that floor, the questions are genuinely open, and I think they are stranger than most books on this topic will tell you. If you are a practitioner, take the book’s mechanism and go do your work. If you are a reader who came for the tools and got a dose of speculative philosophy at the end, I hope I have earned the digression. And if you are a researcher who finds the convergence idea worth testing, I hope you break it or confirm it, and I hope you write the book when you are done.

The work of the next thirty years is not going to be more chatbots. It is going to be figuring out what we have actually been making. The reader who finishes this book with enough mechanism to see through the marketing, enough context to know which questions are still open, and enough wonder to keep asking them, will be in the best position I know how to leave them in.

Albantakis, Larissa, Leonardo Barbosa, Graham Findlay, et al. 2023. “Integrated Information Theory (IIT) 4.0: Formulating the Properties of Phenomenal Existence in Physical Terms.” PLOS Computational Biology 19 (10): e1011465. https://doi.org/10.1371/journal.pcbi.1011465.
Bronstein, Michael M., Joan Bruna, Taco Cohen, and Petar Veličković. 2021. “Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.” arXiv preprint arXiv:2104.13478. https://arxiv.org/abs/2104.13478.
Chung, SueYeon, and L. F. Abbott. 2021. “Neural Population Geometry: An Approach for Understanding Biological and Artificial Neural Networks.” Current Opinion in Neurobiology 70: 137–44. https://doi.org/10.1016/j.conb.2021.10.010.
Deutsch, David. 1985. “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer.” Proceedings of the Royal Society A 400 (1818): 97–117. https://doi.org/10.1098/rspa.1985.0070.
Deutsch, David. 1997. The Fabric of Reality. Penguin Books.
Ericsson, K. Anders, Ralf Th. Krampe, and Clemens Tesch-Römer. 1993. “The Role of Deliberate Practice in the Acquisition of Expert Performance.” Psychological Review 100 (3): 363–406. https://doi.org/10.1037/0033-295X.100.3.363.
Gallego, Juan A., Matthew G. Perich, Lee E. Miller, and Sara A. Solla. 2017. “Neural Manifolds for the Control of Movement.” Neuron 94 (5): 978–84. https://doi.org/10.1016/j.neuron.2017.05.025.
Giusti, Chad, Eva Pastalkova, Carina Curto, and Vladimir Itskov. 2015. “Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations.” Proceedings of the National Academy of Sciences 112 (44): 13455–60. https://doi.org/10.1073/pnas.1506407112.
Hameroff, Stuart, and Roger Penrose. 2014. “Consciousness in the Universe: A Review of the ‘Orch OR’ Theory.” Physics of Life Reviews 11 (1): 39–78. https://doi.org/10.1016/j.plrev.2013.08.002.
Hawkins, Jeff. 2021. A Thousand Brains: A New Theory of Intelligence. Basic Books.
Maldacena, Juan, and Leonard Susskind. 2013. “Cool Horizons for Entangled Black Holes.” Fortschritte der Physik 61 (9): 781–811. https://doi.org/10.1002/prop.201300020.
Mashour, George A., Pieter Roelfsema, Jean-Pierre Changeux, and Stanislas Dehaene. 2020. “Conscious Processing and the Global Neuronal Workspace Hypothesis.” Neuron 105 (5): 776–98. https://doi.org/10.1016/j.neuron.2020.01.026.
Penrose, Roger. 1989. The Emperor’s New Mind: Concerning Computers, Minds, and the Laws of Physics. Oxford University Press.
Reimann, Michael W., Max Nolte, Martina Scolamiero, et al. 2017. “Cliques of Neurons Bound into Cavities Provide a Missing Link Between Structure and Function.” Frontiers in Computational Neuroscience 11: 48. https://doi.org/10.3389/fncom.2017.00048.
Roediger, Henry L., and Jeffrey D. Karpicke. 2006. “Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention.” Psychological Science 17 (3): 249–55. https://doi.org/10.1111/j.1467-9280.2006.01693.x.
Sahu, Satyajit, Subrata Ghosh, Kazuto Hirata, Daisuke Fujita, and Anirban Bandyopadhyay. 2013. “Multi-Level Memory-Switching Properties of a Single Brain Microtubule.” Applied Physics Letters 102 (12): 123701. https://doi.org/10.1063/1.4793995.
Swingle, Brian. 2012. “Entanglement Renormalization and Holography.” Physical Review D 86 (6): 065007. https://doi.org/10.1103/PhysRevD.86.065007.
Tegmark, Max. 2000. “Importance of Quantum Decoherence in Brain Processes.” Physical Review E 61 (4): 4194–206. https://doi.org/10.1103/PhysRevE.61.4194.
Templeton, Adly, Tom Conerly, Jonathan Marcus, et al. 2024. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread, Anthropic. https://transformer-circuits.pub/2024/scaling-monosemanticity/.
Van Raamsdonk, Mark. 2010. “Building up Spacetime with Quantum Entanglement.” General Relativity and Gravitation 42: 2323–29. https://doi.org/10.1007/s10714-010-1034-0.
Wigner, Eugene P. 1960. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” Communications on Pure and Applied Mathematics 13 (1): 1–14. https://doi.org/10.1002/cpa.3160130102.