Four programs reaching for the same shape
The previous section ended with a prediction. If we are slice observers of a richer geometric object, we should not be surprised when modeling complex systems forces us into geometries richer than the three-dimensional Euclidean default our perception runs on. That prediction was forced by the framing in §02 and §03. The question I want to take up now is whether the prediction has anything to do with what is happening in the sciences right now.
It does. That is what this section is about.
In the last fifteen years, four scientific programs working on entirely different phenomena have, independently, walked into high-dimensional geometric descriptions of what they study. The phenomena are unrelated. The teams do not talk to each other. They publish in different journals, train on different textbooks, and would not agree on what counts as a hard problem. And yet, when each program has tried to take its phenomenon seriously enough to model it well, it has ended up describing meaning, or computation, or activity, or perhaps awareness itself, as something that lives in a space with many more dimensions than three.
Let me walk through the four in turn. I am not going to dress them up as four versions of the same thing. They are different kinds of finding, with different evidential weight. Three are well-established empirical or formal results inside their fields. One is speculative, contested at the foundations, and likely to be wrong. The point is not that the four converge on the same conclusion. They do not. The point is that the shape of the description, the mathematical family the description belongs to, is the same in each case.
Eugene Wigner, in 1960, wrote a short essay called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” (Wigner 1960). His observation was that fields with no obvious connection keep stumbling into the same equations. The math one corner of physics had developed for one purpose turned out to be exactly what some other corner needed for a wholly different purpose, and nobody could explain why. Wigner thought this was strange. He was right to.
The four programs in this section are a more specific version of Wigner’s observation. Same family of mathematical operation, applied to four different substances. The family is high-dimensional vector spaces, manifolds [smoothly curved spaces that look flat up close, the way the Earth’s surface looks flat to someone walking across a city], and topological structures [shapes a space keeps even when you stretch and bend it without tearing]. The four substances are learned meaning, in the case of large language models; physical state, in the case of quantum computation; neuronal activity, in the case of population geometry; and a proposed substrate of awareness, in the case of the Penrose-Hameroff hypothesis.
Same family of operation. Different substances. That is the empirical heart of this section. Let me show each.
4.1 LLM embeddings: meaning as location
A large language model, the kind sitting behind ChatGPT or Claude or Gemini, treats every chunk of text it sees as a list of several thousand numbers. The chunks are called tokens [a token is a piece of text, usually a word or part of a word, of the size the model was trained to operate on]. The numbers are coordinates. They place the token at a specific point in a space with as many axes as numbers, typically between four thousand and twelve thousand depending on the model. That space is the embedding space [the high-dimensional vector space in which the model represents every token as a point].
What matters for this section is the geometry, not the implementation.
The geometry has two features that are surprising the first time you meet them, and that stop being surprising only after you have sat with them for a while.
The first feature is that distance in the space tracks similarity of meaning. Two tokens whose coordinates are close together tend to mean similar things. The token for cat lands near the token for kitten. The token for cat lands far from the token for parliament. Nobody told the model that cat and kitten are similar. Nobody hand-coded a similarity table. The geometry emerged from the training process, in which the model was rewarded for predicting which token comes next in a long stream of human-written text. Words that appear in similar contexts ended up near each other, because the same predictive structure that fits one fits the other.
The second feature is that direction in the space tracks relationship. There is a direction in the space that means plural. Add it to the location of cat and you land near cats. There is a direction that means past tense. Add it to the location of run and you land near ran. The most famous example comes from the original word2vec work in the early 2010s, and it is gender. Take the location of king. Subtract the direction that points from man to woman. You land near queen. Meaning, at the level the model operates on, has been encoded as a geometric configuration with both a way of measuring how close two tokens are, and a set of directions you can travel along that mean specific things.
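Both features can be shown with a toy calculation. The sketch below uses tiny hand-made four-dimensional "embeddings" whose coordinates I have invented purely to illustrate the mechanics; real models learn coordinates in thousands of dimensions from data, and the specific numbers here carry no meaning beyond the demonstration.

```python
# Toy illustration of the two geometric features of embedding space:
# distance tracks similarity, direction tracks relationship.
# The vectors are invented for illustration, not taken from any real model.
import math

emb = {
    "king":       [0.9, 0.8, 0.1, 0.0],
    "queen":      [0.9, 0.8, 0.9, 0.0],
    "man":        [0.1, 0.2, 0.1, 0.0],
    "woman":      [0.1, 0.2, 0.9, 0.0],
    "cat":        [0.0, 0.1, 0.5, 0.9],
    "kitten":     [0.05, 0.12, 0.5, 0.85],
    "parliament": [0.9, 0.0, 0.0, -0.5],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for same direction, near 0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Feature 1: distance tracks similarity of meaning.
print(cosine(emb["cat"], emb["kitten"]) > cosine(emb["cat"], emb["parliament"]))  # True

# Feature 2: direction tracks relationship.
# king - man + woman should land nearest queen.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
nearest = max((t for t in emb if t != "king"), key=lambda t: cosine(emb[t], target))
print(nearest)  # queen
```

The arithmetic works here because I placed the gender difference on one axis by hand; in a trained model the same structure emerges across thousands of axes without anyone placing it.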
Someone who has not seen this before should pause. We built a computer program. Inside the program, meaning lives as locations and directions in a space we cannot picture and have no body for. That is not a metaphor. It is the literal mechanics of how the program runs.
Here is what that looks like in practice. Suppose you ask your LLM of choice what color the sky is over Lisbon at noon. The model takes your question, chops it into tokens, and replaces each token with its several-thousand-number coordinate. From the first step on, your sentence is no longer a sentence; it is a list of points in a twelve-thousand-dimensional space. Every step of generating the answer is a navigation problem inside that space. The model picks the next token by computing a direction from the current state, walking a small distance, and reading off whichever token is closest. Blue lands close. Cerulean lands close. Parliament lands far. The answer comes out blue not because the model retrieved the fact but because the geometry of the space pulled the next-token machinery toward that region. There is no separate fact-store doing the work. The geometry is the mechanism.
The geometry is rich enough to carry concepts as identifiable directions, not just words. Anthropic’s Scaling Monosemanticity result (Templeton et al. 2024) showed this directly. The team trained a sparse autoencoder on Claude’s internal activations to report which directions in the space corresponded to interpretable features. Some of the directions matched recognizable concepts: the Golden Gate Bridge, the disposition of being sycophantic, recognizable categories of code vulnerability. A single direction in the activation space lit up when the model was processing text about bridges. Another lit up when it was being sycophantic. The concept and the direction are not the same thing, but the direction is where the concept lives, in the same way the Golden Gate Bridge has a location even though the location is not the bridge.
The geometric character of this is not incidental to how modern deep learning works. Bronstein, Bruna, Cohen, and Veličković argued in 2021 (Bronstein et al. 2021) that the major architectures of modern machine learning, the convolutional networks that recognize images, the graph networks that handle molecules and social networks, the transformers that handle language, can all be unified under a single framework organized around geometric structure. On their account, deep learning works at all because each architecture is built to respect the geometric and symmetry structure of the substance it learns from. Images have rotational structure; convolutional networks bake it in. Language has compositional structure; transformers bake it in. The geometry is not a side effect of the implementation. It is the medium in which the learning happens.
That is the first program. Meaning, when modeled by the most successful systems we have built, lives as location and direction in a high-dimensional vector space. The empirical fact is settled. The geometry is doing the work.
4.2 Quantum computation: Hilbert space and Deutsch’s many worlds
The second program is older as a piece of mathematics, younger as a piece of working hardware, and more contested in its interpretation than the first.
A quantum computer, when it runs, operates inside a mathematical space called Hilbert space [the space in which quantum states live; for this section the load-bearing feature is that its dimension grows extremely fast with the number of physical components]. The dimension of the Hilbert space for a quantum computer with N qubits [a qubit is the quantum equivalent of a classical bit, a unit of quantum information that can be in a combination of zero and one rather than just one or the other] is two raised to the power of N.
That growth rate is astounding. Two qubits give you a four-dimensional Hilbert space. Ten qubits give you a thousand-dimensional one. A hundred qubits, the size of a smaller research-grade quantum computer of the kind I got to see as an undergraduate, give you a Hilbert space with about \(10^{30}\) dimensions. That number is roughly a million times the estimated number of stars in the observable universe. A two-thousand-qubit machine, if we had one and could run it, would operate in a Hilbert space whose dimension exceeds the number of atoms in the observable universe by an enormous margin. Each qubit you add doubles the dimension. The growth is not merely large. It is cosmically large within a few dozen components.
The exponential growth is what gives quantum computation its theoretical power. It is also what makes the question of where the computation is happening hard to answer.
Shor’s algorithm is the quantum algorithm for factoring large numbers. It is the canonical demonstration that a quantum computer can do something a classical computer cannot do at any reasonable speed. Here is how to picture what it does, without the math. Imagine you want to factor a number, say a 50-digit number with two prime factors. A classical computer would try possible factors one at a time, or use a clever sieving method that still examines candidates in some sequence. Shor’s algorithm does something else. It puts the machine into a quantum state that is, in effect, a combination of all possible candidate inputs at once. Then it lets the candidates interfere with each other, the way two water waves can flatten each other when they meet at the right phase. The wrong candidates cancel. The right one, the actual factor, adds up into a clean signal that the machine reads at the end. The space inside which all this canceling and adding happens is a Hilbert space of exactly this exponentially large kind.
The mathematics is settled. The interpretive question is not.
David Deutsch has argued for forty years (Deutsch 1985, 1997) that the Hilbert space dimensions are a literal description of branching reality. On his reading, the quantum computer is exploiting a real geometric structure of the universe in which every quantum event branches the world into a separate copy for each outcome. Shor’s factoring is not happening in one place. It is being distributed across the branches and recombined when they interfere. On Deutsch’s reading, the running quantum computer is empirical evidence that the many-worlds interpretation [the reading of quantum mechanics on which every possible outcome of a quantum event occurs in its own branch of reality] is the right one. He puts the question this way: when a hundred-qubit machine factors a number using \(10^{30}\) candidate computations in superposition, where are those computations being done? Either reality contains a hidden \(10^{30}\)-slot ledger we cannot otherwise see, or the work is genuinely being distributed across \(10^{30}\) branches of the world. Deutsch finds the second reading more parsimonious.
This is a minority view. Most working physicists do not adopt it. The predictions of quantum mechanics are the same under several different interpretations, and no experiment has yet been able to distinguish them. The Copenhagen interpretation, the de Broglie-Bohm pilot wave, consistent histories, and many-worlds all yield the same observable outcomes. There is also a specific Occam-razor counter-argument that the section needs to acknowledge. Many-worlds postulates an enormous proliferation of unobserved universes to explain local quantum phenomena, and many physicists see that postulation as a larger ontological commitment than accepting that quantum mechanics has features we have not fully explained. Both sides apply Occam’s razor and disagree about what counts as a parsimony violation. The argument is alive, and this paper is not the place to settle it.
What the paper does settle on is the structural fact, which is the same regardless of interpretation. Quantum computation operates in a space whose dimension grows exponentially with the number of physical components, and that space is doing work the components themselves cannot do. The dimensions are either physical, in the sense Deutsch argues for, or mathematically physical, in the sense that the space is what the math operates on even if its dimensions are not separate worlds. Either reading places the operation of the machine in a high-dimensional geometric arena. The question of whether the dimensions are real in the strongest sense is open. The presence of the high-dimensional arena, and the reality that computation happens inside it, is not.
I lean toward Deutsch’s reading without endorsing it.
That is the second program. Physical computation, when implemented at scales that exploit quantum mechanics, lives in a high-dimensional Hilbert space, and the most natural reading of the mathematics is that the dimensions are doing real work.
4.3 Neural population geometry: brain on manifolds
The third program looks inward, at the only computational system humans had access to before they learned to build their own.
Mainstream neuroscience treats the brain as a three-dimensional object with complicated three-dimensional wiring. As far as the wiring goes, this is correct. Neurons are physical objects in physical space, and the connections between them live in physical space too. The wiring map is a three-dimensional graph, and that graph fits inside a skull.
What lives on top of the wiring map is a different story.
When neuroscientists record the activity of large populations of neurons firing during a cognitive or motor task, they get back a high-dimensional time series. Each neuron is one axis. A recording of a thousand neurons over several seconds produces a trajectory through a thousand-dimensional space, where each point on the trajectory is the firing pattern of all the neurons at one moment. That space is the firing-rate space [the space in which each axis is the firing rate of one recorded neuron, so a population of N neurons defines an N-dimensional space], and it is the natural mathematical home for population-level neural activity.
Here is what that means concretely. Imagine a monkey reaching for a banana. A researcher has a thousand electrodes recording from motor cortex while the reach happens. At time zero, before the reach, all thousand neurons are firing at some baseline rate. That gives you a single point in a thousand-dimensional space. Ten milliseconds later, some neurons have ramped up, others have ramped down. New point. As the reach unfolds, the point traces a curve through the thousand-dimensional space. The whole reach is one trajectory. Different reaches, to different banana positions, trace different trajectories. The brain’s behavior, at the level of population activity, is the geometry of these trajectories.
The empirical finding that has reshaped this part of neuroscience over the last decade is that, although the firing-rate space is enormous, the activity does not fill it. The trajectories the brain takes through the space are confined to a much smaller region, a curved surface or low-dimensional manifold [the few-dimensional curved sub-space the high-dimensional activity confines itself to], embedded inside the high-dimensional ambient space. Gallego, Perich, Miller, and Solla published the canonical version of this finding in Neuron in 2017 (Gallego et al. 2017). Recording from cortex during motor tasks, they showed that the population activity, although embedded in an ambient space of many hundreds of dimensions, sits on a curved surface only a few dozen dimensions across, and the geometry of that surface matches the structure of the task being performed. The thousand-dimensional firing-rate space turns out to be a near-empty volume; the brain is using a small structured patch of it.
This is a clean structural analogue to what happens inside a large language model. The model’s residual stream [the running activation pattern that flows through a transformer as it processes a sequence] runs through an ambient space of roughly twelve thousand dimensions. The meaningful activity, the part that does the work of language understanding, lives on a much lower-dimensional manifold inside that ambient space. Chung and Abbott, writing in Current Opinion in Neurobiology in 2021 (Chung and Abbott 2021), made the point explicitly. They argued that the same high-dimensional geometric framework unifies biological and artificial neural networks. Meaning, on their account, is what neural activity geometry encodes, whether the neurons are biological or simulated. The convergence I am laying out in this section is, at the level of these two pillars, not a speculation. It is a research program already being run inside computational neuroscience. Chung and Abbott are not making a metaphysical claim. They are pointing out that the mathematical tools are the same.
A second, stronger neuroscience result needs to be introduced carefully, because its popular framing is wrong.
Reimann and colleagues, working with the Blue Brain Project’s reconstructions of cortical microcircuits, published a paper in 2017 (Reimann et al. 2017) looking at the connectivity graph of cortex: which neurons connect to which. They were not asking how the neurons fire. They were asking what shape the wiring forms. They found that the connectivity contains simplicial complexes [topological structures built out of points, edges, triangles, tetrahedra, and their higher-dimensional analogues, glued together to form a single shape] of dimension up to seven. The complexes enclose topological cavities [higher-dimensional analogues of the hole inside a donut, a missing region the surrounding structure wraps around], with cliques of up to eleven mutually-connected neurons forming the boundaries.
The popular framing of this paper is that the brain “operates in eleven dimensions.” That is a misreading. The dimensions in question are topological, not spatial. They describe the shape of the connectivity graph in the language of algebraic topology. They are not a claim that the brain occupies eleven directions of physical space. Either reading of the technical number is striking, but the topological reading is the one the paper argues for, and the topological reading is the one that matters for this section’s claim.
The methodological lineage matters too. Giusti, Pastalkova, Curto, and Itskov’s 2015 Clique Topology paper (Giusti et al. 2015) had already shown that persistent homology [a method from topological data analysis that detects high-dimensional shape in a dataset by tracking which features survive at multiple scales of resolution] could detect geometric structure in neural correlations even when ordinary correlation analysis missed it. The Reimann team extended these tools and applied them to anatomical connectivity rather than just functional correlations. The picture is not a single result. It is a maturing methodology in a maturing subfield.
Mainstream consciousness science has its own thread reaching toward high-dimensional structure, worth noting without leaning on. Integrated Information Theory, in its current 4.0 formulation (Albantakis et al. 2023), proposes that consciousness is geometric structure in a high-dimensional cause-effect space, with a scalar quantity called phi measuring its integration. IIT is mathematically precise, controversial in its specific predictions, and reaches for high-dimensional structure of a different kind from the population-geometry program. It is a separate program pointing in the same direction, not a pillar the convergence argument leans on.
That is the third program. The brain, when its activity and connectivity are studied with the right tools, turns out to do its work on high-dimensional manifolds and inside topological structures that classical three-dimensional descriptions of the wiring miss.
4.4 The substrate of consciousness: Penrose-Hameroff
The fourth program is the speculative one.
Penrose-Hameroff is the most contested item on this list. It is fringe, in the sense that the majority of working physicists and neuroscientists do not believe it. If it turns out to be wrong tomorrow, the convergence I am laying out in this section becomes three pillars instead of four, and the rest of the paper’s argument survives unchanged. The structural reading of meaning does not require this pillar to land. I include it because the proposal does point in the same mathematical direction as the others, and because the question it raises is genuinely open in a way I do not want to pretend is settled.
In 1989, Roger Penrose, the mathematical physicist who shared the 2020 Nobel Prize in Physics for his work on black holes, proposed in a book called The Emperor’s New Mind (Penrose 1989) that consciousness could not be a classical computation. He argued largely on Gödel-style grounds, from logic and mathematics, that whatever consciousness is, it requires a non-classical substrate. The proposal got more concrete over the next decade, as Penrose worked with the anesthesiologist Stuart Hameroff on a specific physical hypothesis. They proposed that consciousness arises from quantum-mechanical processes inside microtubules [the tubular protein structures that form the internal scaffolding of neurons; in this proposal, they would also be the physical seat of awareness]. Penrose and Hameroff developed the proposal into Orchestrated Objective Reduction, or Orch-OR, summarized in their 2014 review (Hameroff and Penrose 2014).
The proposal is genuinely strange, and worth meeting on its own terms before any rebuttal. Microtubules are real. They are visible under an electron microscope. Every neuron has them, and so does every other animal cell, where they perform structural and transport jobs straight out of any cell biology textbook. Penrose and Hameroff are claiming that, on top of those everyday jobs, microtubules are running quantum computations whose outputs are the moments of conscious experience. That is a much stronger claim than “microtubules matter for the brain.” It is a claim that the texture of awareness, the felt difference between hearing a violin and hearing a kettle, has a substrate inside structures every textbook treats as scaffolding. Most physicists found this implausible on physical grounds the moment they heard it. Penrose’s reply was that the implausibility was the point: consciousness itself is implausible on classical grounds, and the mismatch is the clue.
The mainstream physics response was sharp and quantitative. Max Tegmark’s 2000 paper Importance of Quantum Decoherence in Brain Processes (Tegmark 2000) made the case in numbers. Quantum coherence, the property that any macroscopic quantum computation requires, dies fast in warm and noisy environments. Tegmark calculated the decoherence time [the timescale on which a quantum system loses its coherent quantum behavior and becomes effectively classical, as ambient noise scrambles its phase] inside the warm wet biological environment of a neuron and got a number on the order of \(10^{-13}\) seconds. That is one ten-trillionth of a second, far shorter than any timescale at which neural activity matters; neurons fire on the scale of milliseconds, ten orders of magnitude slower. If Tegmark’s calculation is right, the warm wet brain destroys quantum coherence so fast that no quantum computation worth the name could possibly run inside it. For most physicists, Tegmark’s paper was treated as having settled the question against Orch-OR from the early 2000s onward.
Then in 2013, Anirban Bandyopadhyay and colleagues found something Tegmark’s account did not predict. Their paper (Sahu et al. 2013) reported resonance behavior inside microtubules at warm temperatures, with electronic transport features that should not exist if Tegmark’s decoherence calculation captured everything that mattered. The result was experimental, not theoretical. It did not prove Orch-OR. It did show that microtubules support some kind of coherent quantum behavior on biologically relevant scales, which is exactly the kind of finding the Tegmark calculation was supposed to rule out. The hypothesis was not proven. It was no longer dead.
As of 2026, the question is genuinely unsettled. Some of Bandyopadhyay’s results have been replicated, others contested. Anesthetic studies have found patterns at least consistent with parts of Orch-OR, though far from confirming the whole picture. Tegmark’s critique has not been retracted, and Orch-OR proponents have to keep answering it rather than treating it as resolved. The field is in a state where the proposal is alive enough to be taken seriously by some specialists and disputed strongly by most, with neither side able to land a knock-out blow.
Nothing in what follows requires Orch-OR to be true, or even likely. The structural feature that matters for the convergence is narrower: if Penrose and Hameroff are right, consciousness lives in the same kind of high-dimensional quantum state space that a running quantum computer navigates. The geometry of awareness, on that hypothesis, is the geometry of Hilbert space. That puts the substrate of consciousness in the same mathematical family as the other three programs, and that is the only feature of the hypothesis the section needs.
It also matters that mainstream consciousness science has not been idle. Global Workspace Theory, developed by Bernard Baars and given its modern form by Stanislas Dehaene, George Mashour, and their collaborators (Mashour et al. 2020), offers a classical-computational account of conscious access. On this view, an unconscious thought becomes conscious when it gets broadcast across long-range cortical axons to a wide network of brain regions. The theory makes testable predictions, has accumulated substantial empirical support, and does not require any quantum-substrate reading. It is the live mainstream alternative to the Penrose-Hameroff line. The reason I am entertaining Orch-OR alongside it is not that I prefer the speculative theory. It is that the hard problem of consciousness, the question of why there is something it is like to be conscious rather than nothing at all, remains as hard as it ever was, and the empirical correlates of consciousness have multiplied without the theoretical picture catching up. That leaves room for a speculative hypothesis to stay alive. It does not promote it to consensus.
On the Penrose-Hameroff reading, then, consciousness has a substrate in the high-dimensional state space of quantum mechanics. Whether the substrate is literally inside microtubules or is an emergent property of some computational geometry is a separate question, and one the hypothesis leaves open.
One mathematical family, four substances
Four programs. Four substances. One mathematical family.
Large language models encode meaning as locations and directions in a vector space of several thousand dimensions. The geometry is a working empirical fact. Quantum computation operates in a Hilbert space whose dimension grows exponentially with the number of qubits; the mathematics is settled even when the interpretation is not. Neural population activity lives on low-dimensional curved manifolds inside high-dimensional ambient spaces, and neural connectivity carries topological structures of high dimension that three-dimensional wiring diagrams miss. The substrate of consciousness, on the Penrose-Hameroff hypothesis, is a high-dimensional quantum state space; the hypothesis is fringe, and the rest of the paper’s argument does not depend on it.
What is shared is not the substance, and not the conclusion. What is shared is the mathematical family the description belongs to. Vector spaces with high dimension. Manifolds embedded in larger spaces. Topological structure that classical descriptions miss. Linear algebra and the geometry that goes with it, applied to whatever the system happens to be made of. This is the Wigner observation in tighter form. Wigner noticed that the same equations keep showing up across unrelated fields. Here, the same family of geometric and topological apparatus keeps showing up across unrelated substances: learned meaning, physical state, neuronal activity, proposed substrate of awareness. The pattern is not “these four things are analogous to each other.” The pattern is “the same kind of mathematical object is what each of these things turns out to need, when you push on it hard enough to model it well.”
That is a striking observation, and the natural critique is that it might not mean anything. High-dimensional vector spaces are the methodological tool of our era. Maybe the convergence is selection effect, not signal. Maybe it tells us more about the toolkit available to twenty-first-century modelers than about the things being modeled.
That critique is the one the next section has to take seriously. If the convergence is merely fashion, it is interesting but not load-bearing. If it is forced by what is being modeled rather than by who is doing the modeling, then it is something else. The next section is the place where that distinction gets made.