The convergence is not a coincidence
The previous section laid four findings side by side. Large language models represent meaning as locations in a high-dimensional vector space [a space whose points are described by long lists of numbers, with each number an independent coordinate]. Quantum computation runs in a Hilbert space [the mathematical arena that quantum states live in, whose dimension grows as two raised to the number of qubits]. Neural population activity sits on curved low-dimensional manifolds embedded in high-dimensional firing-rate spaces. Penrose’s proposal places the substrate of consciousness inside the same kind of high-dimensional quantum state space. Four programs, four substances, one mathematical family.
Before I argue that this convergence means something, I owe the strongest version of the objection that it does not.
The critique, at full strength
High-dimensional vector spaces are the methodological tool of the era. Linear algebra is the mathematics that runs on graphics cards [the parallel-processing chips originally built for video games and now used for almost all modern machine-learning computation]. Optimization on long lists of numbers is the only kind of computation we have learned to do at industrial scale. The dimensions are cheap. The tools are mature. So when an LLM lab needs to encode word meaning, they reach for high-dimensional vectors. When a quantum theorist writes down the state of a many-qubit system, they use Hilbert space. When a neuroscientist tries to make sense of firing patterns from a multi-electrode array, they reach for dimensionality reduction [a family of methods for finding low-dimensional structure inside data that originally lives in many dimensions]. Same toolbox, four hands.
On this reading, the pattern says nothing about the world. It says something about the era. The convergence is what happens when you hand the same wrench to four different mechanics and watch them all turn bolts.
There is a cleaner historical parallel that sharpens the worry. Consider calculus.
In the late 1600s, both Isaac Newton in England and Gottfried Leibniz in Germany worked out, independently, the mathematics of how quantities change. Newton was trying to describe how a planet moves under gravity. Leibniz was working on the geometry of curves. The two men never collaborated, used different notation, and would later fight bitterly over priority. But the mathematical machinery they each arrived at was the same machinery. Both of them found derivatives. Both of them found integrals. Both of them found the relationship between the two that we now call the fundamental theorem of calculus.
Once calculus existed, it became the working language of physics. Mechanics. Electromagnetism. Fluid dynamics, statistical mechanics, general relativity, quantum field theory. Each runs on the same machinery. You could write a book on the unreasonable effectiveness of calculus and the chapter list would be long.
You can read the pattern two ways. One way: calculus is hidden in nature, and the convergence is the discovery of it. The other way: rate-of-change is a real feature of physical processes, calculus is the tool that handles rate-of-change well, and so anyone modeling processes will reach for it. The second reading is the one historians of physics defend. It does not posit a Platonic calculus behind everything. It posits a feature of the world, and a tool that fits the feature.
Apply that to my four programs and the worry becomes pointed. High-dimensional geometry is calculus’s twenty-first-century cousin. It is a useful tool. It recurs because of where the field’s bench tools sit, and because of which problems are tractable when those tools are brought to bear. Reading anything deeper into the convergence is reading tea leaves.
This is a strong critique. Part of it deserves to be conceded before the rest is argued against.
What I am willing to give up
A few things go down without a fight.
Publication bias is real, and it shapes what gets called convergent. The four fields I picked are four out of many. Cognitive science, formal linguistics, dynamical systems, and category theory do not reach for high-dimensional vector geometry as their first move. That could mean those fields have not yet found the structure, or that the structure is not there to find. If I had picked those, I would tell a different story. The selection is a choice, and it deserves to be flagged as one.
It is also true that high-dimensional vector spaces are flexible. With enough dimensions, you can fit almost any structure inside them. So the bare fact that all four programs can be described in such spaces is, on its own, weak evidence that they share anything deep.
I concede both points. The defense I am about to mount narrows the worry. It does not eliminate it.
The internal-pressure argument
Here is what the selection-effect critique gets wrong.
In each of the four programs, the geometric structure is not a tool the field reached for. It is a constraint the field was forced into by the phenomenon being studied. The fields did not bring high-dimensional geometry to the problems. The problems brought high-dimensional geometry to the fields.
Take them one at a time.
LLM embeddings. Early word-embedding work in the 2010s tried representing meaning in a few dozen dimensions. It worked badly. As the dimension count rose, the quality of the representations improved well past the point where a flexibility argument would predict diminishing returns (Templeton et al. 2024). The reason is concrete. To encode the relationships among a hundred thousand tokens [chunks of text, usually a word or part of a word], you need enough room for every distinction those tokens carry. A space that puts “king” and “queen” in different positions, “king” and “monarch” in similar ones, and “king” and “stapler” far apart, and does this consistently across the whole vocabulary, has to be a few thousand dimensions wide, not a few dozen. The modelers did not pick the dimension count. The volume of relationships the representation had to carry forced it. Lower-dimensional embeddings empirically lose distinctions, a loss that shows up as worse performance on downstream tasks. The geometry survived an empirical filter. The field did not default to it.
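The constraint can be made concrete with a toy sketch. The vectors below are hypothetical four-dimensional illustrations, not real embeddings; the point is only that a meaning-space must satisfy many similarity relationships simultaneously, which is what drives the dimension count up for a real vocabulary.

```python
import math

def cosine(u, v):
    # Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors chosen by hand to satisfy the text's constraints:
# "king" close to "monarch", offset from "queen", far from "stapler".
vocab = {
    "king":    [0.9, 0.8, 0.1, 0.0],
    "queen":   [0.9, 0.8, 0.9, 0.0],
    "monarch": [0.9, 0.7, 0.5, 0.0],
    "stapler": [0.0, 0.1, 0.0, 0.9],
}

assert cosine(vocab["king"], vocab["monarch"]) > cosine(vocab["king"], vocab["stapler"])
```

With four words, four dimensions suffice. With a hundred thousand tokens, the number of pairwise constraints grows quadratically, and low-dimensional spaces run out of room to satisfy them all at once.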
Hilbert space. The dimensional explosion of quantum mechanics is not a modeling choice. It is a fact about how quantum systems combine. Two qubits do not have four states because the formalism has four slots to fill. They have four states because the system has four physically distinguishable configurations, and any honest description has to track them. Twenty qubits have a million states. A hundred qubits have ten to the thirty, more configurations than any classical computer could ever enumerate. A theory that did not respect this growth would predict the wrong probabilities in the laboratory. Hilbert space is settled physics, the way the Minkowski metric of §02 is settled physics. The geometry is forced by the experimental record.
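The counts in the previous paragraph are just the bookkeeping of the formalism: n qubits require a state space of dimension two to the n. A few lines verify the arithmetic.

```python
def hilbert_dim(n_qubits: int) -> int:
    # The joint state space of n qubits has 2**n distinguishable
    # basis configurations; this is the formalism's bookkeeping,
    # not a simulation of quantum dynamics.
    return 2 ** n_qubits

print(hilbert_dim(2))    # 4: the four two-qubit configurations
print(hilbert_dim(20))   # 1048576: "a million states"
print(hilbert_dim(100))  # about 1.27e30
```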
Neural manifolds. The high-dimensional geometric description of brain activity was not imposed by neuroscientists. It was discovered in the data. Gallego, Perich, Miller, and Solla recorded from many electrodes in the motor cortex of monkeys reaching for targets (Gallego et al. 2017). They asked where, in the high-dimensional space whose axes are the firing rates of individual neurons, the population activity sits. The answer was not “everywhere.” The activity does not fill the space available to it. It lives on a curved surface inside that space, and the curved surface is where the computation happens. Reimann and colleagues looked at the connectivity graph of cortical microcircuits (Reimann et al. 2017) and found topological structures, cliques and cavities, that have no clean three-dimensional spatial description. The researchers did not bring topology to the data. The data forced topology onto the analysis. Chung and Abbott’s 2021 review (Chung and Abbott 2021) argued that the same geometric framework is unifying biological and artificial neural networks, not by design, but because the empirical structure happens to share that shape.
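The shape of the manifold finding can be sketched with synthetic data (this is an illustration of the analysis logic, not the actual recordings): simulate a population whose firing rates are all driven by a handful of latent signals, then check that the activity occupies far fewer dimensions than the space available to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 latent signals drive 50 simulated "neurons"
# over 1000 timepoints, mimicking low-dimensional population dynamics.
latents = rng.standard_normal((1000, 3))   # hidden signals over time
mixing = rng.standard_normal((3, 50))      # each neuron mixes the latents
rates = latents @ mixing                   # 1000 timepoints x 50 neurons

# Singular values of the centered activity reveal its effective
# dimensionality -- the analysis step behind dimensionality reduction.
s = np.linalg.svd(rates - rates.mean(axis=0), compute_uv=False)
variance = s**2 / (s**2).sum()

print(variance[:3].sum())  # ~1.0: three dimensions carry all the variance
```

The activity has 50 dimensions available and uses 3. In real recordings the low-dimensional structure is curved and noisy rather than exactly linear, which is why the field speaks of manifolds rather than subspaces, but the logic of the discovery is the same: the data, not the analyst, sets the dimensionality.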
Orchestrated objective reduction. This is the speculative leg, as flagged in §04. Penrose and Hameroff did not propose a high-dimensional structure as a useful tool for their consciousness theory. They proposed it as constitutive: the claim is that consciousness requires a high-dimensional quantum state space to occur at all, because classical computation cannot do what consciousness does (Penrose 1989; Hameroff and Penrose 2014). If the proposal is right, the geometry is not a description of the substrate of consciousness. It is the substrate of consciousness. If the proposal is wrong, nothing else in the chain is affected, and the convergence is three pillars instead of four. Either way, this fourth case is not a tool reached for. It is structure proposed as the thing itself.
The pattern across the four cases is the same. The geometric structure is the residue of a constraint imposed by the phenomenon, not the imprint of a methodology imposed by the field. The selection-effect critique reads the convergence as four people swinging the same hammer. The internal-pressure reading is that four different problems each demanded a hammer, and the hammer that fits them is recognizably the same shape.
The deeper inversion
The calculus parallel cuts the other way once you press on it.
The selection-effect critique uses calculus to argue that recurring tools say nothing about the world. But that is not what the calculus example shows. Calculus recurs across physics because rate-of-change is a real feature of the world. The world has processes. Processes have rates. Any modeling framework that gets traction on processes has to handle rates. The convergence on calculus is not selection effect. It is the world being a place where rate-of-change is a structural feature, and the tools that respect that feature are the tools that work. The tool recurs because the feature is real.
That is the move I want to make for high-dimensional geometry. If the convergence across the four programs is forced by what each is studying, rather than imposed by the era’s bench tools, the convergence is telling us something about what is being modeled. The four programs are modeling things that share a structural feature: high-dimensionality of the relationships among parts. The geometric tools recur because the feature is real, and any modeling framework that gets traction has to respect it.
This is the argument Eugene Wigner made in 1960, in a more general form (Wigner 1960). Wigner was a physicist who had spent his career watching mathematical structures invented for one purpose, often for no purpose beyond a mathematician’s curiosity, turn out decades later to describe physical phenomena with extraordinary precision. Group theory was a branch of pure algebra before it became the language of particle physics. Riemannian geometry was a Victorian curiosity before Einstein needed it for general relativity. Wigner asked why this kept happening. He did not say the mathematics caused the phenomena. He said the recurring fit was a fact, and that the fact called for explanation.
The convergence I am arguing for is a more specific version of Wigner’s observation. Where Wigner pointed at mathematics in general, I am pointing at one corner of it: the corner that handles relationships of similarity, hierarchy, and high-dimensional structure. I am noting that this corner is the one four contemporary scientific programs, working at scales from the inside of a neuron to the structure of meaning to the state of a quantum computer, have been forced into by what they are studying. This is not a new observation about science. It is an extension of an old one. Wigner asked why mathematics keeps fitting. I am asking, more narrowly, why this particular shape of mathematics keeps fitting in this particular set of contemporary fields.
Where the defense stops
The defense buys less than it might appear to.
What it buys is a narrowing of the selection-effect critique. The critique works only if the geometric structure in each field is a tool the field elected to bring. The internal-pressure argument shows that in each of the four cases, the structure was extracted from the data, the formalism, or the proposal itself. Not picked off the bench.
What it does not buy is the absence of a residual worry. There is an open question about whether high-dimensional geometry is the unifying principle the convergence points at, or only a useful stand-in for it. The four fields might converge on geometric structure because geometric structure is what we currently have language for. Some deeper principle, for which we do not yet have a name, might be what they are tracking underneath. In that case, “high-dimensional geometry” is the closest approximation we have built so far.
The residual worry is correct, and I do not have an answer to it. The case for the convergence being meaningful is strong. The case for it being meaningful in exactly the way I have described, with high-dimensional geometry as the principle rather than a placeholder for it, is weaker. The rest of the paper has to keep that distinction visible.
From convergence to identity
What survives the defense is this. The four programs are not a methodological coincidence. Each was forced into geometric structure by what it was studying. The convergence is the residue of internal pressure in four substances, not a fashion of the era. At minimum, the four substances share a structural feature deep enough that any honest modeling has to respect it.
Once that is settled, the question changes shape. It is no longer whether the convergence is a coincidence. It is what meaning has to be, that modeling it forces this structure. That is the question §06 takes up. The answer I will argue for is that meaning is structure: the same structural reality the universe is, viewed from inside by the structural beings that model it. That is a strong claim. The next section earns it.