After the universe
This paper started with a popular piece of “pop” physics on social media, something to the effect of “from a photon’s perspective, no time passes between emission and absorption…” A familiar line. Something every undergraduate physics student hears in their third or fourth year. The kind of thing that gets shared and re-shared and that usually never gets a second glance.
That throwaway comment led somewhere unexpected when it sent me back to the complex topology of our reality. A week later, this paper is what came out. Most of what is in it has nothing to do with photons or relativity. Photons were just where the door was. What was on the other side turned out to be more relevant than I could have expected.
The paper contains a number of thought experiments and one idea I think is worth taking seriously.
Just as early engineers used nature to inspire their designs before they had proper simulations and the knowledge required to perfect them, I urge the machine learning industry to do the same when it comes to the universe and models of it:
Models of “meaning” should be built after the universe they are modeling.
The word “meaning” names a human concept.
A modern language model (think ChatGPT) represents the meaning of a word, or more precisely of a token in its context, as a long list of numbers, in today's large models usually somewhere between four thousand and twelve thousand of them, treated as coordinates in a flat space. The math is linear algebra. We have had a century of practice with it. The optimization runs smoothly on top of it. The hardware is built for it. Convenience is a fine reason to pick a tool, and a flat space is the convenient tool.
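For readers who want the flat picture made concrete, here is a minimal sketch of it. The three words, their four coordinates, and the helper names are all invented for illustration; real models learn thousands of coordinates per token. The only point is that every comparison reduces to dot products and straight-line distances, which is what "flat" means here.

```python
import math

# Toy 4-dimensional "embeddings". The numbers are made up for illustration;
# they are not taken from any real model.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.7, 0.2, 0.1],
    "apple": [0.0, 0.1, 0.9, 0.8],
}


def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Angle-based similarity: the dot product of the normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def euclidean_distance(u, v):
    """Straight-line distance, the same rule everywhere in the space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def translate(u, shift):
    """Move a vector by a constant offset."""
    return [a + s for a, s in zip(u, shift)]


if __name__ == "__main__":
    king, queen, apple = (EMBEDDINGS[w] for w in ("king", "queen", "apple"))
    print("cos(king, queen):", cosine_similarity(king, queen))
    print("cos(king, apple):", cosine_similarity(king, apple))
    print("dist(king, queen):", euclidean_distance(king, queen))
```

In a flat space the distance between two points does not depend on where in the space they sit: shifting both vectors by the same offset leaves `euclidean_distance` unchanged. That uniformity is the convenience, and it is also the assumption the rest of this paper questions.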
The problem is that the tool is wrong about what it is being used to model.
Meaning is not a free-standing mathematical object. It is something we create, something that happens in the universe, performed by physical beings whose physical processes occur inside whatever the universe actually is. And the universe, according to a hundred years of physics, is not a flat geometric space. It is curved. It is layered. It contains regions where geometry itself appears to be emergent from something more primitive. The systems that try to capture what we mean sit inside that richer object, made of stuff that obeys its rules, and any such model is in the end a model of something the universe, or whatever more broadly contains it, is doing. So the modeling tool should be built after the universe, not after our defaults.
That sentence is the spine of the paper. I will earn it across the sections that follow, but it is worth flagging here that the relationship I mean is containment, not analogy. I am not saying that because spacetime is hyperbolic, embeddings should be hyperbolic. That inference is sloppy. Whatever meaning is, it occurs inside the host. The host’s regimes constrain the substrate that meaning runs on. A model that captures meaning honestly has to make peace with the host’s actual shape, not with the easiest shape to write code for.
The argument that gets to that claim crosses several disciplines, all of which I have studied, none of which I have mastered to a world-class degree. For that reason, my goal is to present a defensible theoretical position built on the work of others and the scientific consensus as I read it. The chain starts in special and general relativity. It passes through an empirical observation: several independent scientific programs, working on very different phenomena, have arrived at roughly the same time at high-dimensional geometric descriptions of what they study. It picks up a philosophical position about meaning that is older than the field of machine learning. It lands on a research direction for the systems that come after the current generation of large language models. The argument is the journey, and the journey is what the paper is for.
I will mark which steps in the chain are forced by physics, which are empirical findings whose interpretation is contested, and which are philosophical proposals the reader can take or leave. Where the right word for an idea does not exist, I will say so rather than coin one, because the absence of a word is itself informative.
This paper is not an attempt to settle questions about what ultimately exists. Philosophers of physics have argued for centuries about whether the universe is best described as a substance with properties, a network of relations with no underlying things, a totality of mathematical structures, or a continuously branching set of quantum realities. The structural picture I lay out below is compatible with several of these readings. Asking which of them is correct is a different paper. I engage these positions where they clarify what the structural claim is and is not, and I decline to choose where the choice does not affect the argument. Whether things “really” exist, or what “really” should be taken to mean, is, on the view I take here, a question downstream of the structural fact, not upstream of it.
Nor is it an engineering paper. There is an implementation that ought to follow from the argument, a tiered substrate in which curved, topological, and relational regimes each handle the aspect of meaning they are best suited to. I am building it. It takes time, and learning PhD-level math along the way unfortunately takes more. I sketch the shape of that implementation toward the end, but the version that ships and beats current architectures on real tasks is a separate paper. This one is the argument that the building is worth doing.
Let’s start by talking about physics.