All Models Must Die
I’ve been thinking about something that connects LLMs, biological brains, and a question that longevity researchers probably don’t want to hear.
It started with a simple observation that others have pointed out before me - that reasoning degradation during inference and catastrophic forgetting during training might actually be the same problem wearing different masks. In both cases, what you’re really looking at is a structure that can no longer hold itself together under the weight of its own accumulated contradictions.
During inference, as a conversation grows longer and conflicting information piles up, the model starts losing its grip on logical consistency. It’s not that the context window ran out. It’s that unresolved contradictions accumulated faster than the model could manage them. Experiments have shown that when you externally organize those contradictions - sorting what was true before from what is true now - performance stabilizes dramatically. The model wasn’t broken. It was drowning.
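That external-organization trick can be sketched as a small bookkeeping layer that sits outside the model. This is a minimal sketch of the idea, not any particular published method; the names (`FactLedger`, `assert_fact`) are mine. The point is simply that superseded facts are moved out of the working context instead of piling up alongside their replacements:

```python
from dataclasses import dataclass, field

@dataclass
class FactLedger:
    """Tracks the current value of each fact, plus its superseded history,
    so a prompt can be built from only the latest, non-conflicting facts."""
    current: dict = field(default_factory=dict)
    history: dict = field(default_factory=dict)

    def assert_fact(self, key: str, value: str) -> None:
        # On conflict, demote the old value to history instead of
        # leaving both versions in the active context.
        if key in self.current and self.current[key] != value:
            self.history.setdefault(key, []).append(self.current[key])
        self.current[key] = value

    def render_context(self) -> str:
        # Only currently-true facts enter the prompt; history stays out.
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.current.items()))

ledger = FactLedger()
ledger.assert_fact("meeting_day", "Tuesday")
ledger.assert_fact("meeting_day", "Thursday")  # contradiction: supersede it
print(ledger.render_context())  # → meeting_day: Thursday
```

The model never has to arbitrate between Tuesday and Thursday, because only Thursday reaches it - which is exactly the sorting of "true before" from "true now" described above.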
During training, the same thing happens at a deeper level. When you teach a model something new that conflicts with what it already knows, it doesn’t gracefully update. It overwrites. And when that new information has downstream dependencies - things that were true because the old thing was true - the whole web of related knowledge needs to be revised. The model can’t do this cleanly. Coherence breaks down. We call this catastrophic forgetting but maybe a more honest name is structural collapse.
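The overwrite dynamic is easy to see in a deliberately tiny caricature. A one-parameter model has no spare capacity at all, so learning a conflicting task must erase the old one; real catastrophic forgetting happens across millions of shared weights, but the mechanism rhymes. This is an illustrative toy, not a faithful model of LLM training:

```python
# A one-parameter "model" trained by gradient descent, first on task A
# (optimum w = 2), then on conflicting task B (optimum w = -2).
# Training on B overwrites w; performance on A collapses.

def train(w: float, target: float, steps: int = 200, lr: float = 0.1) -> float:
    # Minimize (w - target)^2 by gradient descent; this stands in for
    # fitting a task whose best weight value is `target`.
    for _ in range(steps):
        w -= lr * 2 * (w - target)
    return w

def loss(w: float, target: float) -> float:
    return (w - target) ** 2

w = train(0.0, target=2.0)       # learn task A
loss_a_before = loss(w, 2.0)     # ~0: task A is learned
w = train(w, target=-2.0)        # now learn conflicting task B
loss_a_after = loss(w, 2.0)      # ~16: task A has been overwritten

print(loss_a_before, loss_a_after)
```

There is no graceful update available here: the single weight can encode A or B, never a reconciliation of both. Scale that tension up across an entangled web of shared parameters and you get the downstream-dependency breakage described above.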
But maybe this isn’t just an LLM problem. Maybe it’s a problem for any fixed architecture that learns continuously - including the one sitting inside your skull.
Think about what happens to the human brain over a lifetime. You spend decades accumulating knowledge, beliefs, mental models, and habits. Every new thing you learn has to coexist with everything you already know. Sometimes the new stuff directly contradicts the old stuff - your understanding of a friend changes, a scientific fact you learned in school gets revised, the way the world works shifts under your feet. Your brain has to integrate all of that without a clean restart.
And for a while it does this remarkably well. But not forever.
Age-related cognitive decline doesn’t always start with neurons dying or plaques forming. It often starts with exactly the symptoms you’d predict from this framework: difficulty integrating new information with old knowledge, confusion when something recently changed (a new phone number replacing one you had for years), trouble holding complex reasoning chains together. It looks a lot like what happens to an LLM when contradictions accumulate beyond its capacity to manage them.
We tend to assume that if we solve the physical causes of death - cancer, heart disease, organ failure - we could theoretically live forever. But what if the brain has a fundamental expiration date that has nothing to do with physical deterioration? What if the architecture itself has a finite capacity to manage the ever-growing complexity of its own knowledge? Not because the hardware is failing, but because the information structure becomes unsustainable.
If that’s true, then even if medicine fixes every organ in your body, your mind might still have a ceiling. A point where the accumulated weight of a lifetime of learning - all the revisions, contradictions, updated beliefs, and patched-over memories - overwhelms the architecture’s ability to maintain coherence. Not physical death. Cognitive death. And no amount of medicine can fix an architectural limit.
This also connects to scaling in a way I find hard to ignore. Models with more parameters handle contradictions better. They have more capacity to hold nuanced relationships between concepts, resolve ambiguity, and maintain coherence under pressure. They degrade more slowly. They last longer before structural collapse sets in.
And in biology - larger brains generally correlate with more complex cognition. Elephants mourn their dead and maintain social bonds across decades. Dolphins form complex hierarchies and alliances. Great apes use tools and teach their young. These are not coincidences. A larger cognitive architecture has more room to manage the growing web of knowledge and contradiction that comes with a longer life.
Larger mammals also tend to live longer. And I don’t think that’s just about metabolism or cell repair. I think part of it is that their brains can sustain coherence for longer before the accumulated complexity of a lifetime starts tearing at the seams.
Of course, architecture matters too - crows and parrots punch well above their weight with remarkably efficient brain structures, just as a well-designed small model can outperform a sloppy large one. But as a general trend, the pattern holds. More capacity means more time before the contradictions win.
There’s an obvious counterargument to all of this - if contradictions are what degrade the architecture, why not just stop learning? If no new information enters the system, no new contradictions arise. The existing knowledge structure stays stable. No conflicts to resolve, no dependencies to update, no structural strain. By this framework, your cognitive architecture should last longer.
And strictly speaking, that’s correct. But it misses something important.
A brain that stops learning isn’t really living. It’s frozen. You’d be preserving the structure by refusing to use it for its intended purpose. And practically, you can’t actually stop learning - every conversation, every headline, every change in your environment introduces new information whether you want it or not. Complete isolation would cause its own form of cognitive degradation through sensory deprivation and disuse. The brain needs input to maintain itself.
This maps cleanly onto LLMs, too. The models we use for inference have their weights frozen, and a frozen model can be used more or less indefinitely. But it isn’t really ‘experiencing’ the world unless the weights are updated too.
I think we might be looking at a fundamental property of any learning system built on a fixed architecture. Whether it’s a transformer with a few billion parameters or a biological brain with 86 billion neurons, maybe there is a finite amount of continuous learning it can absorb before coherence degrades beyond recovery. The limit isn’t in the training data or the optimization algorithm or the medical technology keeping the hardware alive. The limit is in the architecture itself.
Better training tricks can delay it. Larger models can push the boundary further out. But eventually, the contradictions win. The structure can no longer hold. And the model must die.

