Your computer has a CPU for thinking. RAM for remembering. A screen for showing you things. A keyboard and mouse for listening to you.
Four separate systems. Engineered independently. Bolted together. And that arrangement has worked nicely for the past 50 to 70 years, serving its purpose.
Until now.
Meta AI and KAUST just published a paper proposing something I genuinely haven't seen before.
They want the neural network itself to become the computer.
I am not talking about an agent that sits on top of your laptop and clicks buttons for you. The AI itself.. being the machine. Its internal state carrying the computation. Its weights acting as memory. Its generated pixels serving as the interface.
Just one set of learned weights doing everything. They're calling the idea Neural Computers.
I wanna talk about this today. Not because the results are incredible. They're actually kind of bad in some places, and the researchers themselves say so. But because the way they're framing this problem is unlike anything in the current AI conversation. And I think it's worth your time.
So here's where we are right now.
When people talk about AI and computers, they're talking about agents. Claude's Computer Use. OpenAI's CUA. These systems sit on top of a regular computer and drive it. They see the screen, move the cursor, type stuff.
But the actual work? That's still the traditional machine underneath.
The AI is the driver. The car is still a car.
This paper asks.. what if you got rid of the car entirely?
What if the model's latent state.. its hidden layers, its representations.. what if that was the computation, the memory, and the interface, all at once? One unified runtime. Not layers on top of layers.
That's the idea.
And honestly, it sounds insane. The researchers know it sounds insane. They're not pretending they've built a laptop replacement. They explicitly say these are early, rough prototypes.
But they did build something. And how they built it is where this gets interesting.
They used video models.
Think about it for a second. Your computer screen is just a sequence of images. You type a command, the screen updates. You click a button, a window opens. One frame leads to the next, driven by your actions.
So if you train a video model to predict what the next frame should look like based on what you just did.. you've got something that behaves like a computer.
Without actually running one.
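To make that concrete, here's a toy sketch of action-conditioned next-frame prediction. To be clear, this is not the paper's method. Everything in it is made up for illustration: the five-cell "screen", the `real_computer` function, and a "model" that is just a lookup table memorized from recorded transitions, standing in for a trained video model.

```python
WIDTH = 5  # toy "screen": one row of 5 cells, exactly one shows the cursor "#"

def make_frame(pos):
    """Render the screen with the cursor at position pos."""
    return "." * pos + "#" + "." * (WIDTH - 1 - pos)

def real_computer(frame, action):
    """Hypothetical ground-truth machine: moves the cursor, clamped at the edges."""
    pos = frame.index("#")
    if action == "left":
        pos = max(0, pos - 1)
    elif action == "right":
        pos = min(WIDTH - 1, pos + 1)
    return make_frame(pos)

def record_transitions():
    """Record (frame, action, next_frame) triples by driving the real machine."""
    data = []
    for pos in range(WIDTH):
        frame = make_frame(pos)
        for action in ("left", "right"):
            data.append((frame, action, real_computer(frame, action)))
    return data

def fit(data):
    """Stand-in for training: memorize the next frame for each (frame, action)."""
    return {(frame, action): nxt for frame, action, nxt in data}

def rollout(model, frame, actions):
    """Generate frames from the learned model only; real_computer is never called."""
    frames = [frame]
    for action in actions:
        frame = model[(frame, action)]
        frames.append(frame)
    return frames

model = fit(record_transitions())
frames = rollout(model, make_frame(0), ["right", "right", "left"])
print(frames)  # ['#....', '.#...', '..#..', '.#...']
```

Once `fit` has seen the recordings, `rollout` produces the right screens from actions alone. A real video model would have to generalize to screens and actions it never saw, which is exactly the hard part a lookup table skips.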


