If you read a lot of research, AI leaders’ talks, and opinions in the AI space, but still prefer to form your own views rather than be swayed by others, then you and I are alike.
And isn’t this exactly what we want from our AI too? That it becomes more like general intelligence.
In this article, I am not going to talk about architectures like transformers or diffusion that might lead us to AGI. Right now, we are moving forward with the transformer architecture, and if you look at AI’s performance in maths, coding, and logical reasoning, it has been getting better and better over the last couple of years. And the pace of that progress is remarkably fast.
So, how are they becoming so smart so quickly?
I came across the answer to this question in a research paper from IBM Research and the MIT-IBM Watson AI Lab. It’s called PRISM.
And I genuinely think that if you are into the AI space, this article is worth your time.
How AI is Built: The Three Stages
Many of you might already know how AI is trained. But let me simplify the pipeline so we are all on the same page.
Building a smart AI usually happens in stages.
Stage 1 is Pre-training.
This is where developers feed the model the entire internet. The model reads billions of pages, learns grammar, memorizes facts, and figures out how human language works. This costs millions of dollars and takes months. But the end result is just a prediction engine. It’s not very helpful yet.
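To make “just a prediction engine” concrete, here is a toy sketch. It is not a transformer, and the corpus is made up; it only shows the core idea that pre-training boils down to learning “given the text so far, which token comes next?”

```python
from collections import Counter, defaultdict

# Toy illustration, not a real architecture: we approximate next-token
# prediction with simple bigram counts over a tiny made-up corpus.
corpus = "the cat sat on the mat the cat ran on the grass".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in training."""
    if word not in next_counts:
        return None
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

A real model does this with billions of parameters instead of a lookup table, but the objective is the same: predict what comes next.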
Stage 3 is Reinforcement Learning (RL).
This is the final polish. Here, the model is rewarded for giving correct, helpful answers and penalized for making things up. It learns how to talk to you like an assistant.
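The reward idea can be sketched in a few lines. This is a toy, not the actual RL algorithm: real systems use policy-gradient methods (e.g. PPO) to update the model’s weights, and the candidate answers here are hypothetical. It only shows the scoring signal: correct answers get rewarded, wrong ones penalized.

```python
# Toy sketch of the reward signal behind RL fine-tuning. Real pipelines use
# policy-gradient updates; here we only show scoring sampled answers.
def reward(answer, correct_answer):
    return 1.0 if answer.strip() == correct_answer else -1.0

# Hypothetical candidate answers the model might sample for "2 + 2 = ?"
candidates = ["4", "5", "four-ish"]
scores = {a: reward(a, "4") for a in candidates}
best = max(scores, key=scores.get)
print(best)  # only "4" earns a positive reward
```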
But there is a missing step in the middle.
Stage 2 is Mid-Training.
Imagine a kid handed an entire encyclopedia who starts out unable to read, but slowly figures it out along the way.
That is pre-training.
Now, before you send them to take a highly complex college exam, you send them to a specialized prep school for math, coding, and science.
That is mid-training.
You take the base model and feed it a much smaller, highly curated diet of high-quality data. In the PRISM study, they used about 27 billion tokens. That sounds like a lot, but in AI terms, it is a very small and focused amount of data.
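The “curated diet” part can be pictured as a weighted data mixture. The domain names and weights below are hypothetical, not the actual PRISM recipe; the sketch only shows the idea of sampling training data from a small set of high-quality domains instead of raw web text.

```python
import random
from collections import Counter

# Hypothetical mid-training mixture (the real PRISM recipe differs):
# a small, curated, weighted diet of high-quality domains.
mixture = {"math": 0.40, "code": 0.35, "science": 0.25}

def sample_domain(rng):
    """Pick a training domain with probability proportional to its weight."""
    r = rng.random()
    cumulative = 0.0
    for domain, weight in mixture.items():
        cumulative += weight
        if r < cumulative:
            return domain
    return domain  # guard against floating-point rounding

rng = random.Random(0)
counts = Counter(sample_domain(rng) for _ in range(10_000))
print(counts)  # roughly 40% math, 35% code, 25% science
```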
Now you might be thinking.. why is mid-training a big deal?
Until recently, the mid-training stage was a bit of a mystery. Companies were doing it, but no one really understood the exact science behind it. What data should you use? When should you do it? How does it affect the final model?
The researchers behind PRISM decided to test this across seven different open-source models.. including LLaMA, Mistral, and Granite.. ranging from 3 billion to 24 billion parameters.
What they found was interesting.
Mid-training is the heavy lifter. Just by applying this small, targeted mid-training phase, the models saw massive jumps in intelligence. We are talking about gaining 15 to 40 points on complex math tests, and 5 to 12 points on coding tests.
But the most interesting part is how mid-training interacts with the final stage.. Reinforcement Learning.
The Reinforcement Learning Myth
We often hear that Reinforcement Learning is the magic that makes models reason. But the PRISM study shows that RL alone is not where the reasoning comes from.
The researchers took base models (models that had not gone through mid-training) and applied Reinforcement Learning directly to them.
The result? The models completely failed at complex reasoning. On advanced math tests, their scores stayed near zero. They just couldn’t do it.
But when they applied Reinforcement Learning to the mid-trained models.. the scores exploded. The models suddenly became capable of solving incredibly difficult problems.
Why does this happen?
It comes down to how the “brain” of the AI is actually changing. The researchers looked deep into the weights of the models to see what was physically altering during these training stages.
Mid-Training Restructures the Brain
When the model goes through mid-training, it experiences a massive restructuring. The researchers found that over 90% of the model’s weights change.
The model is literally rewiring its core understanding of logic, math, and code. Interesting, right?
Reinforcement Learning does not do this.
When a model goes through Reinforcement Learning, only about 5% of its weights change.
Reinforcement Learning only works if mid-training has already done the heavy lifting.
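The 90%-vs-5% contrast is easy to express as a checkpoint comparison. The numbers below are illustrative, not real model weights; the sketch just shows how you would measure what fraction of parameters actually moved between two training stages.

```python
# Measuring "how much of the brain changed": compare weights
# checkpoint-to-checkpoint. Illustrative numbers, not real model weights.
def fraction_changed(before, after, tol=1e-6):
    """Share of parameters whose value moved by more than `tol`."""
    changed = sum(abs(b - a) > tol for b, a in zip(before, after))
    return changed / len(before)

base        = [0.10, -0.20, 0.30, 0.40, -0.50]
mid_trained = [0.90, -0.70, 0.30, 1.10, 0.25]   # most weights moved
rl_tuned    = [0.10, -0.20, 0.30, 0.40, -0.45]  # only a small nudge

print(fraction_changed(base, mid_trained))  # 0.8
print(fraction_changed(base, rl_tuned))     # 0.2
```

On a real model you would run this over millions of parameters per layer, but the finding would look the same: mid-training rewires almost everything, while RL barely touches the weights.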
So, what actually happens when you feed data into an AI?
Let’s say you want your AI to be really good at PhD-level science. You might think you should just reward it for getting science questions right during the final Reinforcement Learning stage.
The data says otherwise.
The researchers found that if you want the model to be good at science, you must include science data during the mid-training phase. If you do that, the final model sees a massive 17 to 28 point jump in advanced science benchmarks.
But if you wait until the Reinforcement Learning stage to introduce the science data.. it barely makes a difference. The scores change by less than 2 points.
This proves that capabilities are locked in during mid-training. Reinforcement Learning cannot teach the model new foundational skills. It can only amplify the skills that mid-training already put there.
Now you might be thinking..
So what exactly is the model learning during mid-training that makes it so much smarter?
It learns how to think.
Before mid-training, if you ask a base model a hard math problem, it tries to give an answer immediately. In the study, these base models generated around 120 to 150 tokens. They basically guessed the answer, and most of the time, they were wrong.
After mid-training, the models stopped guessing. They started generating over 2,000 tokens per answer. They learned to break the problem down, write out their steps, and think through the logic. Mid-training taught them the art of the “chain of thought.”
Then, when Reinforcement Learning was applied, the models learned how to make those long thoughts more efficient and correct. They became highly calibrated. They knew when they were right and they knew when they were wrong.
What I Think!
If mid-training is the secret engine behind AI reasoning, why does it get so little attention?
Because it is not as cool as building a massive new model from scratch. But for the future of technology, it is actually much more important.
Building a base model from scratch costs tens of millions of dollars. Only a few massive tech giants can afford to do it. But the PRISM study shows that you do not need to build a new model from scratch to get state-of-the-art reasoning.
You can take an existing open-source model, apply a highly curated mid-training pipeline using just 27 billion tokens, and unlock performance that rivals top-tier proprietary models.
This democratizes AI. It means researchers, smaller companies, and open-source developers can take good models and make them brilliant, just by understanding how to structure this middle stage of learning.
Just think about how far AI reasoning has come in the last year alone. We went from models that could barely do middle-school math to models that can solve highly complex coding and physics problems.
This isn’t happening because we are just throwing more raw computing power at the problem. It is happening because the science of training is getting incredibly refined.
We are learning exactly how to teach these systems. First, give them the broad knowledge of the world. Second, give them the deep, structural logic of reasoning. And finally, polish them so they know how to interact with us.
The secret isn’t a bigger brain. The secret is a better curriculum.
Let me know what you think. Do you feel like this targeted approach will allow open-source AI to completely close the gap with the big tech giants?
What are your thoughts on how these models are learning to reason?
The next idea is already on its way: https://ninzaverse.beehiiv.com/


