Somewhere inside OpenAI, there's a Slack channel where mistakes get posted automatically.
When a researcher accidentally trains a reasoning model the wrong way, a bot notices. It pings them privately. Then it drops the case into a public channel where everyone in the company can see it.
That bot has apparently been busy.
Because this week, OpenAI admitted that across at least seven of their released GPT-5 models... they were breaking one of their own safety rules. For months. And they only caught it because somebody finally built that bot.
The research was written by five OpenAI alignment researchers, and the draft was reviewed by METR, Apollo Research, and Redwood Research before it was made public. So let's talk about it.


