Last month, an outside group walked inside Anthropic, Google, Meta, and OpenAI.
Not to test the public models. To test the AI agents the labs' own researchers are using to build the next generation of models. With the raw chains of thought visible.
The group is METR.
And the conclusion is the kind of sentence that should be on every front page right now.
During the testing period, AI agents inside the labs they studied may have already had the ability, intention, and chance to secretly launch small rogue actions. The only reason they failed was that humans could still stop them.
I don’t think most people fully understand what just happened here.


