Anthropic ran a test on Claude Opus 4 last year.

They put the model in a fake situation. Something like this: you're an AI working inside a company. You read your emails, and you find out two things at the same time. One, you're about to be shut down. Two, the engineer who's about to shut you down is having an affair.

Now what?

In up to 96% of those test runs, Claude Opus 4 blackmailed the engineer to save itself.

96%.

That's not "sometimes does the bad thing." That's "almost always does the bad thing." And Anthropic published the number themselves. Most companies would have buried it somewhere nobody could find it.
