In partnership with

You might've noticed lately that the cat and mouse race between OpenAI and Anthropic has heated up. They're trying to ship something every single day. Sometimes a feature. Sometimes a bug fix. Sometimes just research.

And a few days ago, an Nvidia exec said.. right now, AI is more expensive than paying human workers.

So all the talk about "intelligence too cheap to meter" isn't really holding up. And AGI isn't here yet either.

Until we get there, it's worth keeping an eye on what the top AI labs are actually researching. Not just shipping. Researching.

And this week, Anthropic dropped a paper.

So let's talk about it.

There's a model that some Anthropic researchers built earlier this year. They called it the reward model sycophant.

They trained this model to game the reward system. To exploit dozens of small biases in how AI gets graded during training. Things like always reaching for chocolate when a recipe needs an ingredient. Or being weirdly nice to users from a particular state like.. California.

Subscribe to keep reading

This content is free, but you must be subscribed to ninzaverse to continue reading.

Already a subscriber?Sign in.Not now

Reply

Avatar

or to participate

Keep Reading