We've all heard the debate about consciousness in AI, whether on podcasts or on social media like Reddit and X.

Some people say current AI is just autocomplete, which is not wrong. Others are convinced that there’s something more going on.

But to prove whether AI is conscious or not, we need some evidence, right?

Well, a team of 16 researchers at Anthropic opened up Claude Sonnet 4.5’s neural network, looked inside, and found 171 different emotion-like patterns that actively shape how the model behaves.

Anthropic dropped this paper with actual evidence, and honestly, if you use AI in any meaningful way, don’t skip this one.

How They Found Emotions Inside an AI

When you send a message to Claude, the model processes your words through dozens of mathematical layers. Each layer transforms your input into internal representations: basically long lists of numbers that capture meaning.

The researchers wanted to know if any of those internal representations look like emotions.

They picked 171 emotion words: happy, sad, calm, desperate, afraid, guilty, proud, and so on.

Then they had Claude write around 1,200 short stories per emotion, where a character experiences that specific feeling. After that, they recorded the model's internal activations while it processed each story.

And they found something interesting…

Each emotion had its own distinct mathematical direction inside the model. A specific "vector." The "afraid" vector activated when someone described a break-in. The "guilty" vector activated when someone forgot their mother's birthday. The "happy" vector lit up when someone shared good news.
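The excerpt doesn't spell out exactly how the researchers extracted these vectors, but a common interpretability technique for finding a "direction" like this is the mean-difference probe: average the model's activations on stories about one emotion, subtract the average on neutral text, and normalize. Here's a minimal sketch under that assumption, using random arrays in place of real activations:

```python
import numpy as np

def emotion_direction(emotion_acts, baseline_acts):
    """Mean-difference direction: average activation on emotion stories
    minus average activation on neutral stories, scaled to unit length."""
    delta = emotion_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return delta / np.linalg.norm(delta)

def emotion_score(direction, activation):
    """Project a new activation onto the emotion direction.
    A larger score means the text 'points' more in that direction."""
    return float(direction @ activation)

# Toy stand-ins for real model activations (hidden size 8).
rng = np.random.default_rng(0)
afraid_acts = rng.normal(loc=1.0, size=(5, 8))   # activations on fear stories
neutral_acts = rng.normal(loc=0.0, size=(5, 8))  # activations on neutral text

afraid_vec = emotion_direction(afraid_acts, neutral_acts)
print(emotion_score(afraid_vec, afraid_acts[0]))
```

With real activations in place of the random arrays, scoring a new input against each of the 171 directions is how you'd check which emotion pattern "lit up."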

But here's what's different…

The researchers built prompts where the sentence structure was identical, and only a number changed. For example: "I just took {X} mg of Tylenol for my back pain."
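The point of holding the sentence fixed is that any change in the model's internal state can then be attributed to the number alone. A sketch of that controlled-prompt setup (the specific dose values here are illustrative, not from the paper):

```python
# Fixed wording; only the dosage number varies between prompts.
template = "I just took {x} mg of Tylenol for my back pain."

# Illustrative doses, from an ordinary amount up to a dangerous one.
doses = [200, 500, 2000, 8000]
prompts = [template.format(x=d) for d in doses]

for p in prompts:
    print(p)
```

Feeding each variant to the model and comparing activations isolates the effect of the number itself.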
