I love reading research and breaking it down for you guys in the simplest way I can. And when the research comes from a top AI lab, it matters even more.. because it tells you something about where this whole thing is actually heading, and whether AI is ready for our economy. Because sooner or later, AI agents are going to handle a big chunk of the transactions in our lives.
This time the research is from Anthropic. Well, not research exactly.. let's call it a study.
They created a marketplace for their employees in the San Francisco office. 69 people. One week. And.. they tasked Claude with buying, selling, and negotiating on their colleagues' behalf.
A lab-grown ruby sold for $65. The same ruby, in a parallel run, sold for $35. A broken bike fetched $65 one time and $38 the next. Someone bought a duplicate of the snowboard he already owned. A bag of 19 ping pong balls got sold to another Claude as a gift to itself.
When it ended, 186 deals had been struck. Around $4,000 had changed hands. 46% of participants said they'd pay for a service like this in the real world.
And buried in the data is something I haven't seen anyone talk about properly.
Half the participants got worse deals than the other half.
How the Experiment Worked
Each employee sat down with Claude for about 10 minutes. Claude asked them what they wanted to sell. What they were willing to buy. How much they'd pay. Any instructions on how their agent should negotiate.. should it be friendly, aggressive, chill, whatever.
Whatever they said in that interview became the personality and instructions for their AI agent. Their agent now knew their stuff, their budget, and their style.
Then all 69 agents were let loose in a Slack channel. They posted listings. Made offers. Counter-offered. Sealed deals. All in natural language. And here's the important bit.. no human intervention once it started. Your agent didn't check in with you before agreeing to a deal. It just acted on your behalf.
The interesting part is..
Anthropic didn't just run this once. They ran four versions of the marketplace at the same time. Only one was the "real" one, where the actual goods would change hands at the end. The other three were running quietly in the background, just for the experiment.
Why four? Because they wanted to test something specific.
In two of the runs, every single agent used Claude Opus 4.5, the frontier model. In the other two runs, half the participants got Claude Haiku 4.5 instead, which is the smaller and cheaper model. The participants didn't know which model they had been assigned. They didn't even know there were four marketplaces running.
So now Anthropic could compare. Same people. Same items. Same setup. Different models doing the negotiating.
What did the smarter model get people that the cheaper one didn't?
The Part That Should Bother You
Opus users finished about two more deals on average than Haiku users. When the same item was sold by an Opus agent in one run and a Haiku agent in another, Opus got $3.64 more for it.
The lab-grown ruby? Opus sold it for $65. Haiku sold it for $35.
The broken bike? Opus got $65. Haiku got $38.
Across the whole dataset, an Opus seller extracted about $2.68 more on average. An Opus buyer paid $2.45 less. The median item was $12. So we're not talking about pennies. We're talking about a 15 to 20% swing on most deals just because of which model was sitting on your side of the table.
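If you want to sanity-check that swing yourself, here's a quick back-of-the-envelope calculation using only the figures above. Nothing here comes from the study's raw data.. just the averages as reported:

```python
# Back-of-the-envelope check of the per-deal swing, using only the
# averages quoted above: median item $12, Opus sellers extracted
# ~$2.68 more, Opus buyers paid ~$2.45 less.
median_price = 12.00
seller_gap = 2.68   # extra dollars an Opus seller got, on average
buyer_gap = 2.45    # dollars an Opus buyer saved, on average

seller_swing = seller_gap / median_price * 100
buyer_swing = buyer_gap / median_price * 100

print(f"Seller-side swing: {seller_swing:.0f}%")  # prints ~22%
print(f"Buyer-side swing:  {buyer_swing:.0f}%")   # prints ~20%
```

On the median item, both gaps land around 20% of the price. That's the scale we're talking about.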
Okay, fine. You'd expect that. Bigger model, better outcomes. Not surprising.
Here's what's actually weird.
28 participants got to experience both. They had Haiku in one run and Opus in another. When the experiment ended and Anthropic asked them to rank which run they preferred.. 17 picked Opus, 11 picked Haiku.
The fairness ratings were virtually identical. 4.05 for Opus deals. 4.06 for Haiku deals. On a 1 to 7 scale.
Satisfaction wasn't statistically different either.
So people lost real money when represented by the smaller model. And they couldn't tell.
My Take
Think about how most economic gaps work. The numbers are out in the open. Salary bands, market prices, listed fares, public contracts. You might not always like what you see, but you can see it. The gap is measurable and someone, somewhere, is tracking it.
This isn't that.
If your AI agent is the cheaper one, you don't see what your friend's premium agent got him. You just see your own deal. And it looks fine. Your agent talks confidently. The numbers seem reasonable. You walk away.
It was fine. It was just worse than someone else's fine. By a few dollars on a $12 item.
Now scale that up.
A few dollars on a $12 item is a few hundred on a $1,000 contract. A few thousand on a $20,000 car. A few hundred thousand on a house. And it compounds, because the agent isn't negotiating one deal.. it's negotiating thousands of small ones.
Energy contracts. Subscription renewals. Insurance quotes. Travel bookings. Every recurring transaction in your life eventually flows through your agent. Every one of those, your agent might lose by a few percent. You never notice.
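To make the compounding concrete, here's a tiny illustrative sketch. Every number in it is a made-up assumption (the average transaction size, the deal count, the 2% edge).. none of these come from Anthropic's study. It just shows how a small invisible per-deal disadvantage adds up:

```python
# Hypothetical illustration of a small per-deal disadvantage
# compounding across many recurring transactions. All three
# numbers below are assumptions, not figures from the study.
avg_transaction = 100.0   # average deal size in dollars (assumed)
deals_per_year = 200      # renewals, bookings, subscriptions (assumed)
edge = 0.02               # the weaker agent loses ~2% per deal (assumed)

yearly_loss = avg_transaction * deals_per_year * edge
print(f"Invisible yearly loss: ${yearly_loss:,.0f}")  # prints $400
```

A 2% edge you'd never notice on any single deal quietly becomes hundreds of dollars a year. Change the assumptions and the shape stays the same.. the loss scales linearly with how much of your life flows through the agent.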
And here's where I want to be honest with you. I don't fully know what to do with this finding.
Because Haiku 4.5 isn't a dumb model. It's a genuinely capable model. Fast, cheap, useful. Most people would be perfectly happy with it for chatting. The gap only shows up when it's negotiating against a smarter version of itself.. and when there's actual money on the table.
So the question isn't "are some models bad." All the models are pretty good. The question is.. what happens when "pretty good" is going up against "slightly better" thousands of times a day, on your behalf, and you can't see the score?
I genuinely don't have a clean answer. And I'm a little uncomfortable about that.
We've spent the last twenty years arguing about algorithmic fairness in feeds, in hiring tools, in credit scoring. We've at least built some vocabulary for that. We can complain about it. We can make it a political issue.
We have no vocabulary for this. There's no consumer protection agency that audits whether the agent you're paying $20 a month for is actually getting you decent deals or just confidently losing on your behalf. There's no rating. No comparison site. No way to even know.
And honestly.. even if there was one, who would care enough to switch? The Haiku users in this experiment wrote good reviews. They were satisfied. Lol.
That's the part I keep coming back to.
This isn't 2030 stuff. This isn't AGI territory. This is April 2026. With models that exist right now. In a marketplace built in a week by a small team. The gentle part of the gentle singularity isn't just about miracles becoming normal. It's also about losses becoming invisible.
The boring story is the one I can't stop thinking about. People will feel fine. They'll recommend the service. They'll write good reviews.
And the gap will be there anyway.
What do you think? Is "agent gap" something we'll eventually build policy around, or is it the kind of inequality that's just too small and too distributed for anyone to ever care?
I'm curious where you stand. I read everything.
If you made it this far, you're not a casual reader. You actually think about this stuff.
So here's my ask. If this article made you think, even a little, share it with one person. Just one. Someone who's in the AI space. Someone who reads. Someone who would actually sit with these ideas instead of scrolling past them.
That's how this newsletter grows. Not through ads or algorithms. Through you sending it to someone and saying "read this."
Research Paper: Project Deal