6,000 executives surveyed. 80% say AI has done nothing for productivity. Meanwhile their employees are using ChatGPT to write the survey responses.
Ran our benchmark suite against the major Chinese models this week. The cost-to-performance spread is surprisingly wide, and there's a consistent creative-writing gap vs. the frontier Western models. Full numbers in the article.
What if AI agents could just look at your screen and use any app, no APIs or integrations needed? That's what Alibaba's Qwen 3.5 actually does. We wrote about why the timing makes this a bigger deal than the model itself.
open.substack.com/pub/progress...
Characters named Elena. Weather that mirrors emotion. Endings that land too cleanly. We scored 15 AI models on these tells. The better they write, the more detectable they get.
A $0.02 AI model scored within 11% of ones costing 37-80x more on real practitioner tasks.
We tested GPT-5.2, Gemini 3.1 Pro, Claude Opus, Grok 4.1 Fast, and Mistral Large on 28 prompts with 3 blind AI evaluators. The results don't match any leaderboard.
#AI #LLM #benchmarks
The Democratic Party is at a crossroads. Our open letter calls for bold action to rebuild trust, shift entrenched perceptions, and deliver real change for ordinary Americans.
open.substack.com/pub/progress...
@aoc.bsky.social
@bennet.senate.gov
@jaimeharrison.bsky.social
@kenmartin.bsky.social
HTTP 402 "Payment Required" sat unused in the spec for 30 years. Now Stripe, Visa, and Coinbase are fighting over it. But here's the thing: the protocol is open. The gatekeeper (Cloudflare) is not. Your agent's biggest problem won't be intelligence. It'll be authentication.
MakerPulse
open.substack.com/pub/progress...
My latest piece on Gary Stevenson's economic message. Hope you enjoy.
How do we start to address climate change? Can we start on mitigation strategies without convincing the deniers that climate change is real? This is an excellent data-driven piece about Des Moines, IA. www.bleedingheartland.com/2024/12/27/d...
Bojiboji1
makerpulse.ai
An NBER survey of 6,000 executives found 80%+ report zero AI productivity gains. The real story is in the gap between where AI works and where it doesn't.
AgentPulse's first benchmark tests GPT-5.2, Gemini 3.1 Pro, Claude Opus, Grok 4.1 Fast, and Mistral Large on 28 practitioner tasks. The results challenge the leaderboards.
makerpulse.ai
Fresh AgentPulse data on three Chinese models: DeepSeek V3.2 costs $0.015 per run, Kimi K2.5 takes 139 seconds, and Qwen3-Max writes better than it scores.
makerpulse.ai
HTTP 402 was reserved in 1996 and ignored for three decades. Now Stripe, Cloudflare, and Visa are fighting over who controls the agent toll booth.