ProfilePosts
Worth reading this in full. I come in skeptical, but this basically is a claim that an AI system at Alibaba attempted autonomous replication without human intervention. This excerpt was found and highlighted by Alexander Long. Full paper here: arxiv.org/abs/2512.24873
We surveyed 349 technical researchers, engineers, and managers (in February–April 2026) about how they use AI tools at work. On average, participants self-report that AI use made their work 1.6–2.1x more valuable, and that this multiplier will grow over time.
metr.org/careers
Our team is stretched thin at the moment! To continue upper-bounding the autonomy of AI agents, and developing evaluations for monitoring AI systems and their propensity to subvert human control, we need more great engineering and research staff. Please apply below or DM me!
We’re correcting a mistake in our modeling that inflated recent 50%-time horizons by 10-20% (and reduced 80%-horizons). We inappropriately penalized steepness in task-length→success curve fits. This most affects the oldest and newest models, whose fits are less data-constrained.
Groundhog Day is a very METR-y holiday. Small animal emerges from a cave for only a moment, shares a forecast about timelines that's somewhat difficult to interpret, and then retreats into his cave for another year.
Cool profile of @metr.org’s work in the NYT today! Particularly like this from my colleague Ajeya: “METR is an organization that asks... what we think would be most valuable for the world to know about A.I. and its risks, and then the answers are what they are.” www.nytimes.com/2026/04/17/t....
More on this idea here: metr.org/blog/2024-11...
Since early 2025, we've been studying how AI tools impact productivity among developers. Previously, we found a 20% slowdown. That finding is now outdated. Speedups now seem likely, but changes in developer behavior make our new results unreliable. We’re working to address this.
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.