I've seen a number of claims that it's 10T or 6T parameters and MoE, but without sources. Simon Willison has some good points about why it certainly feels big: simonwillison.net/2026/Jun/9/c...
Long thread on exciting work on how vocal learning (thought to be crucial also for human language) works in the brains of seals and sea lions. Massive effort to scan very many brains & species. In evolution, it may have started with volitional control over breathing! www.science.org/doi/10.1126/...
Interested in how AI models can achieve flexible, robust, human-like reasoning? Me too! I am recruiting for a PhD position in neurosymbolic AI to investigate this question. If you are interested, please take a look here: werkenbij.uva.nl/en/vacancies...
(The OpenMythos repository does not look like *plausible* speculation to me, so that doesn't count)
of known techniques: (better) synthetic data, MoE routing, post-training, harness optimization, etc. Does anyone know more?
There's a broken cuneiform tablet from the Old Babylonian period, nearly 4,000 years ago, which preserves a tiny portion of a dialogue between two friends. It feels a bit like the conversations I've been having for the past week, so I wanted to share it.
My own speculation is that, in addition to even more scaling up, Anthropic has been able to better optimize the core LLM for its use inside a highly optimized harness, due to availability of all the data Anthropic has been able to collect in its surge in popularity in the last months.
Does anyone know of some plausible speculations on *technical innovations* driving the impressive performance of Mythos/Fable? The model card only talks about evaluations (interestingly, mostly in biology). The interpretability work on Mythos Preview suggests it's essentially all based on versions
📢 PhD position in Developmental Language Modelling (PLZ RT) What can human language acquisition teach us about training language models? Join us as a PhD! mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social @mpi-nl.bsky.social
Jelle Zuidema 🟥