We are advertising a postdoc position to work on #generative #models, #structure #induction, and MI #estimation with Michael Gutmann as part of @genaihub.bsky.social ! elxw.fa.em3.oraclecloud.com/hcmUI/Candid... Get in touch! (#ML #AI) 👉 homepages.inf.ed.ac.uk/snaraya3/ 👉 michaelgutmann.github.io
Banyan stays competitive, often even outperforming the baselines, despite being a much, much smaller model 7/🧵
3mo
We will be advertising for a postdoc position soon, to work on #generative #models #structure #induction and #uncertainty with Michael Gutmann as part of @genaihub.bsky.social ! Keep an eye out, and get in touch! ( #ML #AI #ICML2025 ) 👉 homepages.inf.ed.ac.uk/snaraya3/ 👉 michaelgutmann.github.io
11mo
2) We change our parameterization to a diagonal mechanism inspired by state space models (SSMs), which lets us reduce parameters by 10x while massively increasing performance 💪 For our initial benchmarks, we pre-train Banyan on 10M tokens of English and test on STS, retrieval, and classification... 4/🧵
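To make the "diagonal mechanism" concrete, here is a minimal sketch of why element-wise weights need far fewer parameters than a dense composition matrix. Everything in it is an assumption for illustration: the function names, the shapes, and the choice of a plain weighted sum are hypothetical and are not taken from the Banyan paper. The 10x figure in the post is the authors' own number, and this sketch does not attempt to reproduce it.

```python
# Hypothetical illustration only: names and shapes are assumptions,
# not the parameterization used in the Banyan paper.

def compose_dense(left, right, weight):
    """Parent = W @ [left; right], where W has d rows and 2d columns."""
    children = left + right  # list concatenation, length 2d
    return [sum(w * x for w, x in zip(row, children)) for row in weight]

def compose_diagonal(left, right, scale_left, scale_right):
    """Parent = scale_left * left + scale_right * right, element-wise."""
    return [a * l + b * r
            for a, l, b, r in zip(scale_left, left, scale_right, right)]

def param_counts(d):
    """Number of learned weights for each composition at embedding size d."""
    return {"dense": d * 2 * d, "diagonal": 2 * d}
```

The dense version grows quadratically with the embedding size, while the diagonal version grows linearly, which is the general reason a diagonal parameterization can be much smaller.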
Are you compositionally curious? 🤓 Want to know how to learn embeddings using 🌲? In our new #ICML2025 paper, we present Banyan: a recursive net that you can train super efficiently for any language or domain, and get embeddings competitive with much, much larger LLMs 1/🧵
We can make this setup much more powerful with two changes: 1) Entangling: whenever any instance of the encoder merges the same span, we reconstruct it from every possible context it can occur in, learning the global connective structure of our pre-training corpus 3/🧵
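One way to picture "entangling" is as bookkeeping: identical spans from different sequences share one entry, and that entry remembers every parent context it appeared in. The sketch below is a rough illustration under assumptions, not the authors' implementation: it takes binary trees written as nested tuples of strings, and the helper names `leaves` and `collect_contexts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical illustration only: trees are nested 2-tuples of token strings.

def leaves(tree):
    """Return the token span covered by a (sub)tree as a tuple."""
    if isinstance(tree, str):
        return (tree,)
    left, right = tree
    return leaves(left) + leaves(right)

def collect_contexts(trees):
    """Map each span to the set of parent spans it was merged into."""
    contexts = defaultdict(set)

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        parent = leaves(node)
        contexts[leaves(left)].add(parent)
        contexts[leaves(right)].add(parent)
        visit(left)
        visit(right)

    for tree in trees:
        visit(tree)
    return contexts

trees = [(("the", "cat"), "sat"), (("the", "cat"), "ran")]
contexts = collect_contexts(trees)
```

Here the span `("the", "cat")` occurs in two sequences but gets a single entry with two parent contexts, which is the kind of shared structure the post describes reconstructing from.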
Banyan turns out to be a pretty efficient learner! Its embeddings outperform our prior recursive net, as well as a RoBERTa medium (an encoder with a few million parameters) and several word embedding baselines trained on 10x more data 5/🧵