The 60-Year Hunt for AI's Most Important Function
I was trying to understand how SwiGLU works, but I couldn’t find an explanation that clicked for me. So I made this video to explain it from first principles. Check it out: youtu.be/JRaPNrpsQ9s
27d
**Modern Transformer - Complete Guide** Interested in learning the recent advances in transformers? After 14 videos, I've finally completed this series! 🥳🥳🥳 Check out the course here: www.youtube.com/playlist?lis...
The Most Underrated Layer Inside Every AI Model
Virtually every AI model has normalization layers. But what makes them so essential? 🤔 New video on the role of normalization in stabilizing training, and on alternatives like DyT and Derf. youtu.be/JHl_gwVoh-k
How is DeepSeek V4 so INSANELY cheap? 🤔 Compared to a GQA baseline, its new *compressed attention* mechanism (CSA and HCA) slashes the KV cache memory cost by 98% 🤯 at a 1M-token context! Here’s how: youtu.be/q8holiIirgo
**Modern Transformer architecture explained** I compiled a list of videos on the Transformer architecture into a short "YouTube course". www.youtube.com/playlist?lis... Hopefully this will be helpful for beginners in the community. Happy learning! 😎
Finally got some time to read the DeepSeek Engram paper! Idea: Replace repeated reconstruction with direct lookup of common knowledge. It’s so intuitive that it feels strange this wasn’t part of the design from the start. Video summary here: youtu.be/87Q8nf1XHKA
How do we make attention actually capture context? Exclusive Self Attention (XSA) is an interesting variant that improves attention with minimal cost in speed & memory. Check out the video here: youtu.be/2eZKT4H9_iQ
New video! How do LLMs grow outrageously large yet stay blazingly fast? The secret: Mixture of Experts (MoE). In this video, we cover the role of FFNs, how to scale them without slowing down, and how to maintain load balance and training stability. Full video here: youtu.be/0QQlYR1r6pQ
28d
1mo
1mo
2mo
2mo
2mo
4mo
Jia-Bin Huang
How LLMs Get Outrageously Large Yet Blazingly Fast [MoE]
YouTube video by Jia-Bin Huang