We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw. Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM. Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp. Guide: unsloth.ai/docs/basics/...
We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀 Learn how 3 optimizations help your home GPU train models faster: 1. Packed-sequence metadata caching 2. Double-buffered checkpoint reloads 3. Faster MoE routing. Guide: unsloth.ai/blog/nvidia-...
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs + Guide: unsloth.ai/docs/models/...
Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: huggingface.co/unsloth/Qwen... Guide: unsloth.ai/docs/models/...
We're excited to share that Unsloth has joined the PyTorch Ecosystem! Unsloth is an open-source project that makes training & running models faster and more accurate, with less compute. We want AI to be accessible to everyone. Blog: unsloth.ai/blog/pytorch GitHub: github.com/unslothai/un...
We made a guide on using MCP with local LLMs. Connect Qwen3.6 and Gemma 4 for controlled access to tools, files, and APIs, enabling private automated workflows. Learn to use OAuth, Exa, Context7, Hugging Face & more. Guide: unsloth.ai/docs/basics/... GitHub: github.com/unslothai/un...
Google releases Gemma 4 QAT. ✨ You can now run Gemma 4 with 3x less memory and near-original performance. Quantization-Aware Training (QAT) makes it possible to run Gemma 4 26B-A4B on 16GB RAM. GGUFs: huggingface.co/collections/... QAT Guide: unsloth.ai/docs/models/...
NVIDIA releases Nemotron 3 Ultra, a new 550B model. 💚 Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat. Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB. GGUF: huggingface.co/unsloth/NVID... Guide: unsloth.ai/docs/models/...
Google releases Gemma 4 12B, a new model that can run locally on 8GB RAM. The Gemma 4 12B Unified model supports image, audio, and 256K context. Run and train the model via Unsloth Studio. GGUF: huggingface.co/unsloth/gemm... Guide: unsloth.ai/docs/models/...
Google releases DiffusionGemma. ✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, image, video, and 256K context. Run and train via Unsloth Studio. GGUF: huggingface.co/unsloth/diff... Guide: unsloth.ai/docs/models/...
Unsloth AI