We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.
Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM.
Run with self-healing tool calls, code execution and web search via the Unsloth API endpoint and llama.cpp.
Guide: unsloth.ai/docs/basics/...
We collaborated with NVIDIA to teach you how we made LLM training ~25% faster!
Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing
Guide: unsloth.ai/blog/nvidia-...
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️
MTP enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss.
Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.
GGUFs + Guide: unsloth.ai/docs/models/...
Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️
MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.
Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s.
GGUFs: huggingface.co/unsloth/Qwen...
Guide: unsloth.ai/docs/models/...
We're excited to share that Unsloth has joined the PyTorch Ecosystem!
Unsloth is an open-source project that makes training & running models faster and more accurate, with less compute. We want AI to be accessible to everyone.
Blog: unsloth.ai/blog/pytorch
GitHub: github.com/unslothai/un...
We made a guide on using MCP with local LLMs.
Connect Qwen3.6 and Gemma 4 for controlled access to tools, files and APIs, enabling private, automated workflows.
Learn to use OAuth, Exa, Context7, Hugging Face & more.
Guide: unsloth.ai/docs/basics/...
GitHub: github.com/unslothai/un...
Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 with 3x less memory and near-original performance.
Quantization-Aware Training (QAT) makes it possible to run Gemma 4 26B-A4B on 16GB RAM.
GGUFs: huggingface.co/collections/...
QAT Guide: unsloth.ai/docs/models/...
NVIDIA releases Nemotron 3 Ultra, a new 550B model.
Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat.
Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB.
GGUF: huggingface.co/unsloth/NVID...
Guide: unsloth.ai/docs/models/...
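A rough way to sanity-check RAM figures like the ones quoted for a 550B model: the weights alone take roughly parameters × bits per weight / 8 bytes. The sketch below is only a back-of-the-envelope lower bound under that assumption, not how the GGUF sizes were actually derived; the function name `weight_memory_gb` is hypothetical. Real quantized files mix bit widths across layers and also need room for the KV cache and runtime, so practical requirements can be expected to sit above these numbers.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Lower-bound size of the weights alone, in GB (10^9 bytes).

    Assumes every weight is stored at exactly `bits_per_weight` bits,
    which real mixed-precision quantizations do not do.
    """
    # (params_billion * 1e9 weights) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # 550 is the total parameter count (in billions) quoted for the model.
    for bits in (2, 3, 8):
        size = weight_memory_gb(550, bits)
        print(f"{bits}-bit: ~{size:.1f} GB for weights alone")
```

The results (about 137.5 GB, 206.3 GB and 550 GB) fall below the quoted 200GB, 256GB and 600GB, which is what one would expect if the quoted figures leave headroom for context and higher-precision layers.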
Google releases Gemma 4 12B, a new model that can run locally on 8GB RAM.
Gemma 4 12B Unified model supports image, audio and 256K context.
Run and train the model via Unsloth Studio.
GGUF: huggingface.co/unsloth/gemm...
Guide: unsloth.ai/docs/models/...
Google releases DiffusionGemma. ✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.
It supports high-speed text generation, thinking, image, video and 256K context.
Run and train via Unsloth Studio.
GGUF: huggingface.co/unsloth/diff...
Guide: unsloth.ai/docs/models/...