We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw. Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM. Run them with self-healing tool calls, code execution and web search via the Unsloth API endpoint and llama.cpp. Guide: unsloth.ai/docs/basics/...

We collaborated with NVIDIA to show you how we made LLM training ~25% faster! 🚀 Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing
Guide: unsloth.ai/blog/nvidia-...

Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP (multi-token prediction) enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP, and 31B reaches 101 t/s. GGUFs + Guide: unsloth.ai/docs/models/...

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no change in accuracy. Qwen3.6-27B MTP runs at 160 t/s, and 35B-A3B reaches 240 t/s. GGUFs: huggingface.co/unsloth/Qwen... Guide: unsloth.ai/docs/models/...

We’re excited to share that Unsloth has joined the PyTorch Ecosystem! Unsloth is an open-source project that makes training and running models faster and more accurate, with less compute. We want AI to be accessible to everyone. Blog: unsloth.ai/blog/pytorch GitHub: github.com/unslothai/un...

We made a guide on using MCP (Model Context Protocol) with local LLMs. Connect Qwen3.6 and Gemma 4 to tools, files and APIs with controlled access, enabling private, automated workflows. Learn to use OAuth, Exa, Context7, Hugging Face & more. Guide: unsloth.ai/docs/basics/... GitHub: github.com/unslothai/un...

Google releases Gemma 4 QAT. ✨ You can now run Gemma 4 with 3x less memory and near-original performance. Quantization-Aware Training (QAT) makes it possible to run Gemma 4 26B-A4B on 16GB RAM. GGUFs: huggingface.co/collections/... QAT Guide: unsloth.ai/docs/models/...

NVIDIA releases Nemotron 3 Ultra, a new 550B model. 💚 Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context and frontier coding and chat capabilities. Run the 2-bit version on 200GB RAM, 3-bit on 256GB, or 8-bit on 600GB. GGUF: huggingface.co/unsloth/NVID... Guide: unsloth.ai/docs/models/...

Google releases Gemma 4 12B, a new model that can run locally on 8GB RAM. The Gemma 4 12B Unified model supports image and audio, with 256K context. Run and train the model via Unsloth Studio. GGUF: huggingface.co/unsloth/gemm... Guide: unsloth.ai/docs/models/...

Google releases DiffusionGemma. ✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, images and video, with 256K context. Run and train it via Unsloth Studio. GGUF: huggingface.co/unsloth/diff... Guide: unsloth.ai/docs/models/...

Unsloth AI