LLMs approaching a cosine similarity of 1.0 as a metaphor for Human Instrumentality is...actually clever.
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️
MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss.
Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.
GGUFs + Guide: unsloth.ai/docs/models/...
"can you use whatever resources you like, and python, to generate a short video and render it using ffmpeg? it should express what it's like to be a LLM. feel free to make it as personal or impersonal as you'd like."
Video
Sources: OpenAI is considering drastically lowering its price for tokens in anticipation of similar cuts the company expects at Anthropic (Wall Street Journal)
Main Link | Techmeme Permalink