A great visual essay by @samwho.dev on how LLMs work! I've been interested in caching techniques to speed up LLM calls, especially for user-facing features. You can cache whole requests, but today I learned that LLM providers can also cache at the token level, reusing the work already done on a shared prompt prefix! ngrok.com/blog/prompt-...
A far more detailed explanation of prompt caching than anyone asked for.