A great visual essay by @samwho.dev on how LLMs work! I've been interested in caching techniques to speed up LLM calls, especially for user-facing features. You can cache whole requests, but today I learned that LLMs can also cache at the token level! ngrok.com/blog/prompt-...
Prompt caching: 10x cheaper LLM tokens, but how? | ngrok blog
ngrok.com
A far more detailed explanation of prompt caching than anyone asked for.
Jason Bernert
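The difference between caching whole requests and caching at the token level can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular provider implements it. Prompts are assumed to be already tokenized into integer IDs, and the class names `WholeRequestCache` and `PrefixCache` are hypothetical. A whole-request cache only helps when the exact same prompt comes back. A token-level (prefix) cache can reuse work for the longest prefix it has already processed, such as a shared system prompt. Real systems store per-token attention key/value state and typically cache in blocks rather than single tokens. This sketch tracks only which prefixes have been seen.

```python
from typing import Dict, List, Optional, Tuple


class WholeRequestCache:
    """Hypothetical exact-match cache: hits only if the full prompt repeats."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def get(self, tokens: List[int]) -> Optional[str]:
        # Any difference anywhere in the prompt is a miss.
        return self._store.get(tuple(tokens))

    def put(self, tokens: List[int], response: str) -> None:
        self._store[tuple(tokens)] = response


class PrefixCache:
    """Hypothetical token-level cache: a trie over token IDs.

    A real system would attach computed key/value state to each node;
    here we only record that a prefix has been processed before.
    """

    def __init__(self) -> None:
        self._root: Dict[int, dict] = {}

    def insert(self, tokens: List[int]) -> None:
        node = self._root
        for t in tokens:
            node = node.setdefault(t, {})

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        # Walk the trie until the first token we have never seen after this prefix.
        node = self._root
        matched = 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched


if __name__ == "__main__":
    system = [1, 2, 3, 4]  # stand-in for a shared system prompt

    whole = WholeRequestCache()
    whole.put(system + [10, 11], "answer A")
    print(whole.get(system + [20, 21]))  # None: different user message, full miss

    prefix = PrefixCache()
    prefix.insert(system + [10, 11])
    print(prefix.longest_cached_prefix(system + [20, 21]))  # 4: system prompt reused
```

With a new user message, the whole-request cache misses entirely, while the prefix cache can still skip the 4 shared system-prompt tokens and only compute the 2 new ones.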