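💡 The idea behind DMS is to *train* existing LLMs to decide which tokens to evict from the KV cache, while delaying the actual eviction until some time after that decision is made. Because a token marked for eviction stays available for a few more steps, the model can still carry its information forward, while the smaller cache reduces memory usage and latency.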
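To make the "decide now, evict later" mechanism concrete, here is a minimal toy sketch of a cache with delayed eviction. It assumes the per-token eviction decision is simply passed in as a boolean (in DMS that decision comes from the trained model) and that the delay is a fixed number of decoding steps. The class name `DelayedEvictionCache` and the `delay` parameter are hypothetical illustrations, not the DMS implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedEvictionCache:
    """Toy KV cache with delayed eviction (hypothetical illustration only)."""

    delay: int  # how many subsequent tokens can still see a token marked for eviction
    entries: dict = field(default_factory=dict)    # position -> (key, value)
    pending: deque = field(default_factory=deque)  # (due_step, position), ordered by due_step
    step: int = 0                                  # position of the next token to be added

    def append(self, key, value, evict: bool) -> None:
        """Add the KV pair for the current token.

        `evict` stands in for the model's eviction decision for this token.
        """
        pos = self.step
        # First drop tokens whose delay window has expired before this token arrives.
        self._flush(pos)
        self.entries[pos] = (key, value)
        if evict:
            # The decision is made now, but removal is scheduled for later:
            # the next `delay` tokens can still attend to this entry.
            self.pending.append((pos + self.delay + 1, pos))
        self.step += 1

    def _flush(self, current: int) -> None:
        # `pending` is sorted by due step, since due = pos + delay + 1 grows with pos.
        while self.pending and self.pending[0][0] <= current:
            _, pos = self.pending.popleft()
            del self.entries[pos]

    def visible_positions(self) -> list:
        """Positions currently held in the cache (i.e. attendable)."""
        return sorted(self.entries)


if __name__ == "__main__":
    cache = DelayedEvictionCache(delay=2)
    decisions = [True, True, False, True, False, False]
    for i, evict in enumerate(decisions):
        cache.append(f"k{i}", f"v{i}", evict)
    print(cache.visible_positions())  # [2, 3, 4, 5]
```

In this run, tokens 0 and 1 are marked immediately but remain attendable for the next two tokens before being dropped, so after six tokens only four entries are held. Token 3 is also marked, but its delay window has not yet expired. A fixed delay keeps the bookkeeping to a FIFO queue; whether DMS uses exactly this scheduling is an assumption of this sketch.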