Inlay

This paper looks cool (arxiv.org/abs/2605.23901): "We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel ... The Shannon Scaling Law consistently outperforms classical scaling laws ..."

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced ...