2/5
We prove that a small KL divergence between two models is not enough to guarantee similar representations. Here is an example of how to construct two models with a small KL divergence whose representations are far from being linear transformations of each other.
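To make the claim concrete, here is a toy numerical sketch. It is not the construction from the thread, only an illustrative one under stated assumptions: both models share three unit-norm unembeddings in 2D, logits are dot products, and the two models place each input at mirrored angles inside the same class cell. The names `SCALE`, `CENTERS`, `OFFSETS`, and the helper functions are all hypothetical.

```python
import math

SCALE = 20.0                    # assumed logit scale; larger values shrink the KL further
CENTERS = [0.0, 120.0, 240.0]   # class directions in degrees, shared by both models
OFFSETS = [-30.0, 0.0, 30.0]    # angular offset of each input within its class cell

def unit(deg):
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

UNEMB = [unit(c) for c in CENTERS]

def predict(rep):
    # Softmax over dot products with the shared unembeddings.
    logits = [rep[0] * u[0] + rep[1] * u[1] for u in UNEMB]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def relative_linear_residual(xs, ys):
    # Least-squares fit ys ~ xs @ W via the 2x2 normal equations,
    # then return (squared residual) / (squared norm of ys).
    sxx = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
    sxy = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(2)]
           for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    w = [[sum(inv[i][k] * sxy[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    resid = total = 0.0
    for x, y in zip(xs, ys):
        pred = [x[0] * w[0][j] + x[1] * w[1][j] for j in range(2)]
        resid += sum((pred[j] - y[j]) ** 2 for j in range(2))
        total += sum(y[j] ** 2 for j in range(2))
    return resid / total

# Model A puts an input at angle c + d; model B mirrors it to c - d.
reps_a, reps_b = [], []
for c in CENTERS:
    for d in OFFSETS:
        ua, ub = unit(c + d), unit(c - d)
        reps_a.append((SCALE * ua[0], SCALE * ua[1]))
        reps_b.append((SCALE * ub[0], SCALE * ub[1]))

max_kl = max(kl(predict(a), predict(b)) for a, b in zip(reps_a, reps_b))
rel_resid = relative_linear_residual(reps_a, reps_b)

print(f"max KL over inputs: {max_kl:.2e}")
print(f"relative residual of best linear map A -> B: {rel_resid:.3f}")
```

In this sketch both models are almost one-hot on the same class, so the KL divergence stays below 1e-5 and shrinks further as `SCALE` grows. The best linear map from one set of representations to the other still leaves 5/9 (about 56%) of the squared norm unexplained, regardless of `SCALE`.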