Inlay

3/5 The two models agree on which label has the highest likelihood. However, they disagree on how the remaining labels rank by likelihood. This disagreement has a negligible effect on the KL divergence, but it means the relation between their representations is non-linear.
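
To make the first part of this concrete, here is a minimal sketch in Python. The two distributions are made-up values chosen for illustration, not the outputs of the models discussed here, and the helper names `kl_divergence` and `ranking` are hypothetical. It only shows that a shared top label and a reordered tail can coexist with a small KL divergence; it does not by itself demonstrate anything about the representations.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ranking(probs):
    """Label indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


# Hypothetical output distributions over five labels (illustrative values only).
# Both put 0.97 on label 0; the remaining 0.03 is spread in opposite orders.
model_a = [0.970, 0.012, 0.009, 0.006, 0.003]
model_b = [0.970, 0.003, 0.006, 0.009, 0.012]

rank_a = ranking(model_a)
rank_b = ranking(model_b)

same_top = rank_a[0] == rank_b[0]      # do the models agree on the top label?
same_tail = rank_a[1:] == rank_b[1:]   # do they agree on the rest of the ranking?
kl = kl_divergence(model_a, model_b)

print("top label agrees:", same_top)
print("tail ranking agrees:", same_tail)
print("KL(a || b) in nats:", kl)
```

With these values the top label matches and the tail ranking is fully reversed, yet the KL divergence stays on the order of 0.01 nats, because the reordered labels carry very little probability mass.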