Yes, I agree! And some emergent capabilities of these models are just hard to dismiss as "fancy auto-complete"; see e.g. this amazing paper from Jack Lindsey: arxiv.org/abs/2601.01828
We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguish...