I have my doubts about benchmarks in general. However, from personal experience I can say that GPT 5.5 is only marginally better than the frontier open-weights models. 4-6 months behind sounds about right. There are people claiming a bigger gap, but that's delusional.
... as opposed to spending all day looking busy in meetings 😀
I'm now rooting for alternatives to all closed products & services that are provided by US companies.
Kimi K2.7 is pretty good BTW, as are Deepseek v4 and Minimax M3. Open solutions beat closed ones any day, and outcomes matter more than the motivation.