Great to see clarification comments. o3 is impressive nonetheless.
Played around with o1 and the 'thinking' Gemini model. The CoT output (for Gemini) can be confusing and convoluted, but it got 3/5 problems right. It stalled on the remaining 2.
These models are an impressive interpretability test bed.
Julius Adebayo
Is the final output actually "causally" dependent on the long CoT generated? How key are these traces to the search/planning clearly happening here? So many questions but so few answers.
Pinging into the void.
New paper. We show that the representations of LLMs, up to 3B params(!), can be engineered to encode biophysical factors that are meaningful to experts.
We don't have to hope Adam magically finds models that learn useful features; we can optimize for models that encode interpretable features!