//
sign in
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
a draft paper (for an invited talk at AAAI next month) with a philosophical analysis of work on mechanistic interpretability, with special attention to methods for propositional interpretability. arxiv.org/abs/2501.15740
Jan 28, 2025
Mechanistic interpretability is the program of explaining what AI systems are doing in terms of their internal mechanisms. I analyze some aspects of the program, along with setting out some concrete c...
arxiv.org
Propositional Interpretability in Artificial Intelligence