Neel Nanda
@neelnanda5
Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
ID: 1542528075128348674
http://neelnanda.io 30-06-2022 15:18:58
2.2K Tweets
17.17K Followers
91 Following
Nice work from my MATS scholars Patrick and Bart showing deeper structure in sparse autoencoder features. Mimicking Wes Gurnee's findings re probing for time and space, there's a linear representation of the historical period of a feature!
The deadline to apply to my and Arthur Conmy's MATS streams is Aug 30, in 11 days! If you want to transition into mechanistic interpretability research, or accelerate your work if you're already in the field, I'd be excited to get your application. All backgrounds welcome!
I really enjoyed Chris Olah's write-up on defining linear features, and how it relates to e.g. multidimensional features vs one-dimensional SAE features. I felt like it nicely crystallised a bunch of my existing intuitions into words. transformer-circuits.pub/2024/july-upda…
Great work from my MATS scholars Bart Bussmann and Patrick Leask, with Michael Pearce! SAE features may be interpretable, and it's easy to assume this makes them atomic: true units of analysis that can't be broken down. But it's more complicated! Training SAEs on SAEs helps clarify
The deadline for my and Arthur Conmy's Winter MATS stream is Friday, just under a week left!
Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets, this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…
Many of us can save a child’s life, if we rely on the best data. I think this is one of the most important facts about our world, and the topic of my new Our World in Data article: ourworldindata.org/many-us-can-sa…
The recordings from the first ICML Workshop on Mechanistic Interpretability are now live! Go check out our great talks from David Bau, Asma Ghandeharioun & Chris Olah, the panel moderated by Max Tegmark, and our many spotlights and orals! slideslive.com/icml-2024/work…