Neel Nanda
@neelnanda5
Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
ID: 1542528075128348674
http://neelnanda.io 30-06-2022 15:18:58
2.2K Tweets
17.17K Followers
91 Following
I really enjoyed Chris Olah's write-up on defining linear features, and how it relates to, e.g., multidimensional features vs one-dimensional SAE features. I felt like it nicely crystallised a bunch of my existing intuitions into words. transformer-circuits.pub/2024/july-upda…