Neel Nanda
@NeelNanda5
Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
30-06-2022 15:18:58
1.8K Tweets
13.4K Followers
89 Following
Great work from my MATS scholars Callum McDougall and Joseph Bloom, in honour of today's special occasion!
Turns out SAEs contain wild features, like a Neel Nanda feature, and this perseverance feature:
lesswrong.com/posts/BK8AMsNH…