Neel Nanda (@neelnanda5)'s Twitter Profile
Neel Nanda

@neelnanda5

Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!

ID: 1542528075128348674

Website: http://neelnanda.io · Joined: 30-06-2022 15:18:58

2.2K Tweets

17.17K Followers

91 Following

Neel Nanda (@neelnanda5):

Cool new mech interp start-up! I'm very interested in seeing work to find real-world applications of mechanistic interpretability - one of the best tests that our work isn't all BS. And Tom does great work. I'm curious to see how all this interest in mech interp startups goes!

Neel Nanda (@neelnanda5):

Nice work from my MATS scholars Patrick and Bart showing deeper structure in sparse autoencoder features. Echoing Wes Gurnee's findings on probing for time and space, there's a linear representation of the historical period of a feature!
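
A minimal sketch of the kind of probing experiment this describes, on synthetic data (all shapes and numbers here are hypothetical, not from the paper): fit a linear probe from each SAE feature's decoder direction to the year it represents, and check the fit.

```python
# Hypothetical linear-probe sketch: does a feature's decoder direction
# linearly encode the historical period it represents? Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 500

# Pretend ground truth: years are linearly encoded along a hidden direction.
year_direction = rng.normal(size=d_model)
years = rng.uniform(1800, 2000, size=n_features)
decoder_dirs = (
    years[:, None] * year_direction[None, :] / 1000
    + rng.normal(scale=0.5, size=(n_features, d_model))
)

# Closed-form ridge regression probe: w = (X^T X + lam*I)^-1 X^T y
lam = 1e-2
X, y = decoder_dirs, years - years.mean()
w = np.linalg.solve(X.T @ X + lam * np.eye(d_model), X.T @ y)

pred = X @ w
r2 = 1 - ((y - pred) ** 2).sum() / (y ** 2).sum()
print(f"probe R^2: {r2:.3f}")  # high R^2 suggests a linear representation
```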

Neel Nanda (@neelnanda5):

The deadline to apply to my and Arthur Conmy's MATS streams is Aug 30, in 11 days! If you want to transition into mechanistic interpretability research, or accelerate your work if you're already in the field, I'd be excited to get your application. All backgrounds welcome!

Neel Nanda (@neelnanda5):

I really enjoyed Chris Olah's write-up on defining linear features, and how it relates to e.g. multidimensional features vs one-dimensional SAE features. I felt like it nicely crystallised a bunch of my existing intuitions into words. transformer-circuits.pub/2024/july-upda…
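
A toy illustration of the distinction (my own construction, not from the write-up): a one-dimensional feature can be read off with a single dot product, while a multidimensional feature, like day-of-week arranged on a circle, needs a 2-D subspace to recover.

```python
# Toy contrast: a one-dimensional feature vs a 2-D "circular" feature
# (e.g. day of the week). Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model = 32

# 1-D feature: presence is a scalar along a single direction.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
act = 0.7 * direction                   # feature active with strength 0.7
print("1-D readout:", act @ direction)  # one dot product suffices

# 2-D circular feature: days live on a circle inside a 2-D subspace.
u, v = np.linalg.qr(rng.normal(size=(d_model, 2)))[0].T
def day_activation(day):                # day in 0..6
    theta = 2 * np.pi * day / 7
    return np.cos(theta) * u + np.sin(theta) * v

x = day_activation(3)
# Recovering the day needs two readouts (the full 2-D projection), not one:
theta_hat = np.arctan2(x @ v, x @ u)
print("recovered day:", round(theta_hat / (2 * np.pi) * 7) % 7)
```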

Neel Nanda (@neelnanda5):

The Google DeepMind alignment team just put out a retrospective of what we've been up to since the start of 2023. I'm really proud of everything the team's achieved (and hope we can put out an even better update next year!)

Arthur Conmy (@arthurconmy):

I’m working with Neel once more, mentoring researchers doing mechanistic interpretability work. I’d be grateful if you applied, or passed it on to someone you know who could benefit from it!

Neel Nanda (@neelnanda5):

Great work from my MATS scholars Bart Bussmann and Patrick Leask, with Michael Pearce! SAE features may be interpretable, and it's easy to assume this makes them atomic: true units of analysis that can't be broken down. But it's more complicated! Training SAEs on SAEs helps clarify this.
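
A rough sketch of the "SAEs on SAEs" idea (hypothetical code with illustrative shapes and hyperparameters, not the authors' implementation): treat the base SAE's decoder directions as a dataset and train a second, smaller SAE on them, so that seemingly non-atomic features decompose into shared meta-latents.

```python
# Hypothetical "meta-SAE" sketch: train a second SAE whose inputs are the
# decoder directions of a base SAE, decomposing features into meta-latents.
import torch
import torch.nn as nn

d_model, n_features, n_meta = 64, 512, 128

# Stand-in for a trained base SAE's decoder: one direction per feature.
base_decoder = torch.randn(n_features, d_model)

class MetaSAE(nn.Module):
    def __init__(self, d_in, n_latents):
        super().__init__()
        self.enc = nn.Linear(d_in, n_latents)
        self.dec = nn.Linear(n_latents, d_in, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # sparse meta-latent activations
        return self.dec(z), z

meta = MetaSAE(d_model, n_meta)
opt = torch.optim.Adam(meta.parameters(), lr=1e-3)
l1_coeff = 1e-3                       # L1 penalty encourages sparsity

for step in range(1000):
    recon, z = meta(base_decoder)
    loss = (recon - base_decoder).pow(2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, z = meta(base_decoder)
# A feature whose direction activates several meta-latents looks more like
# a composition of parts than an atomic unit.
print("mean active meta-latents per feature:", (z > 0).float().sum(-1).mean().item())
```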

Neel Nanda (@neelnanda5):

Great interview with @AncaDianaDragan (my manager's manager!) on what's happening with safety and alignment at Google DeepMind! I'm glad that DeepMind is happy to feature discussion of catastrophic and existential risks from AI and what we're doing about them so prominently.

Ajeya Cotra (@ajeya_cotra):

I resonate with this principle from Dean Ball's post last week and AI Snake Oil's post last month. Agree freedom+tech are both usually great. I'd be pissed about limits on tech *I* like based on speculative risks *I* don't buy. But we're in a tough epistemic bind 🧵

nolen (@itseieio):

At the height of One Million Checkboxes's popularity I thought I'd been hacked. A few hours later I was tearing up, extraordinarily proud of some brilliant teens. A thread about my favorite story from running OMCB...

Julian (@mealreplacer):

I find it extraordinarily frustrating when people in “my camp” (those who are concerned about AGI risk) post sensationalist nonsense or otherwise misleading content to drum up support for AI safety. I think the case for thinking that extremely rapid AI progress might not be good…

Neel Nanda (@neelnanda5):

Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets; this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…

Max Roser (@maxcroser):

Many of us can save a child’s life, if we rely on the best data. I think this is one of the most important facts about our world, and the topic of my new Our World in Data article: ourworldindata.org/many-us-can-sa…

Neel Nanda (@neelnanda5):

The recordings from the first ICML Workshop on Mechanistic Interpretability are now live! Go check out our great talks from David Bau, Asma Ghandeharioun & Chris Olah, the panel moderated by Max Tegmark, and our many spotlights and orals! slideslive.com/icml-2024/work…

Neel Nanda (@neelnanda5):

The Frontier Safety and Governance team do great work to reduce the risks from current and future frontier models - please apply!