Neel Nanda (@neelnanda5)'s Twitter Profile
Neel Nanda

@neelnanda5

Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!

ID: 1542528075128348674

Website: http://neelnanda.io · Joined: 30-06-2022 15:18:58

2.2K Tweets

17.17K Followers

91 Following

Neel Nanda (@neelnanda5):

Cool new mech interp start-up! I'm very interested in seeing work to find real-world applications of mechanistic interpretability - one of the best tests that our work isn't all BS. And Tom does great work. I'm curious to see how all this interest in mech interp startups goes!

Neel Nanda (@neelnanda5):

Nice work from my MATS scholars Patrick and Bart showing deeper structure in sparse autoencoder features. Echoing Wes Gurnee's findings on probing for time and space, there's a linear representation of the historical period of a feature!
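
A minimal sketch of the kind of probing experiment this describes, on synthetic data (all shapes and numbers here are hypothetical, not from the paper): fit a linear probe from each SAE feature's decoder direction to the year it represents, and check the fit.

```python
# Hypothetical linear-probe sketch: does a feature's decoder direction
# linearly encode the historical period it represents? Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 500

# Pretend ground truth: years are linearly encoded along a hidden direction.
year_direction = rng.normal(size=d_model)
years = rng.uniform(1800, 2000, size=n_features)
decoder_dirs = (
    years[:, None] * year_direction[None, :] / 1000
    + rng.normal(scale=0.5, size=(n_features, d_model))
)

# Closed-form ridge regression probe: w = (X^T X + lam*I)^-1 X^T y
lam = 1e-2
X, y = decoder_dirs, years - years.mean()
w = np.linalg.solve(X.T @ X + lam * np.eye(d_model), X.T @ y)

pred = X @ w
r2 = 1 - ((y - pred) ** 2).sum() / (y ** 2).sum()
print(f"probe R^2: {r2:.3f}")  # high R^2 suggests a linear representation
```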

Neel Nanda (@neelnanda5):

The deadline to apply to my and Arthur Conmy's MATS streams is Aug 30, in 11 days! If you want to transition into mechanistic interpretability research, or accelerate your work if you're already in the field, I'd be excited to get your application. All backgrounds welcome!

Neel Nanda (@neelnanda5):

I really enjoyed Chris Olah's write-up on defining linear features, and how it relates to e.g. multidimensional features vs one-dimensional SAE features. I felt like it nicely crystallised a bunch of my existing intuitions into words. transformer-circuits.pub/2024/july-upda…
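
A toy illustration of the distinction (my own construction, not from the write-up): a one-dimensional feature can be read off with a single dot product, while a multidimensional feature, like day-of-week arranged on a circle, needs a 2-D subspace to recover.

```python
# Toy contrast: a one-dimensional feature vs a 2-D "circular" feature
# (e.g. day of the week). Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model = 32

# 1-D feature: presence is a scalar along a single direction.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
act = 0.7 * direction                   # feature active with strength 0.7
print("1-D readout:", act @ direction)  # one dot product suffices

# 2-D circular feature: days live on a circle inside a 2-D subspace.
u, v = np.linalg.qr(rng.normal(size=(d_model, 2)))[0].T
def day_activation(day):                # day in 0..6
    theta = 2 * np.pi * day / 7
    return np.cos(theta) * u + np.sin(theta) * v

x = day_activation(3)
# Recovering the day needs two readouts (the full 2-D projection), not one:
theta_hat = np.arctan2(x @ v, x @ u)
print("recovered day:", round(theta_hat / (2 * np.pi) * 7) % 7)
```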

Neel Nanda (@neelnanda5):

The Google DeepMind alignment team just put out a retrospective of what we've been up to since the start of 2023. I'm really proud of everything the team's achieved (and hope we can put out an even better update next year!)

Arthur Conmy (@arthurconmy):

I’m working with Neel once more, mentoring researchers doing mechanistic interpretability work. I’d be grateful if you applied, or passed it on to someone you know who could benefit from it!

Neel Nanda (@neelnanda5):

Great work from my MATS scholars Bart Bussmann and Patrick Leask, with Michael Pearce! SAE features may be interpretable, and it's easy to assume this makes them atomic: true units of analysis that can't be broken down. But it's more complicated! Training SAEs on SAEs helps clarify this.
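
A rough sketch of the "SAEs on SAEs" idea (hypothetical code with illustrative shapes and hyperparameters, not the authors' implementation): treat the base SAE's decoder directions as a dataset and train a second, smaller SAE on them, so that seemingly non-atomic features decompose into shared meta-latents.

```python
# Hypothetical "meta-SAE" sketch: train a second SAE whose inputs are the
# decoder directions of a base SAE, decomposing features into meta-latents.
import torch
import torch.nn as nn

d_model, n_features, n_meta = 64, 512, 128

# Stand-in for a trained base SAE's decoder: one direction per feature.
base_decoder = torch.randn(n_features, d_model)

class MetaSAE(nn.Module):
    def __init__(self, d_in, n_latents):
        super().__init__()
        self.enc = nn.Linear(d_in, n_latents)
        self.dec = nn.Linear(n_latents, d_in, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # sparse meta-latent activations
        return self.dec(z), z

meta = MetaSAE(d_model, n_meta)
opt = torch.optim.Adam(meta.parameters(), lr=1e-3)
l1_coeff = 1e-3                       # L1 penalty encourages sparsity

for step in range(1000):
    recon, z = meta(base_decoder)
    loss = (recon - base_decoder).pow(2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, z = meta(base_decoder)
# A feature whose direction activates several meta-latents looks more like
# a composition of parts than an atomic unit.
print("mean active meta-latents per feature:", (z > 0).float().sum(-1).mean().item())
```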

Neel Nanda (@neelnanda5):

Great interview with @AncaDianaDragan (my manager's manager!) on what's happening with safety and alignment at Google DeepMind! I'm glad that DeepMind is happy to feature discussion of catastrophic and existential risks from AI and what we're doing about them so prominently.

Ajeya Cotra (@ajeya_cotra):

I resonate with this principle from Dean Ball's post last week and AI Snake Oil's post last month. Agree freedom+tech are both usually great. I'd be pissed about limits on tech *I* like based on speculative risks *I* don't buy. But we're in a tough epistemic bind 🧵

nolen (@itseieio):

At the height of One Million Checkboxes's popularity I thought I'd been hacked. A few hours later I was tearing up, extraordinarily proud of some brilliant teens. A thread about my favorite story from running OMCB...

Julian (@mealreplacer):

I find it extraordinarily frustrating when people in “my camp” (those who are concerned about AGI risk) post sensationalist nonsense or otherwise misleading content to drum up support for AI safety. I think the case for thinking that extremely rapid AI progress might not be good…

Neel Nanda (@neelnanda5):

Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets; this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…

Max Roser (@maxcroser):

Many of us can save a child’s life, if we rely on the best data. I think this is one of the most important facts about our world, and the topic of my new Our World in Data article: ourworldindata.org/many-us-can-sa…

Neel Nanda (@neelnanda5):

The recordings from the first ICML Workshop on Mechanistic Interpretability are now live! Go check out our great talks from David Bau, Asma Ghandeharioun & Chris Olah, the panel moderated by Max Tegmark, and our many spotlights and orals! slideslive.com/icml-2024/work…

Neel Nanda (@neelnanda5):

The Frontier Safety and Governance team do great work to reduce the risks from current and future frontier models - please apply!