@ygandelsman: Mechanistic interpretability is not only a good way to understand what is going on in a model, but it is also a tool for discovering "model bugs" and exploiting them! Our new paper shows that understanding CLIP neurons enables automatic generation of semantic adversarial images:

Yossi Gandelsman

@ygandelsman

PhD student at Berkeley AI

ID: 1275430079754092550

http://yossi.gandelsman.com

23-06-2020 14:06:56

114 Tweets

737 Followers

428 Following

3 months ago

Mechanistic interpretability is not only a good way to understand what is going on in a model, but it is also a tool for discovering "model bugs" and exploiting them! Our new paper shows that understanding CLIP neurons enables automatic generation of semantic adversarial images:

256 likes

7 replies

33 retweets