Nathaniel Li (@natliml)'s Twitter Profile
Nathaniel Li

@natliml

CS undergrad @ucberkeley🧸; ML evaluations & robustness @scale_ai @ai_risks

ID: 1612546667101945856

Website: http://nli0.github.io
Joined: 09-01-2023 20:27:48

25 Tweets

284 Followers

264 Following

Who's better at LLM mischief — humans or AIs? Spoiler: It's us.

Human red teamers achieve 70%+ attack success rates against LLM defenses that stump automated adversarial attacks. Why? We’re better at adversarial yapping.🧵