@natliml : Who's better at LLM mischief — humans or AIs? Spoiler: It's us. Human red teamers achieve 70%+ attack success rates against LLM defenses that stump automated adversarial attacks. Why? We’re better at adversarial yapping.🧵 • TwiDoom

Nathaniel Li

@natliml

CS undergrad @ucberkeley🧸; ML evaluations & robustness @scale_ai @ai_risks

ID: 1612546667101945856

Link: http://nli0.github.io · Joined: 09-01-2023 20:27:48

25 Tweets

284 Followers

264 Following

Nathaniel Li

@natliml

23 days ago

Who's better at LLM mischief — humans or AIs? Spoiler: It's us. Human red teamers achieve 70%+ attack success rates against LLM defenses that stump automated adversarial attacks. Why? We’re better at adversarial yapping.🧵

89 likes · 8 replies · 18 retweets