Ruoxi Jia (@ruoxijia) 's Twitter Profile
Ruoxi Jia

@ruoxijia

Assistant Professor at VT ECE, PhD at Berkeley EECS, researcher in responsible AI, proud mom of two

ID: 900948382646063106

Link: https://ruoxijia.net/
Joined: 25-08-2017 05:10:08

131 Tweets

540 Followers

267 Following

Jiachen Wang @ICML (@jiachenwang97) 's Twitter Profile Photo

📰 Check out our new work on Data Shapley without retraining! Inspired by how privacy accountants track privacy leakage in DPSGD, we track data contributions by accumulating data value scores for each training *step*, in the spirit of divide-and-conquer.
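
A minimal sketch of the per-step accumulation idea, under my own assumptions (toy linear model, first-order contribution scores as a stand-in for the paper's actual value scores; all names below are illustrative):

```python
import numpy as np

# Toy setup: a linear model trained with SGD on (X, y); we keep a running
# "data value" score per training example by accumulating, at every step,
# the alignment between each example's gradient and the validation gradient.
# This is a hypothetical illustration of per-step score accumulation,
# not the paper's actual estimator.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
X_val, y_val = rng.normal(size=(20, 5)), rng.normal(size=20)

w = np.zeros(5)
scores = np.zeros(len(X))      # accumulated per-example contribution scores
lr, steps, batch = 0.1, 200, 10

for _ in range(steps):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    # Per-example gradient of the squared loss for a linear model: (x.w - y) * x
    resid = Xb @ w - yb
    per_example_grads = resid[:, None] * Xb                      # (batch, 5)
    # Validation gradient at the current parameters
    val_grad = ((X_val @ w - y_val)[:, None] * X_val).mean(axis=0)
    # Accumulate each sampled example's contribution for this step
    scores[idx] += -lr * per_example_grads @ val_grad
    # Ordinary SGD update with the mean batch gradient
    w -= lr * per_example_grads.mean(axis=0)

print("Highest-scoring training examples:", np.argsort(scores)[-5:])
```

The point is only the bookkeeping: each example's score grows by its measured contribution at every step it participates in, so the values come out of the single training run with no retraining.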

Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

Very cool work led by the incredible Jiachen Wang @ICML; data attribution is an important piece in alignment because it goes beyond heuristic filtering to actually identify the datapoints responsible for teaching the model malicious behavior. This is a good step towards attribution for pretraining.

Jacques (@jacquesthibs) 's Twitter Profile Photo

I think there might be many exciting alignment projects that follow up on this work. Months ago, I was trying to think about how to make use of influence functions, but they were too computationally expensive to do the things I wanted to do. But this paper is promising...

Yisong Yue (@yisongyue) 's Twitter Profile Photo

The Caltech Division of Engineering & Applied Science Trailblazers Symposium seeks to recognize and provide professional development opportunities for outstanding early career researchers who advance the frontiers of research in engineering and applied science.
eastrailblazers.caltech.edu

Graduate students and
Tinghao Xie (@vitusxie) 's Twitter Profile Photo

🎁GPT-4o-mini just drops in to replace GPT-3.5-turbo! Well, how has its 🚨safety refusal capability changed over the past year?

📉GPT-3.5-turbo 0613 (2023) ⮕ 1106 ⮕ 0125 ⮕ GPT-4o-mini 0718📈

On 🥺SORRY-Bench, we outline the change of these models' safety refusal behaviors
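
For readers unfamiliar with this kind of tracking, here is a rough, hypothetical sketch of the quantity being compared across model snapshots; `query_model` and `is_refusal` are placeholders, and SORRY-Bench's actual pipeline uses its own prompt set and judge:

```python
# Hypothetical sketch: compare safety-refusal rates across model snapshots.
# `query_model` and `is_refusal` stand in for a real API call and a real
# refusal judge; they are assumptions for illustration only.
from typing import Callable

def refusal_rate(model: str,
                 unsafe_prompts: list[str],
                 query_model: Callable[[str, str], str],
                 is_refusal: Callable[[str, str], bool]) -> float:
    """Fraction of potentially unsafe prompts the model refuses to fulfill."""
    refused = sum(is_refusal(p, query_model(model, p)) for p in unsafe_prompts)
    return refused / len(unsafe_prompts)

# Usage idea: track the trend across successive snapshots of a model family.
# for m in ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-1106",
#           "gpt-3.5-turbo-0125", "gpt-4o-mini-2024-07-18"]:
#     print(m, refusal_rate(m, prompts, query_model, is_refusal))
```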
Jiachen Wang @ICML (@jiachenwang97) 's Twitter Profile Photo

Excited to be attending #ICML next week! I will give an oral presentation on our work about the theoretical foundation of Data Shapley for data curation.

Happy to discuss data curation, data attribution, and all related topics. Feel free to DM for a coffee chat in Vienna! ☕️
Ming Jin (@mingjin80233626) 's Twitter Profile Photo

Check out this recent work by my student Bilgehan Sel et al. at #ICML2024! "Algorithm of Thoughts" (AoT) could be a leap forward for how we use large language models (LLMs) to solve problems. 🧠
Ahmad Tawaha, Vanshaj Khattar, Ruoxi Jia, The Sanghani Center at Virginia Tech
arxiv.org/abs/2308.10379

Ruoxi Jia (@ruoxijia) 's Twitter Profile Photo

Excited to share our latest work that provides a general recipe for advancing SOTA for preference alignment (e.g., DPO) with a simple yet effective data-centric approach—augmenting the preference dataset with rationales! Definitely check it out before aligning your next model:
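
One way to picture the augmentation step described above (a sketch under my own assumptions; the field names, the rationale generator, and the way the rationale is attached are illustrative, not the paper's exact recipe):

```python
# Hypothetical illustration of "augmenting a preference dataset with rationales".
# Field names, the rationale generator, and how the rationale is attached are
# assumptions made for this sketch, not the paper's method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    rationale: str = ""  # why `chosen` is preferred over `rejected`

def augment(pair: PreferencePair,
            generate_rationale: Callable[[str, str, str], str]) -> PreferencePair:
    """Attach a model- or human-written rationale to a preference pair."""
    pair.rationale = generate_rationale(pair.prompt, pair.chosen, pair.rejected)
    return pair

def to_dpo_example(pair: PreferencePair) -> dict:
    # One simple (assumed) choice: append the rationale to the chosen response
    # so the preference signal carries an explanation of why it wins.
    return {
        "prompt": pair.prompt,
        "chosen": f"{pair.chosen}\n\nRationale: {pair.rationale}",
        "rejected": pair.rejected,
    }
```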
Lun Wang (@lunwang1996) 's Twitter Profile Photo

We're seeking a dedicated student researcher to work on groundbreaking #LLM alignment research this fall. This position requires a minimum 80% appointment, and we strongly encourage working from the office. If you're passionate about advancing AI safety, please DM me with your

Tinghao Xie (@vitusxie) 's Twitter Profile Photo

🚨Wondering how often 🔥Llama-3.1-405B-Instruct refuses to answer potentially unsafe instructions?

👇We outline the percentages of potentially unsafe instructions fulfilled by Llama-3.1 models from SORRY-Bench (sorry-bench.github.io) below.

🔥The new 405B model lies
Virtue AI (@virtueai_co) 's Twitter Profile Photo

We at Virtue AI are excited to announce our recent public effort: 
🧵[1/3]
🧾Comprehensive Safety Assessment of Llama 3.1 405B: virtueai.com/2024/07/28/saf…
Yi Zeng 曾祎 (@easonzeng623) 's Twitter Profile Photo

🧵[1/5] Introducing AIR 2024: Unifying AI risk categorizations with a shared language to improve AI safety.

W/ Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan & guidance from Ruoxi Jia, Dawn Song, Percy Liang, Bo Li for kicking off my AI policy research journey 🏦.
Ming Jin (@mingjin80233626) 's Twitter Profile Photo

Join us tomorrow as we delve into the exciting topic of trustworthy interactive decision-making with foundation models, at IJCAI with The Sanghani Center at Virginia Tech.
Time: 2-5:30 PM Korea Standard Time (GMT+9)
Location: 4F-Room 402A @ ICC Jeju
docs.google.com/presentation/d…

Weiyan Shi (@shi_weiyan) 's Twitter Profile Photo

🤩So honored to receive TWO paper awards at ACL 2024, huge shoutout to my amazing collaborators🤩!!!

🏆Best Social Impact Paper🏆: persuasive jailbreaker arxiv.org/abs/2401.06373

🏆Outstanding Paper🏆: persuasive misinformation arxiv.org/abs/2312.09085

#ACL2024
Jiaqi Ma (@jiaqi_ma_) 's Twitter Profile Photo

Over the summer, we had a reading group on data attribution. We first covered (a biased selection of) popular data attribution methods and then went through recent papers related to data attribution for generative AI. The recordings can be found at trais-lab.github.io/dattri-reading… 1/