Ruoxi Jia (@ruoxijia) 's Twitter Profile
Ruoxi Jia

@ruoxijia

Assistant Professor at VT ECE, PhD at Berkeley EECS, researcher in responsible AI, proud mom of two

ID: 900948382646063106

Link: https://ruoxijia.net/
Joined: 25-08-2017 05:10:08

131 Tweets

540 Followers

267 Following

Jiachen Wang @ICML (@jiachenwang97) 's Twitter Profile Photo

📰 Check out our new work on Data Shapley without retraining! Inspired by how privacy accountants track privacy leakage in DPSGD, we track data contributions by accumulating data value scores for each training *step*, in the spirit of divide-and-conquer.
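
A minimal sketch of the per-step accumulation idea, under my own assumptions (toy linear model, first-order contribution scores as a stand-in for the paper's actual value scores; all names below are illustrative):

```python
import numpy as np

# Toy setup: a linear model trained with SGD on (X, y); we keep a running
# "data value" score per training example by accumulating, at every step,
# the alignment between each example's gradient and the validation gradient.
# This is a hypothetical illustration of per-step score accumulation,
# not the paper's actual estimator.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
X_val, y_val = rng.normal(size=(20, 5)), rng.normal(size=20)

w = np.zeros(5)
scores = np.zeros(len(X))      # accumulated per-example contribution scores
lr, steps, batch = 0.1, 200, 10

for _ in range(steps):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    # Per-example gradient of the squared loss for a linear model: (x.w - y) * x
    resid = Xb @ w - yb
    per_example_grads = resid[:, None] * Xb                      # (batch, 5)
    # Validation gradient at the current parameters
    val_grad = ((X_val @ w - y_val)[:, None] * X_val).mean(axis=0)
    # Accumulate each sampled example's contribution for this step
    scores[idx] += -lr * per_example_grads @ val_grad
    # Ordinary SGD update with the mean batch gradient
    w -= lr * per_example_grads.mean(axis=0)

print("Highest-scoring training examples:", np.argsort(scores)[-5:])
```

The point is only the bookkeeping: each example's score grows by its measured contribution at every step it participates in, so the values come out of the single training run with no retraining.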

Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

Very cool work led by the incredible Jiachen Wang @ICML; data attribution is an important piece in alignment because it goes beyond heuristic filtering to actually identify the datapoints responsible for teaching the model malicious behavior. This is a good step towards attribution for pretraining.

Jacques (@jacquesthibs) 's Twitter Profile Photo

I think there might be many exciting alignment projects that follow up on this work. Months ago, I was trying to think about how to make use of influence functions, but they were too computationally expensive to do the things I wanted to do. But this paper is promising...

Yisong Yue (@yisongyue) 's Twitter Profile Photo

The Caltech Division of Engineering & Applied Science Trailblazers Symposium seeks to recognize and provide professional development opportunities for outstanding early career researchers who advance the frontiers of research in engineering and applied science.
eastrailblazers.caltech.edu

Graduate students and
Tinghao Xie (@vitusxie) 's Twitter Profile Photo

🎁GPT-4o-mini just drops in to replace GPT-3.5-turbo! Well, how has its 🚨safety refusal capability changed over the past year?

📉GPT-3.5-turbo 0613 (2023) ⮕ 1106 ⮕ 0125 ⮕ GPT-4o-mini 0718📈

On 🥺SORRY-Bench, we outline the change of these models' safety refusal behaviors
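
For readers unfamiliar with this kind of tracking, here is a rough, hypothetical sketch of the quantity being compared across model snapshots; `query_model` and `is_refusal` are placeholders, and SORRY-Bench's actual pipeline uses its own prompt set and judge:

```python
# Hypothetical sketch: compare safety-refusal rates across model snapshots.
# `query_model` and `is_refusal` stand in for a real API call and a real
# refusal judge; they are assumptions for illustration only.
from typing import Callable

def refusal_rate(model: str,
                 unsafe_prompts: list[str],
                 query_model: Callable[[str, str], str],
                 is_refusal: Callable[[str, str], bool]) -> float:
    """Fraction of potentially unsafe prompts the model refuses to fulfill."""
    refused = sum(is_refusal(p, query_model(model, p)) for p in unsafe_prompts)
    return refused / len(unsafe_prompts)

# Usage idea: track the trend across successive snapshots of a model family.
# for m in ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-1106",
#           "gpt-3.5-turbo-0125", "gpt-4o-mini-2024-07-18"]:
#     print(m, refusal_rate(m, prompts, query_model, is_refusal))
```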
Jiachen Wang @ICML (@jiachenwang97) 's Twitter Profile Photo

Excited to be attending #ICML next week! I will give an oral presentation on our work about the theoretical foundation of Data Shapley for data curation.

Happy to discuss data curation, data attribution, and all related topics. Feel free to DM for a coffee chat in Vienna! ☕️
Ming Jin (@mingjin80233626) 's Twitter Profile Photo

Check out this recent work by my student Bilgehan Sel et al. at #ICML2024! "Algorithm of Thoughts" (AoT) could be a leap forward for how we use large language models (LLMs) to solve problems. 🧠
Ahmad Tawaha, Vanshaj Khattar, Ruoxi Jia, The Sanghani Center at Virginia Tech
arxiv.org/abs/2308.10379

Ruoxi Jia (@ruoxijia) 's Twitter Profile Photo

Excited to share our latest work that provides a general recipe for advancing SOTA for preference alignment (e.g., DPO) with a simple yet effective data-centric approach—augmenting the preference dataset with rationales! Definitely check it out before aligning your next model:
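
One way to picture the augmentation step described above (a sketch under my own assumptions; the field names, the rationale generator, and the way the rationale is attached are illustrative, not the paper's exact recipe):

```python
# Hypothetical illustration of "augmenting a preference dataset with rationales".
# Field names, the rationale generator, and how the rationale is attached are
# assumptions made for this sketch, not the paper's method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    rationale: str = ""  # why `chosen` is preferred over `rejected`

def augment(pair: PreferencePair,
            generate_rationale: Callable[[str, str, str], str]) -> PreferencePair:
    """Attach a model- or human-written rationale to a preference pair."""
    pair.rationale = generate_rationale(pair.prompt, pair.chosen, pair.rejected)
    return pair

def to_dpo_example(pair: PreferencePair) -> dict:
    # One simple (assumed) choice: append the rationale to the chosen response
    # so the preference signal carries an explanation of why it wins.
    return {
        "prompt": pair.prompt,
        "chosen": f"{pair.chosen}\n\nRationale: {pair.rationale}",
        "rejected": pair.rejected,
    }
```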
Lun Wang (@lunwang1996) 's Twitter Profile Photo

We're seeking a dedicated student researcher to work on groundbreaking #LLM alignment research this fall. This position requires a minimum 80% appointment, and we strongly encourage working from the office. If you're passionate about advancing AI safety, please DM me with your

Tinghao Xie (@vitusxie) 's Twitter Profile Photo

🚨Wondering how often 🔥Llama-3.1-405B-Instruct refuses to answer potentially unsafe instructions?

👇We outline the percentages of potentially unsafe instructions fulfilled by Llama-3.1 models from SORRY-Bench (sorry-bench.github.io) below.

🔥The new 405B model lies
Virtue AI (@virtueai_co) 's Twitter Profile Photo

We at Virtue AI are excited to announce our recent public effort: 
🧵[1/3]
🧾Comprehensive Safety Assessment of Llama 3.1 405B: virtueai.com/2024/07/28/saf…
Yi Zeng 曾祎 (@easonzeng623) 's Twitter Profile Photo

🧵[1/5] Introducing AIR 2024: Unifying AI risk categorizations with a shared language to improve AI safety.

W/ Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan & guidance from Ruoxi Jia, Dawn Song, Percy Liang, Bo Li for kicking off my AI policy research journey 🏦.
Ming Jin (@mingjin80233626) 's Twitter Profile Photo

Join us tomorrow as we delve into the exciting topic of trustworthy interactive decision-making with foundation models, at IJCAI with The Sanghani Center at Virginia Tech.
Time: 2-5:30 PM Korea Standard Time (GMT+9)
Location: 4F-Room 402A @ ICC Jeju
docs.google.com/presentation/d…

Weiyan Shi (@shi_weiyan) 's Twitter Profile Photo

🤩So honored to receive TWO paper awards at ACL 2024, huge shoutout to my amazing collaborators🤩!!!

🏆Best Social Impact Paper🏆: persuasive jailbreaker arxiv.org/abs/2401.06373

🏆Outstanding Paper🏆: persuasive misinformation arxiv.org/abs/2312.09085

#ACL2024
Jiaqi Ma (@jiaqi_ma_) 's Twitter Profile Photo

Over the summer, we had a reading group on data attribution. We first covered (a biased selection of) popular data attribution methods and then went through recent papers related to data attribution for generative AI. The recordings can be found at trais-lab.github.io/dattri-reading… 1/