Gokul Swamy (@g_k_swamy)'s Twitter Profile
Gokul Swamy

@g_k_swamy

phd candidate @CMU_Robotics. ms @berkeley_ai. summers @GoogleAI, @msftresearch, @aurora_inno, @nvidia, @spacex. no model is an island.

ID: 1077849302326697985

Link: https://gokul.dev/ · Joined: 26-12-2018 08:51:13

465 Tweets

2.2K Followers

1.1K Following

cs-sop.org (@cs_sop_org)'s Twitter Profile Photo

As PhD application deadlines approach, we are super excited to announce cs-sop.org, created by Zhaofeng Wu, Alexis Ross, and @ZejiangS 💻. cs-sop.org is a platform with statements of purpose generously shared by previous applicants to CS PhD programs 🧵 (1/n)

Aurora (@aurora_inno)'s Twitter Profile Photo

Our co-founder and Chief Scientist Drew Bagnell shares the next part of his AI blog series focused on AI alignment and our approach to ensuring the safety of the Aurora Driver. Read the blog here: bit.ly/3WBFGIT

RL Beyond Rewards Workshop (@rlbrew_2024)'s Twitter Profile Photo

It is officially less than a week before the workshop begins ⌛️ The workshop schedule is posted here: rlbrew-workshop.github.io/schedule.html and a complete list of accepted papers can be found here: rlbrew-workshop.github.io/papers.html

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

I'll be at #RLC2024, helping organize the RL Beyond Rewards Workshop, cheering proudly for our 2 orals at the RL Safety Workshop (rlsafetyworkshop.github.io), and perhaps posting a meme or two on RL_Conference! As usual, DM if you'd like to talk imitation, RLHF, or what's next :).

RL Beyond Rewards Workshop (@rlbrew_2024)'s Twitter Profile Photo

We had a thriving morning poster session, followed by another one at 3:30. Swing on by to the Marriott Center on the 11th Floor then to participate!

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

If you enjoyed / missed Jingwu Tang's talk on multi-agent IL (arxiv.org/abs/2406.04219) or Nico's talk on efficient IRL without compounding errors (rlbrew-workshop.github.io/papers/15_effi…) at #RLC2024, stop by the RL Safety / RL Beyond Rewards Workshop poster sessions this afternoon to hear more!

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

Cool variant of SPO that learns a latent-conditioned policy for *controllable* generation. Leverages an under-appreciated benefit of preference models: they always produce outputs in [0, 1], making outputs more comparable and tradeoffs more sensible than across arbitrarily scaled reward models.

Abhishek Gupta (@abhishekunique7)'s Twitter Profile Photo

Sriyash Poddar Yanming Wan Given a latent-conditioned reward, optimizing policies with it is hard due to scale ambiguity in RLHF methods. We show that methods like self-play optimization (SPO, from Gokul Swamy) can help, since rewards correspond to likelihoods instead of arbitrarily scaled utilities (3/7)

<a href="/sriyash__/">Sriyash Poddar</a> <a href="/yanming_wan/">Yanming Wan</a> Given latent conditional reward, optimizing policies with this is hard, due to scale ambiguity in RLHF methods. We show that methods like self-play optimization (SPO from <a href="/g_k_swamy/">Gokul Swamy</a>) can help, since rewards correspond to likelihoods instead of arbitrarily scaled utilities (3/7)
Sanjiban Choudhury (@sanjibac)'s Twitter Profile Photo

It was a very humbling and optimistic experience to spend a week coding with these high school students. I barely knew how to code at their age, and some of these students were coding up complex search algorithms on real robots in a matter of hours. Thank you CATALYST students!!

Murtaza Dalal (@mihdalal)'s Twitter Profile Photo

Can a single neural network policy generalize over poses, objects, obstacles, backgrounds, scene arrangements, in-hand objects, and start/goal states? Introducing Neural MP: A generalist policy for solving motion planning tasks in the real world 🤖 1/N

Kensuke Nakamura (@kensukenk)'s Twitter Profile Photo

Not all prediction errors are made equal! In our new #corl2024 paper, we use the mathematical notion of regret to automatically identify when prediction failures actually led to downstream performance degradation. Website: cmu-intentlab.github.io/not-all-errors/  (1/n)

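A minimal sketch of the regret test described in the post above, with hypothetical `plan()` and `cost()` stand-ins for the planner and closed-loop cost (not the paper's actual code): a prediction error is flagged as consequential only if the plan chosen under the prediction does measurably worse, under the true future, than the plan that would have been chosen in hindsight.

```python
def prediction_regret(predicted_future, true_future, plan, cost) -> float:
    """Regret of planning against a prediction instead of the true outcome."""
    plan_under_prediction = plan(predicted_future)  # what the robot actually executed
    plan_in_hindsight = plan(true_future)           # best response had the future been known
    return cost(plan_under_prediction, true_future) - cost(plan_in_hindsight, true_future)

def is_consequential(predicted_future, true_future, plan, cost, tol=1e-3) -> bool:
    """Flag only prediction failures that actually degrade downstream performance."""
    return prediction_regret(predicted_future, true_future, plan, cost) > tol
```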