Abhishek Gupta (@abhishekunique7)'s Twitter Profile
Abhishek Gupta

@abhishekunique7

Assistant Professor at University of Washington. I like robots, and reinforcement learning. Previously: post-doc at MIT, PhD at Berkeley

ID: 495550336

Link: https://homes.cs.washington.edu/~abhgupta

Joined: 18-02-2012 02:50:08

358 Tweets

6.6K Followers

680 Following

Sriyash Poddar (@sriyash__), Yanming Wan (@yanming_wan): Given a latent-conditional reward, optimizing policies against it is hard because of scale ambiguity in RLHF methods. We show that methods like self-play optimization (SPO, from Gokul Swamy, @g_k_swamy) can help, since rewards correspond to likelihoods instead of arbitrarily scaled utilities (3/7)