Sungdong Kim

@sungdongkim4

Research Scientist @ NAVER Cloud; MS&PhD student @ KAIST #NLP #LLM #Alignment

ID: 1390479880698007555

07-05-2021 01:33:40

92 Tweets

419 Followers

183 Following

7 months ago

🤔 Do we always need a human preference for effective LLM alignment after an SFT stage? Our answer is NO 🙅‍♂️ We present a ✨preference-free alignment approach✨, leveraging an off-the-shelf retriever with effective regularizer functions: Regularized Relevance Reward (R^3). [1/n]

155 Likes

1 Reply

48 Retweets
