Amanda Bertsch (@abertsch72) Twitter Tweets • TwiCopy

Amanda Bertsch

@abertsch72

PhD student @LTIatCMU / @SCSatCMU, researching text generation + summarization | she/her | also @ abertsch on bsky or https://t.co/L4HBUh0R9f or by email (https://t.co/bsHqwIMFPL)

ID:2780950260

Link: https://www.cs.cmu.edu/~abertsch/ | Joined: 30-08-2014 19:05:43

268 Tweets

1.3K Followers

726 Following

Mark Dredze

@mdredze

7 months ago

Happy Halloween! Be careful out there. Look what I found inside some trick-or-treat candy.

(Credit to Arthur Spirling for the meme idea)

Aran Komatsuzaki

Detecting Pretraining Data from Large Language Models

We propose Min-K% Prob, a simple and effective method that can detect whether an LLM was pretrained on the provided text without knowing the pretraining data.

proj: swj0419.github.io/detect-pretrai…
abs: arxiv.org/abs/2310.16789

Sang Choe

@sangkeun_choe

7 months ago

High-quality data is key to successful pretraining/finetuning in the GPT era, but manual data curation is expensive💸 We tackle data quality challenges involving large models and datasets with ScAlable Meta leArning (SAMA) #NeurIPS2023 💫

Arxiv: arxiv.org/abs/2310.05674
🧵 (1/n)

Weijia Shi

@WeijiaShi2

7 months ago

Introducing In-Context Pretraining🖇️: train LMs on contexts of related documents. Improving a 7B LM by simply reordering pretraining docs
📈In-context learning +8%
📈Faithfulness +16%
📈Reading comprehension +15%
📈Retrieval augmentation +9%
📈Long-context reasoning +5%
arxiv.org/abs/2301.12652

Haikang Deng

@HaikangDeng

7 months ago

Introducing RAD, a cheap and efficient method for using an auxiliary reward model for controlling text generation that can match the performance of methods that update the LM.

📝arxiv.org/abs/2310.09520
💾github.com/haikangdeng/RAD
🧵⬇️

1/

Antonis Anastasopoulos needs vacation

@anas_ant

7 months ago

So, I wrote about my experience as a Senior Area Chair for EMNLP, hoping to offer some insights on the processes around conference paper decisions.

The Impossible Task of Conference SACs/PCs
or
How I lost 3 Nights of Sleep
gist.github.com/antonisa/2158d…

comments/feedback welcome!

Clara Na

@claranahhh

7 months ago

It was so interesting to see participants' perceptions of 'paradigm shifts' paralleled across eras and cycles of NLP, and at the same time nothing until recent years had reached quite the level of *47%* of ACL papers in 2021 citing BERT

Sireesh Gururaja

@_sireesh

7 months ago

We all know that “recently large language models have”, “large language models are”, and “large language models can.” But why LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP: arxiv.org/abs/2310.07715

Arvind Narayanan

@random_walker

4 years ago

Today I want to take a break from sharing research to share a personal story instead. It’s a story about my name, why I once decided to quit academia, why I came back, what I learnt from it, and why I’m grateful to have an audience here on Twitter.

Language Technologies Institute | @CarnegieMellon

@LTIatCMU

7 months ago

Save the Date! The LTI will hold an info-session for applicants to our PhD and MLT programs on Wednesday, November 8 from 12pm-1pm ET. Faculty and students will answer questions about the application process. More information will be posted soon!

Prasann Singhal

@prasann_singhal

7 months ago

Why does RLHF make outputs longer?

arxiv.org/pdf/2310.03716… w/ Tanya Goyal Jiacheng Xu Greg Durrett

On 3 “helpfulness” settings
- Reward models correlate strongly with length
- RLHF makes outputs longer
- *only* optimizing for length reproduces most RLHF gains

🧵 below:

Ori Yoran

@OriYoran

7 months ago

Retrieval-augmented LMs are not robust to irrelevant context. Retrieving entirely irrelevant context can throw off the model, even when the answer is encoded in its parameters!

In our new work, we make RALMs more robust to irrelevant context.

arxiv.org/abs/2310.01558

🧵[1/7]

Matthew Finlayson

@mattf1n

8 months ago

Nucleus and top-k sampling are ubiquitous, but why do they work?
John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal and I explain the theory and give a new method to address model errors at their source (the softmax bottleneck)!
📄 arxiv.org/abs/2310.01693
🧑‍💻 github.com/mattf1n/basis-…
