Amanda Bertsch (@abertsch72)'s Twitter Profile
Amanda Bertsch

@abertsch72

PhD student @LTIatCMU / @SCSatCMU, researching text generation + summarization | she/her | also @ abertsch on bsky or https://t.co/L4HBUh0R9f or by email (https://t.co/bsHqwIMFPL)

ID:2780950260

Link: https://www.cs.cmu.edu/~abertsch/ | Joined: 30-08-2014 19:05:43

268 Tweets

1.3K Followers

726 Following

Mark Dredze (@mdredze)'s Twitter Profile Photo

Happy Halloween! Be careful out there. Look what I found inside some trick-or-treat candy.

(Credit to Arthur Spirling for the meme idea)

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Detecting Pretraining Data from Large Language Models

We propose Min-K% Prob, a simple and effective method that can detect whether an LLM was pretrained on the provided text without knowing the pretraining data.

proj: swj0419.github.io/detect-pretrai…
abs: arxiv.org/abs/2310.16789
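
The Min-K% idea is simple enough to sketch: score a text by the average log-probability of its least-likely tokens under the LM. A minimal sketch in plain Python, assuming per-token log-probs have already been computed by running the text through the model (the function name and toy numbers are illustrative, not from the paper):

```python
# A minimal sketch of the Min-K% Prob score, assuming per-token
# log-probabilities have already been obtained from the LM.
def min_k_prob(token_logprobs, k=0.2):
    """Average log-probability of the k% least likely tokens.

    A higher (less negative) score suggests the text was likely seen
    during pretraining; thresholding this score gives the decision.
    """
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# Toy example: "memorized" text has uniformly high token probabilities,
# while unseen text contains a few surprising low-probability tokens.
seen   = [-0.5, -0.4, -0.6, -0.3, -0.5]
unseen = [-0.5, -4.0, -0.6, -5.2, -0.5]
assert min_k_prob(seen, k=0.4) > min_k_prob(unseen, k=0.4)
```

The intuition: unseen text tends to contain a few tokens the model finds very surprising, which drags down the minimum-k average.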

Sang Choe (@sangkeun_choe)'s Twitter Profile Photo

High-quality data is key to successful pretraining/finetuning in the GPT era, but manual data curation is expensive💸 We tackle data quality challenges involving large models and datasets with ScAlable Meta leArning (SAMA) #NeurIPS2023💫

Arxiv: arxiv.org/abs/2310.05674
🧵 (1/n)

Weijia Shi (@WeijiaShi2)'s Twitter Profile Photo

Introducing In-Context Pretraining🖇️: train LMs on contexts of related documents, improving a 7B LM simply by reordering pretraining docs
📈In-context learning +8%
📈Faithfulness +16%
📈Reading comprehension +15%
📈Retrieval augmentation +9%
📈Long-context reasoning +5%
arxiv.org/abs/2301.12652
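
The reordering idea can be sketched as a greedy nearest-neighbor chain over document representations, so that each training context spans related text. This is an illustrative toy only: the `embed`/`sim` helpers (bag-of-words plus Jaccard similarity) are stand-ins, not the paper's actual retrieval pipeline.

```python
# Toy sketch: greedily chain documents so each one is followed by its
# most similar remaining neighbor (hypothetical embed/sim helpers).
def order_by_similarity(docs, embed, sim):
    vecs = {d: embed(d) for d in docs}
    ordered = [docs[0]]
    remaining = set(docs[1:])
    while remaining:
        last = vecs[ordered[-1]]
        nxt = max(remaining, key=lambda d: sim(last, vecs[d]))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

# Stand-in "embedding": bag of words; stand-in similarity: Jaccard.
def embed(doc):
    return set(doc.lower().split())

def sim(a, b):
    return len(a & b) / len(a | b)

docs = ["cats chase mice", "dogs chase cats", "stocks fell today"]
# "dogs chase cats" shares words with "cats chase mice", so it follows it.
assert order_by_similarity(docs, embed, sim)[1] == "dogs chase cats"
```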

Haikang Deng (@HaikangDeng)'s Twitter Profile Photo

Introducing RAD, a cheap and efficient method that uses an auxiliary reward model to steer text generation, matching the performance of methods that update the LM.

📝arxiv.org/abs/2310.09520
💾github.com/haikangdeng/RAD
🧵⬇️

1/
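
The core loop of a reward-augmented decoding approach like this can be sketched as rescoring the top-k next-token candidates with a reward model at each step, no LM update needed. The helper names, toy logits, and toy reward below are illustrative, not the paper's implementation; see the repo for the real method.

```python
# Toy sketch of one reward-augmented decoding step: add a weighted
# reward-model score to the LM logits of the top-k candidates, then
# pick the best-scoring token (hypothetical names and values).
def rad_step(lm_logits, reward_fn, prefix, k=3, beta=1.0):
    # lm_logits: {token: logit} for the next position.
    topk = sorted(lm_logits, key=lm_logits.get, reverse=True)[:k]
    scored = {t: lm_logits[t] + beta * reward_fn(prefix + [t]) for t in topk}
    return max(scored, key=scored.get)

# Stand-in reward model that prefers polite continuations.
reward = lambda seq: 1.0 if "please" in seq else 0.0
logits = {"please": 1.0, "now": 1.2, "immediately": 0.9}

# The LM alone would pick "now", but the reward term flips it.
assert rad_step(logits, reward, [], k=3, beta=1.0) == "please"
```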

Antonis Anastasopoulos needs vacation (@anas_ant)'s Twitter Profile Photo

So, I wrote about my experience as a Senior Area Chair for EMNLP, hoping to offer some insights on the processes around conference paper decisions.

The Impossible Task of Conference SACs/PCs
or
How I lost 3 Nights of Sleep
gist.github.com/antonisa/2158d…

comments/feedback welcome!

Clara Na (@claranahhh)'s Twitter Profile Photo

It was so interesting to see participants' perceptions of 'paradigm shifts' paralleled across eras and cycles of NLP, and at the same time nothing until recent years had reached quite the level of *47%* of ACL papers in 2021 citing BERT

Sireesh Gururaja (@_sireesh)'s Twitter Profile Photo

We all know that “recently large language models have”, “large language models are”, and “large language models can.” But why LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP: arxiv.org/abs/2310.07715

Arvind Narayanan (@random_walker)'s Twitter Profile Photo

Today I want to take a break from sharing research to share a personal story instead. It’s a story about my name, why I once decided to quit academia, why I came back, what I learnt from it, and why I’m grateful to have an audience here on Twitter.

Language Technologies Institute | @CarnegieMellon (@LTIatCMU)'s Twitter Profile Photo

Save the Date! The LTI will hold an info-session for applicants to our PhD and MLT programs on Wednesday, November 8, from 12-1pm ET. Faculty and students will answer questions about the application process. More information will be posted soon!

Prasann Singhal (@prasann_singhal)'s Twitter Profile Photo

Why does RLHF make outputs longer?

arxiv.org/pdf/2310.03716… w/ Tanya Goyal Jiacheng Xu Greg Durrett

On 3 “helpfulness” settings
- Reward models correlate strongly with length
- RLHF makes outputs longer
- *only* optimizing for length reproduces most RLHF gains

🧵 below:

Ori Yoran (@OriYoran)'s Twitter Profile Photo

Retrieval-augmented LMs are not robust to irrelevant context. Retrieving entirely irrelevant context can throw off the model, even when the answer is encoded in its parameters!

In our new work, we make RALMs more robust to irrelevant context.

arxiv.org/abs/2310.01558

🧵[1/7]

Matthew Finlayson (@mattf1n)'s Twitter Profile Photo

Nucleus and top-k sampling are ubiquitous, but why do they work?
John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal and I explain the theory and give a new method to address model errors at their source (the softmax bottleneck)!
📄 arxiv.org/abs/2310.01693
🧑‍💻 github.com/mattf1n/basis-…
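
For reference, both truncation schemes the tweet names are a few lines each: zero out the low-probability tail, renormalize, then sample. A minimal sketch on a toy next-token distribution (the dictionary-based API here is illustrative, not from the paper's code):

```python
# Top-k truncation: keep only the k most probable tokens, renormalize.
def top_k(probs, k):
    keep = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in keep)
    return {t: probs[t] / total for t in keep}

# Nucleus (top-p) truncation: keep the smallest set of most probable
# tokens whose cumulative mass reaches p, then renormalize.
def nucleus(probs, p):
    kept, mass = [], 0.0
    for t in sorted(probs, key=probs.get, reverse=True):
        kept.append(t)
        mass += probs[t]
        if mass >= p:
            break
    return {t: probs[t] / mass for t in kept}

dist = {"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05}
assert set(top_k(dist, 2)) == {"the", "a"}
assert set(nucleus(dist, 0.9)) == {"the", "a", "cat"}  # 0.5+0.3+0.15 >= 0.9
```

Both heuristics cut off the unreliable tail of the softmax distribution, which is where the paper locates the model errors they mask.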
