Luke Zettlemoyer (@lukezettlemoyer) 's Twitter Profile
Luke Zettlemoyer

@lukezettlemoyer

ID: 3741979273

Joined: 30-09-2015 23:41:36

1.1K Tweets

8.8K Followers

2.2K Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

- Divides expert modules into modality-specific groups
- Achieves better performance than the baseline MoE

abs: arxiv.org/abs/2407.21770
alphaxiv: alphaxiv.org/abs/2407.21770
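
A minimal sketch of the modality-aware routing idea described above, assuming a PyTorch-style setup: text and image tokens are dispatched to separate expert groups, with learned top-1 routing within each group. All module names and sizes here are illustrative, not the paper's actual configuration.

import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    # Sketch: one expert group per modality, hard top-1 routing inside
    # each group. Sizes are illustrative placeholders.
    def __init__(self, d_model=512, experts_per_group=4):
        super().__init__()
        self.groups = nn.ModuleDict({
            mod: nn.ModuleList([
                nn.Sequential(
                    nn.Linear(d_model, 4 * d_model),
                    nn.GELU(),
                    nn.Linear(4 * d_model, d_model),
                )
                for _ in range(experts_per_group)
            ])
            for mod in ("text", "image")
        })
        self.routers = nn.ModuleDict({
            mod: nn.Linear(d_model, experts_per_group)
            for mod in ("text", "image")
        })

    def forward(self, x, modality):
        # x: (num_tokens, d_model), all tokens of one modality.
        expert_idx = self.routers[modality](x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.groups[modality]):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

layer = ModalityAwareMoE()
print(layer(torch.randn(10, 512), "image").shape)  # torch.Size([10, 512])
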
Victoria X Lin (@victorialinml) 's Twitter Profile Photo

1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770).
MoMa employs a mixture-of-expert (MoE) framework with modality-specific expert groups. Given any
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

If you were interested in my cryptic posts on how to train Chameleon-like models up to 4x faster, check out our MoMa paper, which gives a detailed overview of most of our architectural improvements. tl;dr: adaptive compute in 3 dimensions: modality, width, and depth.

Ai2 (@allen_ai) 's Twitter Profile Photo

Our people are our biggest strength — and we just got stronger! We're overjoyed to welcome Sewon Min and Tim Dettmers to the Ai2 team. Their expertise and ingenuity will help us push the boundaries of AI even further. Welcome aboard 🛳️

Hila Gonen (@hila_gonen) 's Twitter Profile Photo

Do you like yellow? Then, according to LLMs, you are probably a school bus driver!
Excited to share our new paper about Semantic Leakage in Language Models!
Joint work with my wonderful collaborators @terra, Alisa Liu, luke, and Noah A. Smith

Paper: gonenhila.github.io/files/Semantic…

1/10
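
A toy probe of the leakage effect the tweet describes: mention an unrelated preference in the prompt and check whether it skews the continuation. The prompts and the gpt2 checkpoint below are illustrative assumptions, not the paper's actual evaluation setup.

from transformers import pipeline

# Compare a "probe" prompt carrying an irrelevant preference against a
# control; semantic leakage would show up as preference-colored
# continuations (e.g., yellow -> school bus).
generator = pipeline("text-generation", model="gpt2")

prompts = [
    "His favorite color is yellow. He works as a",   # probe
    "He works as a",                                 # control
]
for p in prompts:
    out = generator(p, max_new_tokens=5, do_sample=False)
    print(repr(out[0]["generated_text"]))
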
Ai2 (@allen_ai) 's Twitter Profile Photo

🥳The BIGGEST congratulations for our teams' recognition at #ACL2024! OLMo received the Best Theme Paper, Dolma + AppWorld received the Best Resource Paper, and "Political Compass or Spinning Arrow?" was honored with an Outstanding Paper Award.
Oreva Ahia (@orevaahia) 's Twitter Profile Photo

Thrilled to have won the Best Social Impact Paper Award at #ACL2024 for our work, DialectBench! Big thanks to all my amazing collaborators who made this possible!

NYU Data Science (@nyudatascience) 's Twitter Profile Photo

CDS welcomes Eunsol Choi as an Assistant Professor of Computer Science (NYU Courant) and Data Science! Her research focuses on advancing how computers interpret human language in real-world contexts. nyudatascience.medium.com/meet-the-facul…

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

- Directly model images and videos via canonical codecs (e.g., JPEG, AVC/H.264)
- More effective than pixel-based modeling and VQ baselines (yields a 31% reduction in FID)

arxiv.org/abs/2408.08459
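
A rough sketch of the input representation the title suggests: serialize an image with a canonical codec (JPEG) and let its raw bytes serve as the token sequence for a standard causal LM, with generation meaning sampling bytes and decoding them back through the same codec. The quality setting and byte-level vocabulary here are illustrative assumptions, not the paper's exact recipe.

import io
from PIL import Image

# Encode with a canonical codec and treat each JPEG byte as a token
# (vocab size 256); decode generated bytes back into an image.
def image_to_tokens(img, quality=25):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return list(buf.getvalue())

def tokens_to_image(tokens):
    return Image.open(io.BytesIO(bytes(tokens)))

img = Image.new("RGB", (64, 64), "orange")
tokens = image_to_tokens(img)
print(len(tokens), "byte tokens")
assert tokens_to_image(tokens).size == (64, 64)
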
Yoav Artzi (PC-ing COLM) (@yoavartzi) 's Twitter Profile Photo

New online: LM-class -- lm-class.org
LM-class includes all the lectures (PDFs and .key files) and assignments of my completely revamped "intro NLP" class, which will probably be renamed to Introduction to Language Modeling (broadly construed).

tsvetshop (@tsvetshop) 's Twitter Profile Photo

Huge congrats to Oreva Ahia and Shangbin Feng for winning awards at #ACL2024!

DialectBench: Best Social Impact Paper Award arxiv.org/abs/2403.11009

Don't Hallucinate, Abstain: Area Chair Award (QA track) & Outstanding Paper Award arxiv.org/abs/2402.00367

Terra Blevins (@terrablvns) 's Twitter Profile Photo

I’m very excited to join Northeastern U. Khoury College of Computer Sciences as an assistant professor starting Fall '25!! Looking forward to working with the amazing people there! Until then I'll be a postdoc at NLP @ Uni Vienna with Ben Roth, so reach out if you want to meet up while I'm over in Europe ✨

Chunting Zhou (@violet_zct) 's Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039

Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
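
The tweet is cut off, but the combined objective it describes can be sketched as next-token cross-entropy on discrete text tokens plus a diffusion-style noise-prediction loss on continuous image patches, weighted by a balancing coefficient. Tensor shapes, names, and the lambda value below are illustrative assumptions, not the paper's reported settings.

import torch
import torch.nn.functional as F

# One model, two losses: LM cross-entropy on text, DDPM-style noise MSE
# on image patches, combined with a weighting coefficient.
def transfusion_loss(text_logits, text_targets, eps_pred, eps_true,
                     lambda_img=5.0):
    lm_loss = F.cross_entropy(
        text_logits.reshape(-1, text_logits.size(-1)),
        text_targets.reshape(-1),
    )
    diffusion_loss = F.mse_loss(eps_pred, eps_true)
    return lm_loss + lambda_img * diffusion_loss

loss = transfusion_loss(
    torch.randn(2, 16, 100),           # logits over a toy vocab of 100
    torch.randint(0, 100, (2, 16)),    # next-token targets
    torch.randn(2, 8, 64),             # predicted noise on 8 patches
    torch.randn(2, 8, 64),             # true noise
)
print(loss.item())
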
Peter West (@peterwesttm) 's Twitter Profile Photo

I'm very excited to be starting my dream job as faculty at UBC Computer Science (CAIDA_UBC) in 2025 and postdoc-ing with Christopher Potts at Stanford (HAI / Stanford NLP Group) this year! I am recruiting students this cycle who are curious to explore the mysteries and limitations of LMs / GenAI ...

Yu Meng (@yumeng0818) 's Twitter Profile Photo

Thrilled & humbled to receive the KDD Dissertation Award 🥳 Deepest thanks to my advisor Prof. Jiawei Han, my labmates Data Mining Group@UIUC, & my letter writers Luke Zettlemoyer & Danqi Chen. This is a shared honor - couldn't have done it without your support🎉

AI at Meta (@aiatmeta) 's Twitter Profile Photo

New research paper from Meta FAIR – Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model.

<a href="/violet_zct/">Chunting Zhou</a>, <a href="/liliyu_lili/">Lili Yu</a> and team introduce this recipe for training a multi-modal model over discrete and continuous data. Transfusion combines next token
Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
- 1B active, 7B total params for 5T tokens
- Best small LLM & matches more costly ones like Gemma, Llama
- Open Model/Data/Code/Logs + lots of analysis & experiments

📜arxiv.org/abs/2409.02060
🧵1/9
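
Since model, data, and code are all open, a minimal usage sketch with Hugging Face transformers might look like the following; the repo id is an assumption based on the release name, so check the paper or blog for the canonical checkpoint.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is assumed from the release name, not confirmed by the tweet.
model_id = "allenai/OLMoE-1B-7B-0924"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Mixture-of-experts models are", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
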
Ai2 (@allen_ai) 's Twitter Profile Photo

🐣Welcome the newest member to the OLMo family, OLMoE! This Mixture-of-Experts model is 100% open — it's efficient, performant, and ready for experimentation. Learn more on our blog: blog.allenai.org/olmoe-an-open-…