Armen Aghajanyan (@armenagha)'s Twitter Profile
Armen Aghajanyan

@armenagha

ex-RS FAIR/MSFT

ID: 1515424688

Joined: 14-06-2013 05:43:07

591 Tweets

11.11K Followers

266 Following

Pengfei Liu (@stefan_fee)

The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation?

Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by AI at Meta: github.com/GAIR-NLP/anole
Srini Iyer (@sriniiyer88)

Folks at GAIR-NLP have successfully managed to fine-tune image and interleaved generation back into Chameleon! Turns out, it’s quite hard to disable image generation out of early fusion models. Love all the example generations! Hope to see many more lizards!

Armen Aghajanyan (@armenagha)

Within pre-training circles, character.ai was known to have the highest HFU (hardware FLOPs utilization) numbers out of all pre-training teams during the A100 era because of this.

Armen Aghajanyan (@armenagha)

Excited to share the work of the team I've been mentoring at YerevaNN. They've built a large corpus covering 100M+ molecules, trained a few billion parameter language models, wrapped LMs into an optimization algorithm (augmented with genetic searches), and beaten all
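To make "wrapped LMs into an optimization algorithm (augmented with genetic searches)" a bit more concrete, here is a tiny conceptual Python sketch of an LM-in-the-loop genetic search over molecules. Everything in it is a stand-in: in a real system the proposal step would sample candidates from the trained chemistry LM and the score would come from the property oracle being optimized; this is not the YerevaNN team's actual pipeline.

```python
import random

def propose(parent: str) -> str:
    # Stand-in for: prompt the LM with high-scoring parents, sample a new molecule.
    return parent + random.choice("CNO")

def score(molecule: str) -> float:
    # Stand-in for the target property oracle.
    return molecule.count("N") - 0.1 * len(molecule)

def optimize(seeds: list[str], generations: int = 20, pool_size: int = 16) -> str:
    pool = list(seeds)
    for _ in range(generations):
        # The LM proposes children conditioned on the current high-scoring pool.
        children = [propose(random.choice(pool)) for _ in range(pool_size)]
        # Genetic-style selection: keep only the best of parents + children.
        pool = sorted(pool + children, key=score, reverse=True)[:pool_size]
    return pool[0]

print(optimize(["CC", "CN"]))  # best molecule string found under the toy score
```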

Victoria X Lin (@victorialinml)

1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770).

MoMa employs a mixture-of-expert (MoE) framework with modality-specific expert groups. Given any
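To unpack "modality-specific expert groups" for readers skimming the thread: as described in the announcement, text tokens are routed only among text experts and image tokens only among image experts, each group with its own learned router. Below is a minimal, hypothetical PyTorch sketch of that routing pattern; it is an illustration reconstructed from the tweet, not the MoMa reference implementation, and the class and argument names are made up.

```python
import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    """Toy top-1 MoE layer where experts are partitioned by modality."""

    def __init__(self, d_model: int, n_text_experts: int, n_image_experts: int):
        super().__init__()
        self.text_experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_text_experts)])
        self.image_experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_image_experts)])
        # Each modality group gets its own router, so routing never mixes modalities.
        self.text_router = nn.Linear(d_model, n_text_experts)
        self.image_router = nn.Linear(d_model, n_image_experts)

    def forward(self, x: torch.Tensor, is_image: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); is_image: (n_tokens,) boolean modality mask.
        out = torch.zeros_like(x)
        for mask, experts, router in (
            (~is_image, self.text_experts, self.text_router),
            (is_image, self.image_experts, self.image_router),
        ):
            if not mask.any():
                continue
            tokens = x[mask]
            scores = router(tokens).softmax(dim=-1)       # (n_selected, n_experts)
            top1 = scores.argmax(dim=-1)                  # top-1 expert per token
            routed = torch.stack(
                [experts[int(e)](t) for t, e in zip(tokens, top1)])
            # Weight each token's expert output by its router probability.
            out[mask] = routed * scores.gather(1, top1[:, None])
        return out
```

The per-token Python loop is only for readability; a real implementation would batch tokens by expert.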
Armen Aghajanyan (@armenagha)

If you were interested in my cryptic posts on how to train Chameleon-like models up to 4x faster, check out our MoMa paper, which gives a detailed overview of most of our architectural improvements. tl;dr: adaptive compute along three dimensions: modality, width, and depth.
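One way to picture the "depth" axis of that adaptive compute is mixture-of-depths-style token skipping, where a per-layer router sends only a fraction of tokens through the expensive block and lets the rest ride the residual stream. The sketch below is a hypothetical illustration of that general idea, not the actual Chameleon/MoMa training code; `DepthRoutedBlock` and its arguments are invented for the example.

```python
import torch
import torch.nn as nn

class DepthRoutedBlock(nn.Module):
    """Toy layer that applies its FFN only to the top-k scoring tokens."""

    def __init__(self, d_model: int, capacity: float = 0.25):
        super().__init__()
        self.capacity = capacity              # fraction of tokens that get compute
        self.router = nn.Linear(d_model, 1)   # learned per-token "spend compute here?" score
        self.ffn = nn.Sequential(             # stand-in for the full attention + MLP block
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model)
        k = max(1, int(self.capacity * x.shape[0]))
        scores = self.router(x).squeeze(-1)          # (seq_len,)
        keep = scores.topk(k).indices                # tokens selected for this layer
        out = x.clone()                              # everyone else skips via the residual
        # Gate by the sigmoid of the router score so the routing stays differentiable.
        out[keep] = x[keep] + torch.sigmoid(scores[keep]).unsqueeze(-1) * self.ffn(x[keep])
        return out
```

Width then presumably corresponds to expert choice within a layer, and modality to the expert-group split from the MoMa thread above.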

Akshat Shrivastava (@akshats07)

With Chameleon we showed that early-fusion mixed-modal LLMs can deliver strong improvements over unimodal and late-fusion alternatives. With this paradigm shift, however, how do we rethink our core model architecture to optimize for native multimodality and efficiency? We

xjdr (@_xjdr)

this is one that i've been waiting for. I was hoping they were going to release a paper covering these approaches for this very reason.

alphaXiv (@askalphaxiv)

New from AI at Meta: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

With a 1T token training budget, MoMa 1.4B achieves FLOPs savings of 3.7x 🚀

Authors Victoria X Lin, Akshat Shrivastava, and Liang Luo will be on alphaXiv this week to answer your questions.
Nima (@pourjafarnima)

Some cool things we did: investigate the "training scaling laws" behind early fusion and late fusion. TLDR: Late fusion gets good results fast, but we find that early-fusion models show more promise for learning modality connections and interleaved learning, expanding on the