Armen Aghajanyan (@armenagha)'s Twitter Profile
Armen Aghajanyan

@armenagha

ex-RS FAIR/MSFT

ID: 1515424688

Joined: 14-06-2013 05:43:07

591 Tweets

11.11K Followers

266 Following

Pengfei Liu (@stefan_fee)

The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation?

Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by AI at Meta: github.com/GAIR-NLP/anole
Srini Iyer (@sriniiyer88)

Folks at GAIR-NLP have successfully managed to fine-tune image and interleaved generation back into Chameleon! Turns out, it’s quite hard to disable image generation out of early fusion models. Love all the example generations! Hope to see many more lizards!

Armen Aghajanyan (@armenagha)

Within pre-training circles, character.ai was known to have the highest HFU (hardware FLOPs utilization) numbers out of all pre-training teams during the A100 era because of this.

Armen Aghajanyan (@armenagha)

Excited to share the work of the team I've been mentoring at YerevaNN. They've built a large corpus covering 100M+ molecules, trained a few billion parameter language models, wrapped LMs into an optimization algorithm (augmented with genetic searches), and beaten all
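To make "wrapped LMs into an optimization algorithm (augmented with genetic searches)" a bit more concrete, here is a tiny conceptual Python sketch of an LM-in-the-loop genetic search over molecules. Everything in it is a stand-in: in a real system the proposal step would sample candidates from the trained chemistry LM and the score would come from the property oracle being optimized; this is not the YerevaNN team's actual pipeline.

```python
import random

def propose(parent: str) -> str:
    # Stand-in for: prompt the LM with high-scoring parents, sample a new molecule.
    return parent + random.choice("CNO")

def score(molecule: str) -> float:
    # Stand-in for the target property oracle.
    return molecule.count("N") - 0.1 * len(molecule)

def optimize(seeds: list[str], generations: int = 20, pool_size: int = 16) -> str:
    pool = list(seeds)
    for _ in range(generations):
        # The LM proposes children conditioned on the current high-scoring pool.
        children = [propose(random.choice(pool)) for _ in range(pool_size)]
        # Genetic-style selection: keep only the best of parents + children.
        pool = sorted(pool + children, key=score, reverse=True)[:pool_size]
    return pool[0]

print(optimize(["CC", "CN"]))  # best molecule string found under the toy score
```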

Victoria X Lin (@victorialinml)

1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770).

MoMa employs a mixture-of-expert (MoE) framework with modality-specific expert groups. Given any
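To unpack "modality-specific expert groups" for readers skimming the thread: as described in the announcement, text tokens are routed only among text experts and image tokens only among image experts, each group with its own learned router. Below is a minimal, hypothetical PyTorch sketch of that routing pattern; it is an illustration reconstructed from the tweet, not the MoMa reference implementation, and the class and argument names are made up.

```python
import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    """Toy top-1 MoE layer where experts are partitioned by modality."""

    def __init__(self, d_model: int, n_text_experts: int, n_image_experts: int):
        super().__init__()
        self.text_experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_text_experts)])
        self.image_experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_image_experts)])
        # Each modality group gets its own router, so routing never mixes modalities.
        self.text_router = nn.Linear(d_model, n_text_experts)
        self.image_router = nn.Linear(d_model, n_image_experts)

    def forward(self, x: torch.Tensor, is_image: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); is_image: (n_tokens,) boolean modality mask.
        out = torch.zeros_like(x)
        for mask, experts, router in (
            (~is_image, self.text_experts, self.text_router),
            (is_image, self.image_experts, self.image_router),
        ):
            if not mask.any():
                continue
            tokens = x[mask]
            scores = router(tokens).softmax(dim=-1)       # (n_selected, n_experts)
            top1 = scores.argmax(dim=-1)                  # top-1 expert per token
            routed = torch.stack(
                [experts[int(e)](t) for t, e in zip(tokens, top1)])
            # Weight each token's expert output by its router probability.
            out[mask] = routed * scores.gather(1, top1[:, None])
        return out
```

The per-token Python loop is only for readability; a real implementation would batch tokens by expert.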
Armen Aghajanyan (@armenagha)

If you were interested in my cryptic posts on how to train Chameleon-like models up to 4x faster, check out our MoMa paper, which gives a detailed overview of most of our architectural improvements. tl;dr: adaptive compute along three dimensions: modality, width, and depth.
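One way to picture the "depth" axis of that adaptive compute is mixture-of-depths-style token skipping, where a per-layer router sends only a fraction of tokens through the expensive block and lets the rest ride the residual stream. The sketch below is a hypothetical illustration of that general idea, not the actual Chameleon/MoMa training code; `DepthRoutedBlock` and its arguments are invented for the example.

```python
import torch
import torch.nn as nn

class DepthRoutedBlock(nn.Module):
    """Toy layer that applies its FFN only to the top-k scoring tokens."""

    def __init__(self, d_model: int, capacity: float = 0.25):
        super().__init__()
        self.capacity = capacity              # fraction of tokens that get compute
        self.router = nn.Linear(d_model, 1)   # learned per-token "spend compute here?" score
        self.ffn = nn.Sequential(             # stand-in for the full attention + MLP block
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model)
        k = max(1, int(self.capacity * x.shape[0]))
        scores = self.router(x).squeeze(-1)          # (seq_len,)
        keep = scores.topk(k).indices                # tokens selected for this layer
        out = x.clone()                              # everyone else skips via the residual
        # Gate by the sigmoid of the router score so the routing stays differentiable.
        out[keep] = x[keep] + torch.sigmoid(scores[keep]).unsqueeze(-1) * self.ffn(x[keep])
        return out
```

Width then presumably corresponds to expert choice within a layer, and modality to the expert-group split from the MoMa thread above.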

Akshat Shrivastava (@akshats07)

With Chameleon we showed that early-fusion mixed-modal LLMs can deliver strong improvements over unimodal and late-fusion alternatives. With this paradigm shift, however, how do we rethink our core model architecture to optimize for native multimodality and efficiency? We

xjdr (@_xjdr)

this is one that i've been waiting for. I was hoping they were going to release a paper covering these approaches for this very reason.

alphaXiv (@askalphaxiv)

New from AI at Meta: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

With a 1T token training budget, MoMa 1.4B achieves FLOPs savings of 3.7x 🚀

Authors Victoria X Lin, Akshat Shrivastava, and Liang Luo will be on alphaXiv this week to answer your questions.
Nima (@pourjafarnima)

Some cool things we did: investigate the "training scaling laws" behind early fusion and late fusion. TLDR: Late fusion gets good results fast, but we find that early-fusion models show more promise for learning modality connections and interleaved learning, expanding on the