Srini Iyer (@sriniiyer88) 's Twitter Profile
Srini Iyer

@sriniiyer88

Research Scientist at Facebook AI Research

ID: 499487546

Link: http://sriniiyer.github.io · Joined: 22-02-2012 05:14:14

107 Tweets

1.1K Followers

191 Following

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper ➡️ go.fb.me/7rb19n
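To make "early-fusion token-based" concrete: both modalities are mapped to discrete tokens from a single shared vocabulary and flattened into one sequence, so a single autoregressive transformer covers image-to-text, text-to-image, and any interleaving of the two. Below is a minimal illustrative sketch; the tokenizer stand-ins and sentinel tokens are hypothetical placeholders, not Chameleon's actual interface.

```python
# Illustrative sketch of early fusion: text and images become discrete tokens
# in one shared stream, so a single transformer can model any interleaving.
# The tokenizers and special tokens below are placeholders, not Chameleon's API.

BOI, EOI = "<image_start>", "<image_end>"  # hypothetical image sentinel tokens

def tokenize_text(text: str) -> list[str]:
    # Stand-in for a real BPE text tokenizer.
    return text.split()

def tokenize_image(image_id: str, n_tokens: int = 4) -> list[str]:
    # Stand-in for a VQ-style image tokenizer that emits discrete codes
    # (a real tokenizer emits on the order of 1024 codes per image).
    return [f"<img:{image_id}:{i}>" for i in range(n_tokens)]

def build_sequence(segments: list[tuple[str, str]]) -> list[str]:
    """Flatten interleaved (modality, content) segments into one token stream."""
    tokens: list[str] = []
    for modality, content in segments:
        if modality == "text":
            tokens += tokenize_text(content)
        elif modality == "image":
            tokens += [BOI, *tokenize_image(content), EOI]
    return tokens

# A mixed-modal prompt: text, an image, then a follow-up question.
print(build_sequence([
    ("text", "Here is a photo of a chameleon:"),
    ("image", "img_0"),
    ("text", "What color is it?"),
]))
```

Because both modalities share one token space, the same next-token objective trains understanding and generation in either direction.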
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

I’m excited to announce our latest paper, introducing a family of early-fusion token-in token-out (gpt4o….) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Chameleon: Mixed-Modal Early-Fusion Foundation Models

- SotA in image captioning
- On par with Mixtral 8x7B and Gemini-Pro on text-only tasks
- On par with Gemini Pro and GPT-4V on a new long-form mixed-modal generation evaluation

arxiv.org/abs/2405.09818
Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Armen and team have been working in this direction for a while now, and I’ve been eagerly following from the sidelines since the CM3 paper. Very nice to see the line of work come to fruition! Also nice to see that QK-layernorm works beyond ViT-22B.
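The QK-layernorm mentioned here normalizes the query and key projections before the attention logits are computed, which keeps the logits bounded and was used in ViT-22B to avoid instabilities at scale. A minimal single-head PyTorch sketch, with LayerNorm chosen purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head self-attention with LayerNorm applied to queries and keys
    before the dot product (QK-layernorm), which bounds the attention logits
    and helps avoid the loss spikes seen in large-scale training."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.q_norm = nn.LayerNorm(dim)  # normalization of Q...
        self.k_norm = nn.LayerNorm(dim)  # ...and K is the QK-norm trick
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)           # (batch, sequence, dim)
print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```

The only change from standard attention is the two extra norms on Q and K.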

Axel Darmouni (@adarmouni) 's Twitter Profile Photo

34B-sized model edges out the titans of multimodality

🧵📖 Read of the day, day 57: Chameleon: Mixed-Modal Early-Fusion Foundation Models, by Srini Iyer, Huang, Pasunuru et al from the FAIR team of AI at Meta

arxiv.org/pdf/2405.09818

Current main vision language open source
Joelle Pineau (@jpineau1) 's Twitter Profile Photo

I’m excited to share a few things we’re releasing today at Meta FAIR. These new AI model and dataset releases are part of our longstanding commitment to open science and I look forward to sharing even more work like this from the brilliant minds at FAIR! ai.meta.com/blog/meta-fair…

Asli Celikyilmaz (@real_asli) 's Twitter Profile Photo

🚀 Exciting news! We're open sourcing Chameleon, our early-fusion multimodal foundation model from last year. It handles multimodal inputs and produces text outputs, though it was trained for both text and image generation. #OpenSource #AI #ChameleonModel

Yann LeCun (@ylecun) 's Twitter Profile Photo

Lots of open source models released by Meta FAIR today:

- Chameleon: experiment in vision-language model with early fusion.
- LLM with multi-token prediction.
- Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation (JASCO).
- AudioSeal: audio
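For the multi-token prediction item in the list above, a common formulation is a shared transformer trunk feeding several independent output heads, where head i is trained to predict the token i+1 positions ahead rather than only the next one. A minimal sketch of just the heads, with hypothetical names and sizes:

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Sketch of multi-token prediction: a shared trunk feeds n independent
    output heads, and head i is trained to predict the token i+1 positions
    ahead instead of only the next one."""

    def __init__(self, dim: int, vocab: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(n_future))

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # hidden: (batch, seq, dim) output of any transformer trunk.
        return [head(hidden) for head in self.heads]  # one logit tensor per offset

hidden = torch.randn(2, 16, 64)                # stand-in trunk output
logits = MultiTokenHead(64, vocab=1000)(hidden)
print(len(logits), logits[0].shape)            # 4 torch.Size([2, 16, 1000])
```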

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Last week we released Meta Chameleon: a new mixed-modal research model from Meta FAIR. Get the models ➡️ go.fb.me/4m87kk The 7B & 34B safety tuned models we’ve released can take any combination of text and images as input and produce text outputs using a new early

Srini Iyer (@sriniiyer88) 's Twitter Profile Photo

Folks at GAIR-NLP have successfully managed to fine-tune image and interleaved generation back into Chameleon! Turns out, it’s quite hard to disable image generation in early-fusion models. Love all the example generations! Hope to see many more lizards!

Srini Iyer (@sriniiyer88) 's Twitter Profile Photo

Our MoMa paper is live! We show how to significantly improve pre-training of fully mixed-modal early-fusion models by using modality-aware Mixture of Experts. We explore three dimensions of adaptive compute: modalities, experts, and depths! Lots of learnings!
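Modality-aware Mixture of Experts, as described above, partitions the expert pool by modality so that each token is routed only among the experts of its own group. A minimal top-1 routing sketch; the group sizes, gating scheme, and class names here are illustrative assumptions rather than MoMa's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareMoE(nn.Module):
    """Sketch of modality-aware sparse routing: experts are partitioned into a
    text group and an image group, and each token is routed (top-1 here) only
    among the experts of its own modality."""

    def __init__(self, dim: int, n_text_experts: int = 4, n_image_experts: int = 4):
        super().__init__()
        def make_experts(n):
            return nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(n)
            )
        self.experts = nn.ModuleDict({"text": make_experts(n_text_experts),
                                      "image": make_experts(n_image_experts)})
        self.routers = nn.ModuleDict({"text": nn.Linear(dim, n_text_experts),
                                      "image": nn.Linear(dim, n_image_experts)})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # x: (tokens, dim); all tokens here share one modality for simplicity.
        logits = self.routers[modality](x)                 # (tokens, n_experts)
        choice = logits.argmax(dim=-1)                     # top-1 expert per token
        gate = F.softmax(logits, dim=-1).gather(-1, choice[:, None])
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts[modality]):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])
        return gate * out

tokens = torch.randn(8, 64)
moe = ModalityAwareMoE(64)
print(moe(tokens, "text").shape)  # torch.Size([8, 64])
```

Splitting the expert pool this way lets text and image tokens develop specialized feed-forward capacity without competing for the same experts.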

AI at Meta (@aiatmeta) 's Twitter Profile Photo

📣 Today we're opening a call for applications for Llama 3.1 Impact Grants!

Until Nov 22, teams can submit proposals for using Llama to address social challenges across their communities for a chance to be awarded a $500K grant.

Details + application ➡️ go.fb.me/smw6xc