Srini Iyer (@sriniiyer88) 's Twitter Profile
Srini Iyer

@sriniiyer88

Research Scientist at Facebook AI Research

ID: 499487546

Link: http://sriniiyer.github.io · Joined: 22-02-2012 05:14:14

107 Tweets

1.1K Followers

191 Following

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper ➡️ go.fb.me/7rb19n
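To make "early-fusion token-based" concrete: both modalities are mapped to discrete tokens from a single shared vocabulary and flattened into one sequence, so a single autoregressive transformer covers image-to-text, text-to-image, and any interleaving of the two. Below is a minimal illustrative sketch; the tokenizer stand-ins and sentinel tokens are hypothetical placeholders, not Chameleon's actual interface.

```python
# Illustrative sketch of early fusion: text and images become discrete tokens
# in one shared stream, so a single transformer can model any interleaving.
# The tokenizers and special tokens below are placeholders, not Chameleon's API.

BOI, EOI = "<image_start>", "<image_end>"  # hypothetical image sentinel tokens

def tokenize_text(text: str) -> list[str]:
    # Stand-in for a real BPE text tokenizer.
    return text.split()

def tokenize_image(image_id: str, n_tokens: int = 4) -> list[str]:
    # Stand-in for a VQ-style image tokenizer that emits discrete codes
    # (a real tokenizer emits on the order of 1024 codes per image).
    return [f"<img:{image_id}:{i}>" for i in range(n_tokens)]

def build_sequence(segments: list[tuple[str, str]]) -> list[str]:
    """Flatten interleaved (modality, content) segments into one token stream."""
    tokens: list[str] = []
    for modality, content in segments:
        if modality == "text":
            tokens += tokenize_text(content)
        elif modality == "image":
            tokens += [BOI, *tokenize_image(content), EOI]
    return tokens

# A mixed-modal prompt: text, an image, then a follow-up question.
print(build_sequence([
    ("text", "Here is a photo of a chameleon:"),
    ("image", "img_0"),
    ("text", "What color is it?"),
]))
```

Because both modalities share one token space, the same next-token objective trains understanding and generation in either direction.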
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

I’m excited to announce our latest paper, introducing a family of early-fusion token-in token-out (gpt4o….) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Chameleon: Mixed-Modal Early-Fusion Foundation Models

- SotA in image captioning
- On par with Mixtral 8x7B and Gemini-Pro on text-only tasks
- On par with Gemini Pro and GPT-4V on a new long-form mixed-modal generation evaluation

arxiv.org/abs/2405.09818
Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Armen and team have been working in this direction for a while now, and I’ve been eagerly following from the sidelines since the CM3 paper. Very nice to see the line of work come to fruition! Also nice to see that QK-layernorm works beyond ViT-22B.
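The QK-layernorm mentioned here normalizes the query and key projections before the attention logits are computed, which keeps the logits bounded and was used in ViT-22B to avoid instabilities at scale. A minimal single-head PyTorch sketch, with LayerNorm chosen purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head self-attention with LayerNorm applied to queries and keys
    before the dot product (QK-layernorm), which bounds the attention logits
    and helps avoid the loss spikes seen in large-scale training."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.q_norm = nn.LayerNorm(dim)  # normalization of Q...
        self.k_norm = nn.LayerNorm(dim)  # ...and K is the QK-norm trick
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)           # (batch, sequence, dim)
print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```

The only change from standard attention is the two extra norms on Q and K.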

Axel Darmouni (@adarmouni) 's Twitter Profile Photo

34B-sized model edges out the titans of multimodality

🧵📖 Read of the day, day 57: Chameleon: Mixed-Modal Early-Fusion Foundation Models, by Srini Iyer, Huang, Pasunuru et al from the FAIR team of AI at Meta

arxiv.org/pdf/2405.09818

Current main vision language open source
Joelle Pineau (@jpineau1) 's Twitter Profile Photo

I’m excited to share a few things we’re releasing today at Meta FAIR. These new AI model and dataset releases are part of our longstanding commitment to open science and I look forward to sharing even more work like this from the brilliant minds at FAIR! ai.meta.com/blog/meta-fair…

Asli Celikyilmaz (@real_asli) 's Twitter Profile Photo

🚀 Exciting news! We're open sourcing Chameleon, our early-fusion multimodal foundation model from last year. It handles multimodal inputs and produces text outputs, though it was trained for both text and image generation. #OpenSource #AI #ChameleonModel

Yann LeCun (@ylecun) 's Twitter Profile Photo

Lots of open source models released by Meta FAIR today:

- Chameleon: experiment in vision-language model with early fusion.
- LLM with multi-token prediction.
- Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation (JASCO).
- AudioSeal: audio
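For the multi-token prediction item in the list above, a common formulation is a shared transformer trunk feeding several independent output heads, where head i is trained to predict the token i+1 positions ahead rather than only the next one. A minimal sketch of just the heads, with hypothetical names and sizes:

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Sketch of multi-token prediction: a shared trunk feeds n independent
    output heads, and head i is trained to predict the token i+1 positions
    ahead instead of only the next one."""

    def __init__(self, dim: int, vocab: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(n_future))

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # hidden: (batch, seq, dim) output of any transformer trunk.
        return [head(hidden) for head in self.heads]  # one logit tensor per offset

hidden = torch.randn(2, 16, 64)                # stand-in trunk output
logits = MultiTokenHead(64, vocab=1000)(hidden)
print(len(logits), logits[0].shape)            # 4 torch.Size([2, 16, 1000])
```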

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Last week we released Meta Chameleon: a new mixed-modal research model from Meta FAIR. Get the models ➡️ go.fb.me/4m87kk The 7B & 34B safety tuned models we’ve released can take any combination of text and images as input and produce text outputs using a new early

Srini Iyer (@sriniiyer88) 's Twitter Profile Photo

Folks at GAIR-NLP have successfully managed to fine-tune image and interleaved generation back into Chameleon! Turns out, it’s quite hard to disable image generation in early-fusion models. Love all the example generations! Hope to see many more lizards!

Srini Iyer (@sriniiyer88) 's Twitter Profile Photo

Our MoMa paper is live! We show how to significantly improve pre-training of fully mixed-modal early-fusion models by using modality-aware Mixture of Experts. We explore three dimensions of adaptive compute: modalities, experts, and depths! Lots of learnings!
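Modality-aware Mixture of Experts, as described above, partitions the expert pool by modality so that each token is routed only among the experts of its own group. A minimal top-1 routing sketch; the group sizes, gating scheme, and class names here are illustrative assumptions rather than MoMa's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareMoE(nn.Module):
    """Sketch of modality-aware sparse routing: experts are partitioned into a
    text group and an image group, and each token is routed (top-1 here) only
    among the experts of its own modality."""

    def __init__(self, dim: int, n_text_experts: int = 4, n_image_experts: int = 4):
        super().__init__()
        def make_experts(n):
            return nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(n)
            )
        self.experts = nn.ModuleDict({"text": make_experts(n_text_experts),
                                      "image": make_experts(n_image_experts)})
        self.routers = nn.ModuleDict({"text": nn.Linear(dim, n_text_experts),
                                      "image": nn.Linear(dim, n_image_experts)})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # x: (tokens, dim); all tokens here share one modality for simplicity.
        logits = self.routers[modality](x)                 # (tokens, n_experts)
        choice = logits.argmax(dim=-1)                     # top-1 expert per token
        gate = F.softmax(logits, dim=-1).gather(-1, choice[:, None])
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts[modality]):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])
        return gate * out

tokens = torch.randn(8, 64)
moe = ModalityAwareMoE(64)
print(moe(tokens, "text").shape)  # torch.Size([8, 64])
```

Splitting the expert pool this way lets text and image tokens develop specialized feed-forward capacity without competing for the same experts.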

AI at Meta (@aiatmeta) 's Twitter Profile Photo

📣 Today we're opening a call for applications for Llama 3.1 Impact Grants!

Until Nov 22, teams can submit proposals for using Llama to address social challenges across their communities for a chance to be awarded a $500K grant.

Details + application ➡️ go.fb.me/smw6xc