Chunting Zhou (@violet_zct)'s Twitter Profile
Chunting Zhou

@violet_zct

Research Scientist at FAIR. PhD @CMU. she/her.

ID: 3284146452

Link: https://violet-zct.github.io/ · Joined: 19-07-2015 09:41:45

145 Tweets

3.3K Followers

284 Following

AK (@_akhaliq)'s Twitter Profile Photo

Meta announces Megalodon

Efficient LLM Pretraining and Inference with Unlimited Context Length

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and …
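
For context on the complexity gap the tweet alludes to: full softmax attention materializes an n × n score matrix, while kernelized "linear attention" reorders the matrix products to avoid it. A minimal PyTorch sketch of the two, illustrating the general idea rather than Megalodon's actual mechanism:

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Full attention: materializes an (n, n) score matrix, so time and
    # memory grow quadratically with sequence length n.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # Kernel trick (as in Katharopoulos et al., 2020): with the feature map
    # phi(x) = elu(x) + 1, attention becomes phi(q) @ (phi(k)^T v), which
    # costs O(n * d^2) -- linear in n, with no (n, n) matrix.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                            # (d, d) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # (n, 1) normalizer
    return (q @ kv) / z

n, d = 4096, 64
q, k, v = torch.randn(3, n, d).unbind(0)
out = linear_attention(q, k, v)  # never allocates a 4096 x 4096 score matrix
```
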
Chunting Zhou (@violet_zct)'s Twitter Profile Photo

🚀 Excited to introduce Chameleon, our work from last year on mixed-modality, early-fusion foundation models! 🦎 Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities!

Daniel Levy (@daniellevy__) 's Twitter Profile Photo

Beyond excited to be starting this company with Ilya and DG! I can't imagine working on anything else at this point in human history. If you feel the same and want to work in a small, cracked, high-trust team that will produce miracles, please reach out.

Xiang Lisa Li (@xianglisali2)'s Twitter Profile Photo

arxiv.org/abs/2407.08351
LM performance on existing benchmarks is highly correlated. How do we build novel benchmarks that reveal previously unknown trends?
We propose AutoBencher: it casts benchmark creation as an optimization problem with a novelty term in the objective.
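
As a rough illustration of "optimization with a novelty term": a hypothetical objective that rewards candidate benchmarks whose model rankings are poorly predicted by existing benchmarks and that current models find hard. Names and weights here are illustrative, not the paper's formulation (which also considers properties such as salience; see arxiv.org/abs/2407.08351):

```python
import numpy as np

def rank_correlation(a, b):
    # Spearman rank correlation of two per-model score vectors (no tie handling).
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def benchmark_objective(candidate, existing, w_novelty=1.0, w_difficulty=1.0):
    """candidate: per-model accuracies on the proposed benchmark.
    existing: list of per-model accuracy vectors on known benchmarks."""
    # Novelty: rankings on the candidate should NOT be explained by
    # rankings on any existing benchmark (low max correlation = novel).
    novelty = 1.0 - max(rank_correlation(candidate, e) for e in existing)
    # Difficulty: prefer benchmarks that current models find hard.
    difficulty = 1.0 - float(np.mean(candidate))
    return w_novelty * novelty + w_difficulty * difficulty

models_on_candidate = np.array([0.42, 0.55, 0.38, 0.61])
models_on_known = [np.array([0.81, 0.90, 0.76, 0.88]),
                   np.array([0.60, 0.72, 0.58, 0.70])]
print(benchmark_objective(models_on_candidate, models_on_known))
```
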
Chunting Zhou (@violet_zct)'s Twitter Profile Photo

Great work from Horace He and the team! FlexAttention is really easy to use, with a highly expressive user interface, and it shows strong performance profiles compared to FlashAttention!
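
FlexAttention ships in PyTorch (2.5+) as torch.nn.attention.flex_attention; the expressiveness comes from passing a small score_mod function instead of hand-writing a kernel. A minimal sketch of a causal mask, assuming PyTorch ≥ 2.5 (eager mode works for experimentation; wrapping the call in torch.compile is what gets the fused-kernel performance compared against FlashAttention):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep the score where the query may attend (q_idx >= kv_idx), else mask.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

B, H, S, D = 2, 8, 256, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal)  # (B, H, S, D)

# compiled = torch.compile(flex_attention)  # fused kernels for real workloads
```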

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

abs: arxiv.org/abs/2408.11039

New paper from Meta that introduces Transfusion, a recipe for training a model that can seamlessly generate discrete and continuous modalities. The authors pretrain a …
AK (@_akhaliq)'s Twitter Profile Photo

Transfusion

Predict the Next Token and Diffuse Images with One Multi-Modal Model

discuss: huggingface.co/papers/2408.11…

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function …
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Meta presents Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

- Can generate images and text on a par with similar scale diffusion models and language models
- Compresses each image to just 16 patches

arxiv.org/abs/2408.11039
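
The tweets above describe the recipe at a high level: one transformer is trained over mixed-modality sequences, with next-token cross-entropy applied at discrete text positions and a diffusion (noise-prediction) loss at continuous image-patch positions. A minimal sketch of such a combined objective; the shapes and the balancing weight are illustrative, not the paper's values:

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(text_logits, text_targets, eps_pred, eps_true, lam=1.0):
    """Combined objective over one mixed-modality sequence.
    text_logits:  (n_text, vocab) next-token predictions at text positions
    text_targets: (n_text,) gold token ids
    eps_pred/eps_true: (n_img, d) predicted vs. actual diffusion noise at
                       continuous image-patch positions
    lam: balancing coefficient between the two terms (illustrative value)."""
    lm_loss = F.cross_entropy(text_logits, text_targets)  # discrete tokens
    ddpm_loss = F.mse_loss(eps_pred, eps_true)            # continuous patches
    return lm_loss + lam * ddpm_loss

vocab, n_text, n_img, d = 32000, 128, 16, 1024  # e.g. 16 patches per image
loss = transfusion_style_loss(
    torch.randn(n_text, vocab),
    torch.randint(vocab, (n_text,)),
    torch.randn(n_img, d),
    torch.randn(n_img, d),
)
```
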
Jim Fan (@drjimfan)'s Twitter Profile Photo

Transformer-land and diffusion-land have been separate for too long. There have been many attempts to unify them, but they lose simplicity and elegance. Time for a transfusion 🩸 to revitalize the merge!

Horace He (@chhillee)'s Twitter Profile Photo

Jokes aside, it's fun to see innovation beyond the standard causal/autoregressive next-token generation in text. Transfusion is another cool work in this vein (that already used FlexAttention :P) x.com/violet_zct/sta…