Albert Gu (@_albertgu)'s Twitter Profile
Albert Gu

@_albertgu

assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.

ID: 1076265378118959104

Joined: 21-12-2018 23:57:16

296 Tweets

12.12K Followers

90 Following

Albert Gu (@_albertgu)'s Twitter Profile Photo

it's important to realize that attention-free architectures might require different methods of interaction (prompting, etc.), and evaluating them in the same way that we know works for Transformers might not always make sense

Tri Dao (@tri_dao)'s Twitter Profile Photo

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS! 1/

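As a quick sanity check on the quoted throughput (assuming the commonly cited ~989 TFLOPS dense FP16/BF16 peak for the H100 SXM, a figure not stated in the tweet itself):

```python
# Check that 740 TFLOPS achieved is consistent with the quoted ~75%
# utilization, given an assumed H100 SXM FP16/BF16 dense peak of ~989 TFLOPS.
peak_tflops = 989.0
achieved_tflops = 740.0
util = achieved_tflops / peak_tflops  # ~0.748, i.e. roughly the quoted 75%
```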
Albert Gu (@_albertgu)'s Twitter Profile Photo

we collaborated with Mistral to scale Mamba up on code! I think that different modalities or data formats (e.g. code, byte-level modeling) that have weaker "tokenizations" benefit increasingly more from compressive models such as SSMs

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

After 7 months on the job market, I am happy to announce:
- I joined Ai2
- Professor at Carnegie Mellon University from Fall 2025
- New bitsandbytes maintainer: Titus von Koeller
My main focus will be to strengthen open-source for real-world problems and bring the best AI to laptops 🧵

Dan Fu (@realdanfu)'s Twitter Profile Photo

Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything ML Sys/efficiency/etc please reach out & put my name on your application! Until then

Cartesia (@cartesia_ai)'s Twitter Profile Photo

To celebrate the launch of Daily's new bots, we built StudyPal, a buddy that helps you learn the most important information from research papers 🥸 Watch how StudyPal teaches us the content of our Chief Scientist Albert Gu's latest Mamba-2 paper: arxiv.org/abs/2405.21060

Albert Gu (@_albertgu)'s Twitter Profile Photo


distillation.... mmm 🍻

state-of-the-art Mamba models with 1% of the compute, by leveraging pretrained Transformers!

key insight: project the (quadratic) attention matrices onto (structured) SSM matrix mixers before end-to-end training

led by students Aviv Bick (@avivbick) and Kevin Li (@kevinyli_)

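The "projection" idea can be illustrated with a toy example (my own sketch, not the paper's actual procedure: the one-parameter exponential-decay family and the grid-search fit here are simplifications of the structured SSM matrix mixers the tweet refers to). We form a causal softmax attention matrix, then find the closest member of the decay family in Frobenius norm:

```python
import numpy as np

def softmax_attention_matrix(q, k):
    # Causal softmax attention mixer: an L x L lower-triangular,
    # row-stochastic matrix.
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((L, L), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def project_to_decay_mixer(A):
    # Project A onto the toy family M[i, j] = (1 - a) * a**(i - j), i >= j
    # (an exponential-decay causal mask, a stand-in for a structured SSM
    # matrix mixer), by grid search over the decay a minimizing the
    # Frobenius distance.
    i, j = np.tril_indices(A.shape[0])
    best_a, best_err = None, np.inf
    for a in np.linspace(0.01, 0.99, 99):
        M = np.zeros_like(A)
        M[i, j] = (1 - a) * a ** (i - j)
        err = np.linalg.norm(A - M)
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err

rng = np.random.default_rng(0)
L, d = 32, 8
A = softmax_attention_matrix(rng.normal(size=(L, d)), rng.normal(size=(L, d)))
a, err = project_to_decay_mixer(A)
```

In the actual method this matching happens per layer before end-to-end training, which is where the 1%-of-compute claim comes from; the sketch above only shows the matrix-level idea.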
Beidi Chen (@beidichen)'s Twitter Profile Photo

🤯 This study explains my year-long confusion about why the #GPT4 leak (by Dylan Patel, SemiAnalysis) says OpenAI deployed speculative decoding in their serving last June, because I thought SD is only useful for small batches... Surprisingly, speculative decoding can bring more benefits when

AI21 Labs (@ai21labs)'s Twitter Profile Photo


We released the #Jamba 1.5 open model family:

- 256K #contextwindow 
- Up to 2.5X faster on #longcontext in its size class
- Native support for structured JSON output, function calling, digesting doc objects & generating citations

twtr.to/giIEE

#AI #LLM #AI21Jamba

Cartesia (@cartesia_ai)'s Twitter Profile Photo


Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.

Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe
Tri Dao (@tri_dao)'s Twitter Profile Photo

We made distillation and spec decoding work with Mamba (and linear RNNs in general)! Up to 300 tok/sec for 7B🚀. Spec dec is nontrivial as there's no KV cache to backtrack if some tokens aren't accepted, but there's an efficient hardware-aware algo to recompute the SSM states
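A toy sketch of the backtracking problem Tri describes (my own illustration with a diagonal linear recurrence and a stand-in `accept_fn`; the actual algorithm is hardware-aware and parallel): because a linear RNN carries only a fixed-size state rather than a KV cache, rejected draft tokens cannot simply be truncated away, so the state is re-advanced only through the accepted prefix from the last verified state.

```python
import numpy as np

def ssm_step(h, x, A, B):
    # One step of a diagonal linear SSM recurrence: h_t = A*h_{t-1} + B*x_t.
    return A * h + B * x

def verify_draft(h, draft_tokens, accept_fn, A, B):
    # Speculative decoding for a linear RNN: unlike attention, there is no
    # KV cache to backtrack, so we keep the pre-draft state h and advance a
    # copy of it only through the tokens the verifier accepts.
    h_new = h
    accepted = []
    for tok in draft_tokens:
        if not accept_fn(tok):
            break  # rejection: stop; h_new already reflects only accepted tokens
        h_new = ssm_step(h_new, tok, A, B)
        accepted.append(tok)
    return h_new, accepted

# Toy usage: 4-dim state, scalar "tokens", accept anything below 0.5.
A = np.full(4, 0.9)
B = np.full(4, 0.1)
h0 = np.zeros(4)
h, acc = verify_draft(h0, [0.2, 0.3, 0.7, 0.1], lambda t: t < 0.5, A, B)
# only the first two draft tokens are accepted
```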