Albert Gu (@_albertgu)'s Twitter Profile
Albert Gu

@_albertgu

assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.

ID: 1076265378118959104

Joined: 21-12-2018 23:57:16

296 Tweets

12.12K Followers

90 Following

Albert Gu (@_albertgu)'s Twitter Profile Photo

it's important to realize that attention-free architectures might require different methods of interaction (prompting, etc.), and evaluating them in the same way that we know works for Transformers might not always make sense

Tri Dao (@tri_dao)'s Twitter Profile Photo

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS! 1/

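As a quick sanity check on the quoted throughput (assuming the commonly cited ~989 TFLOPS dense FP16/BF16 peak for the H100 SXM, a figure not stated in the tweet itself):

```python
# Check that 740 TFLOPS achieved is consistent with the quoted ~75%
# utilization, given an assumed H100 SXM FP16/BF16 dense peak of ~989 TFLOPS.
peak_tflops = 989.0
achieved_tflops = 740.0
util = achieved_tflops / peak_tflops  # ~0.748, i.e. roughly the quoted 75%
```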
Albert Gu (@_albertgu)'s Twitter Profile Photo

we collaborated with Mistral to scale Mamba up on code! I think that different modalities or data formats (e.g. code, byte-level modeling) that have weaker "tokenizations" benefit increasingly more from compressive models such as SSMs

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

After 7 months on the job market, I am happy to announce:
- I joined Ai2
- Professor at Carnegie Mellon University from Fall 2025
- New bitsandbytes maintainer: Titus von Koeller
My main focus will be to strengthen open-source for real-world problems and bring the best AI to laptops 🧵

Dan Fu (@realdanfu)'s Twitter Profile Photo

Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything ML Sys/efficiency/etc please reach out & put my name on your application! Until then

Cartesia (@cartesia_ai)'s Twitter Profile Photo

To celebrate the launch of Daily's new bots, we built StudyPal, a buddy that helps you learn the most important information from research papers 🥸 Watch how StudyPal teaches us the content of our Chief Scientist Albert Gu's latest Mamba-2 paper: arxiv.org/abs/2405.21060

Albert Gu (@_albertgu)'s Twitter Profile Photo


distillation.... mmm 🍻

state-of-the-art Mamba models with 1% of the compute, by leveraging pretrained Transformers!

key insight: project the (quadratic) attention matrices onto (structured) SSM matrix mixers before end-to-end training

led by students Aviv Bick (@avivbick) and Kevin Li (@kevinyli_)

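The "projection" idea can be illustrated with a toy example (my own sketch, not the paper's actual procedure: the one-parameter exponential-decay family and the grid-search fit here are simplifications of the structured SSM matrix mixers the tweet refers to). We form a causal softmax attention matrix, then find the closest member of the decay family in Frobenius norm:

```python
import numpy as np

def softmax_attention_matrix(q, k):
    # Causal softmax attention mixer: an L x L lower-triangular,
    # row-stochastic matrix.
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((L, L), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def project_to_decay_mixer(A):
    # Project A onto the toy family M[i, j] = (1 - a) * a**(i - j), i >= j
    # (an exponential-decay causal mask, a stand-in for a structured SSM
    # matrix mixer), by grid search over the decay a minimizing the
    # Frobenius distance.
    i, j = np.tril_indices(A.shape[0])
    best_a, best_err = None, np.inf
    for a in np.linspace(0.01, 0.99, 99):
        M = np.zeros_like(A)
        M[i, j] = (1 - a) * a ** (i - j)
        err = np.linalg.norm(A - M)
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err

rng = np.random.default_rng(0)
L, d = 32, 8
A = softmax_attention_matrix(rng.normal(size=(L, d)), rng.normal(size=(L, d)))
a, err = project_to_decay_mixer(A)
```

In the actual method this matching happens per layer before end-to-end training, which is where the 1%-of-compute claim comes from; the sketch above only shows the matrix-level idea.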
Beidi Chen (@beidichen)'s Twitter Profile Photo

🤯 This study explains my year-long confusion about why the #GPT4 leak (by Dylan Patel, SemiAnalysis) says OpenAI deployed speculative decoding in their serving last June, because I thought SD is only useful for small batches... Surprisingly, speculative decoding can bring more benefits when

AI21 Labs (@ai21labs)'s Twitter Profile Photo


We released the #Jamba 1.5 open model family:

- 256K #contextwindow 
- Up to 2.5X faster on #longcontext in its size class
- Native support for structured JSON output, function calling, digesting doc objects & generating citations

twtr.to/giIEE

#AI #LLM #AI21Jamba

Cartesia (@cartesia_ai)'s Twitter Profile Photo


Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.

Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe
Tri Dao (@tri_dao)'s Twitter Profile Photo

We made distillation and spec decoding work with Mamba (and linear RNNs in general)! Up to 300 tok/sec for 7B🚀. Spec dec is nontrivial as there's no KV cache to backtrack if some tokens aren't accepted, but there's an efficient hardware-aware algo to recompute the SSM states
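A toy sketch of the backtracking problem Tri describes (my own illustration with a diagonal linear recurrence and a stand-in `accept_fn`; the actual algorithm is hardware-aware and parallel): because a linear RNN carries only a fixed-size state rather than a KV cache, rejected draft tokens cannot simply be truncated away, so the state is re-advanced only through the accepted prefix from the last verified state.

```python
import numpy as np

def ssm_step(h, x, A, B):
    # One step of a diagonal linear SSM recurrence: h_t = A*h_{t-1} + B*x_t.
    return A * h + B * x

def verify_draft(h, draft_tokens, accept_fn, A, B):
    # Speculative decoding for a linear RNN: unlike attention, there is no
    # KV cache to backtrack, so we keep the pre-draft state h and advance a
    # copy of it only through the tokens the verifier accepts.
    h_new = h
    accepted = []
    for tok in draft_tokens:
        if not accept_fn(tok):
            break  # rejection: stop; h_new already reflects only accepted tokens
        h_new = ssm_step(h_new, tok, A, B)
        accepted.append(tok)
    return h_new, accepted

# Toy usage: 4-dim state, scalar "tokens", accept anything below 0.5.
A = np.full(4, 0.9)
B = np.full(4, 0.1)
h0 = np.zeros(4)
h, acc = verify_draft(h0, [0.2, 0.3, 0.7, 0.1], lambda t: t < 0.5, A, B)
# only the first two draft tokens are accepted
```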