Dan Fu (@realdanfu)'s Twitter Profile
Dan Fu

@realdanfu

Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also academic partner @togethercompute.

ID: 1173687463790829568

Website: http://danfu.org · Joined: 16-09-2019 19:58:03

611 Tweets

5.5K Followers

183 Following

Together AI (@togethercompute):

Flash Attention, invented by Tri Dao, our Chief Scientist, Dan Fu, academic partner at Together AI, and co-authors, was announced as a winner of the inaugural Stanford Data Science Open Source Software Prize at the CORES Symposium!

Read more about it on our most recent blog post
Cartesia (@cartesia_ai):

Today, we’re excited to release the first step in our mission to build real-time multimodal intelligence for every device: Sonic, a blazing fast (🚀 135ms model latency), lifelike generative voice model and API.

Read cartesia.ai/blog/sonic and try Sonic play.cartesia.ai
Karan Goel (@krandiash):

Incredibly excited to be releasing our first model, Cartesia Sonic today. Sonic is a voice model based on a new state space model architecture we've developed that's blazing fast, efficient and high quality. It's the first of many models we're building to bring cheap

Tri Dao (@tri_dao):

With Albert Gu, we’ve built a rich theoretical framework of state-space duality, showing that many linear attention variants and SSMs are equivalent! The resulting model, Mamba-2, is better & faster than Mamba-1, and still matches strong Transformer architectures on language modeling.
1/
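For readers who want to try Mamba-2 directly, here is a minimal usage sketch, assuming the mamba_ssm package (v2+) exposes a Mamba2 block roughly as shown; constructor arguments and defaults may vary across versions, and the fused kernels expect a CUDA GPU with fp16/bf16 inputs.

```python
# Minimal sketch: running one Mamba-2 block from the mamba_ssm package.
# Assumes mamba_ssm >= 2.0 with CUDA; argument names/defaults may differ by version.
import torch
from mamba_ssm import Mamba2

batch, seqlen, d_model = 2, 1024, 256
x = torch.randn(batch, seqlen, d_model, device="cuda", dtype=torch.bfloat16)

block = Mamba2(
    d_model=d_model,  # model width
    d_state=64,       # SSM state size
    d_conv=4,         # local convolution width
    expand=2,         # expansion factor for the inner dimension
).to("cuda", dtype=torch.bfloat16)

y = block(x)  # (batch, seqlen, d_model), same shape as the input
print(y.shape)
```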
Simran Arora (@simran_s_arora):

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!!

There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and
Tri Dao (@tri_dao):

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS!
1/
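For reference, here is a minimal sketch of calling the fused attention kernel via the flash-attn package's flash_attn_func (the FlashAttention-2 interface; the FlashAttention-3 kernels for H100 are shipped separately but follow a similar API), assuming a CUDA GPU and fp16/bf16 tensors.

```python
# Minimal sketch: fused attention with the flash-attn package (FlashAttention-2 interface).
# Assumes a CUDA GPU and fp16/bf16 tensors of shape (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 4, 2048, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention, computed without materializing the (seqlen x seqlen) score matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```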
Together AI (@togethercompute):

Today we are announcing a new inference stack, which provides decoding throughput 4x faster than open-source vLLM. 

We are also introducing new Together Turbo and Together Lite endpoints that enable performance, quality, and price flexibility so you do not have to compromise.
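As a rough illustration, here is a sketch of calling one of these endpoints from the official together Python SDK; the model name below is an assumption for illustration, so check Together's model catalog for the current Turbo and Lite endpoint names.

```python
# Minimal sketch: chat completion against a Together Turbo endpoint via the `together` SDK.
# Requires `pip install together` and a TOGETHER_API_KEY environment variable.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # assumed Turbo endpoint name
    messages=[{"role": "user", "content": "Summarize FlashAttention in one sentence."}],
)
print(response.choices[0].message.content)
```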