Dan Fu (@realdanfu)'s Twitter Profile
Dan Fu

@realdanfu

Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also academic partner @togethercompute.

ID: 1173687463790829568

Website: http://danfu.org · Joined: 16-09-2019 19:58:03

611 Tweets

5.5K Followers

183 Following

Together AI (@togethercompute):

Flash Attention, invented by Tri Dao, our Chief Scientist, Dan Fu, academic partner at Together AI, and co-authors, was announced as a winner of the inaugural Stanford Data Science Open Source Software Prize at the CORES Symposium!

Read more about it on our most recent blog post
Cartesia (@cartesia_ai):

Today, we’re excited to release the first step in our mission to build real-time multimodal intelligence for every device: Sonic, a blazing fast (🚀 135ms model latency), lifelike generative voice model and API.

Read cartesia.ai/blog/sonic and try Sonic play.cartesia.ai
Karan Goel (@krandiash):

Incredibly excited to be releasing our first model, Cartesia Sonic today. Sonic is a voice model based on a new state space model architecture we've developed that's blazing fast, efficient and high quality. It's the first of many models we're building to bring cheap

Tri Dao (@tri_dao):

With Albert Gu, we’ve built a rich theoretical framework of state-space duality, showing that many linear attention variants and SSMs are equivalent! The resulting model, Mamba-2, is better & faster than Mamba-1, and still matches strong Transformer architectures on language modeling.
1/
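For readers who want to try Mamba-2 directly, here is a minimal usage sketch, assuming the mamba_ssm package (v2+) exposes a Mamba2 block roughly as shown; constructor arguments and defaults may vary across versions, and the fused kernels expect a CUDA GPU with fp16/bf16 inputs.

```python
# Minimal sketch: running one Mamba-2 block from the mamba_ssm package.
# Assumes mamba_ssm >= 2.0 with CUDA; argument names/defaults may differ by version.
import torch
from mamba_ssm import Mamba2

batch, seqlen, d_model = 2, 1024, 256
x = torch.randn(batch, seqlen, d_model, device="cuda", dtype=torch.bfloat16)

block = Mamba2(
    d_model=d_model,  # model width
    d_state=64,       # SSM state size
    d_conv=4,         # local convolution width
    expand=2,         # expansion factor for the inner dimension
).to("cuda", dtype=torch.bfloat16)

y = block(x)  # (batch, seqlen, d_model), same shape as the input
print(y.shape)
```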
Simran Arora (@simran_s_arora):

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!!

There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and
Tri Dao (@tri_dao):

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS!
1/
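For reference, here is a minimal sketch of calling the fused attention kernel via the flash-attn package's flash_attn_func (the FlashAttention-2 interface; the FlashAttention-3 kernels for H100 are shipped separately but follow a similar API), assuming a CUDA GPU and fp16/bf16 tensors.

```python
# Minimal sketch: fused attention with the flash-attn package (FlashAttention-2 interface).
# Assumes a CUDA GPU and fp16/bf16 tensors of shape (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 4, 2048, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention, computed without materializing the (seqlen x seqlen) score matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```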
Together AI (@togethercompute):

Today we are announcing a new inference stack, which provides decoding throughput 4x faster than open-source vLLM. 

We are also introducing new Together Turbo and Together Lite endpoints that enable performance, quality, and price flexibility so you do not have to compromise.
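As a rough illustration, here is a sketch of calling one of these endpoints from the official together Python SDK; the model name below is an assumption for illustration, so check Together's model catalog for the current Turbo and Lite endpoint names.

```python
# Minimal sketch: chat completion against a Together Turbo endpoint via the `together` SDK.
# Requires `pip install together` and a TOGETHER_API_KEY environment variable.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # assumed Turbo endpoint name
    messages=[{"role": "user", "content": "Summarize FlashAttention in one sentence."}],
)
print(response.choices[0].message.content)
```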