Tri Dao

@tri_dao

Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute, PhD in machine learning & systems @StanfordAILab.

Joined: 02-05-2012

428 Tweets

10.1K Followers

283 Following

Tri Dao (@tri_dao):

Announcing Flash-Decoding, to make long-context LLM inference up to 8x faster! Great collab with Daniel Haziza, Francisco Massa and Grigory Sizov.

Main idea: load the KV cache in parallel as fast as possible, then separately rescale to combine the results.
1/7
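The combining step described above relies on the fact that softmax attention over a long KV cache can be computed per chunk and then merged exactly, using each chunk's log-sum-exp. A minimal NumPy sketch of this idea (function names and shapes are illustrative, not the actual Flash-Decoding kernel, where the chunks run in parallel on the GPU):

```python
import numpy as np

def attend_chunk(q, k, v):
    # q: (d,), k/v: (chunk_len, d).
    # Returns the chunk's normalized attention output and its log-sum-exp.
    s = k @ q / np.sqrt(q.shape[0])      # attention scores for this chunk
    m = s.max()
    p = np.exp(s - m)                    # numerically stable exponentials
    return (p @ v) / p.sum(), m + np.log(p.sum())

def flash_decode(q, K, V, n_splits=4):
    # Split the KV cache along the sequence axis; each split is attended
    # to independently (in the real kernel, in parallel across SMs).
    outs, lses = zip(*(attend_chunk(q, k, v)
                       for k, v in zip(np.array_split(K, n_splits),
                                       np.array_split(V, n_splits))))
    # Rescale: weight each chunk's output by exp(lse_i - global_lse),
    # i.e. a softmax over the per-chunk log-sum-exps.
    lses = np.array(lses)
    w = np.exp(lses - lses.max())
    w = w / w.sum()
    return sum(w_i * o for w_i, o in zip(w, outs))
```

The rescaling is exact, not approximate: each chunk output is `softmax(s_chunk) @ v_chunk`, and multiplying it by `exp(lse_chunk - lse_total)` recovers precisely `exp(s_j - lse_total)` for every score, so the merged result equals full attention over the whole cache.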
