Tri Dao
@tri_dao
Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute, PhD in machine learning & systems @StanfordAILab.
Announcing Flash-Decoding, to make long-context LLM inference up to 8x faster! Great collab with Daniel Haziza, Francisco Massa and Grigory Sizov.
Main idea: load the KV cache in parallel as fast as possible, then separately rescale to combine the results.
1/7
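Roughly what that "attend to KV-cache chunks in parallel, then rescale and combine" step looks like: a minimal NumPy sketch for a single query head, not Flash-Decoding's actual CUDA kernels. The function names (`attend_chunk`, `flash_decode_single_query`) and the chunk count are illustrative choices, not part of the library.

```python
import numpy as np

def attend_chunk(q, k_chunk, v_chunk):
    """Attention of one query against one KV-cache chunk.

    Returns the unnormalized weighted values plus the chunk's max score
    and sum of exponentials, which are needed to merge chunks exactly.
    """
    scores = k_chunk @ q / np.sqrt(q.shape[-1])   # (chunk_len,)
    m = scores.max()                              # chunk max, for numerical stability
    p = np.exp(scores - m)                        # (chunk_len,)
    return p @ v_chunk, m, p.sum()                # (head_dim,), scalar, scalar

def flash_decode_single_query(q, k_cache, v_cache, num_splits=4):
    """Split the KV cache, attend to each split (this part is parallelizable),
    then rescale the partial results so they share one softmax normalization."""
    chunks = np.array_split(np.arange(len(k_cache)), num_splits)
    partials = [attend_chunk(q, k_cache[idx], v_cache[idx]) for idx in chunks]

    # Combine: rescale each chunk's output by exp(m_i - m_global), then divide
    # by the global sum of exponentials.
    m_global = max(m for _, m, _ in partials)
    num = sum(out * np.exp(m - m_global) for out, m, _ in partials)
    den = sum(s * np.exp(m - m_global) for _, m, s in partials)
    return num / den

# Quick check against ordinary full-cache attention.
rng = np.random.default_rng(0)
d, seqlen = 64, 1024
q = rng.standard_normal(d)
K = rng.standard_normal((seqlen, d))
V = rng.standard_normal((seqlen, d))

scores = K @ q / np.sqrt(d)
weights = np.exp(scores - scores.max())
ref = weights @ V / weights.sum()
assert np.allclose(flash_decode_single_query(q, K, V), ref)
```

The rescaling by exp(m_i - m_global) is what lets the chunks be processed independently: each chunk only needs its own max and sum of exponentials, and the exact softmax over the full cache is recovered in the final combine step.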