Horace He (@chhillee)'s Twitter Profile
Horace He

@chhillee

@PyTorch "My learning style is Horace twitter threads" - @typedfemale

ID: 117233133

Link: https://www.thonking.ai/p/strangely-matrix-multiplications
Joined: 24-02-2010 23:48:25

2.2K Tweets

26.26K Followers

482 Following

Horace He (@chhillee)

Some folks had some confusion about whether FlexAttention worked on H100s. To clarify, FlexAttention runs on H100 with fairly good perf (up to ~40% faster than FA2). However, it is still about 75-85% of the FLOPs of FA3. Check out Driss's thread for more details.

Horace He (@chhillee)

Fun fact: this was actually the first novel mask implemented in FlexAttention by someone not working on FlexAttention!
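A minimal sketch of how a custom mask is expressed through FlexAttention's mask_mod hook (PyTorch 2.5+); the tweet doesn't say which mask it refers to, so the causal mask and tensor shapes below are stand-ins rather than the mask in question:

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 8, 1024, 64  # arbitrary example sizes

def causal_mask(b, h, q_idx, kv_idx):
    # mask_mod returns True wherever attention is allowed
    return q_idx >= kv_idx

# Precompute a sparse block mask so fully-masked blocks are skipped entirely
block_mask = create_block_mask(causal_mask, B=B, H=H, Q_LEN=S, KV_LEN=S)

q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

out = flex_attention(q, k, v, block_mask=block_mask)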

Horace He (@chhillee)

We’re planning on adding learned bias support for FlexAttention. Unfortunately, this is somewhat nontrivial to do efficiently and generically. So, for flashattention, do you usually care more about 1. good performance or 2. avoiding quadratic memory usage?
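A minimal sketch of what learned-bias support could look like through FlexAttention's score_mod hook; the bias tensor and shapes are illustrative assumptions, and making gradients flow back into such a captured tensor efficiently and generically is exactly the nontrivial part mentioned above:

import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 1024, 64  # arbitrary example sizes

# A dense (H, S, S) learned bias is the simplest formulation, but materializing
# it is precisely the quadratic memory cost the poll asks about.
bias = torch.zeros(H, S, S, device="cuda", requires_grad=True)

def learned_bias(score, b, h, q_idx, kv_idx):
    # score_mod is called per (query, key) position and can read captured tensors
    return score + bias[h, q_idx, kv_idx]

q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

out = flex_attention(q, k, v, score_mod=learned_bias)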

Mike Shou (@mikeshou1)

Show-o update 🔥:
1. We have released training code on GitHub, including both pre-training and instruction tuning! 🔥
2. Added a FlexAttention implementation for a big speedup. Thanks Horace He 🚀 github.com/showlab/Show-o…
3. Gradio demo is up 🤗 huggingface.co/spaces/showlab…
Have fun!

LMSys Open Source (@lmsys_oss)

We're excited to announce the release of SGLang v0.3, featuring enhanced performance and extended support for novel architectures! Highlights include:

- Up to 7x higher throughput for DeepSeek Multi-Head Latent Attention (MLA)
- Up to 1.5x lower latency with torch.compile on…