Han Guo (@hanguo97)'s Twitter Profile
Han Guo

@hanguo97

PhD Student @MIT_CSAIL | Past: @LTIatCMU, @MITIBMLab, @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.

ID: 769279457387540480

Link: http://han-guo.info
Joined: 26-08-2016 21:04:50

2.2K Tweets

2.2K Followers

4.4K Following

Introducing FLUTE, a CUDA kernel for non-uniformly quantized (via a lookup table) LLM Inference. It accelerates QLoRA's NormalFloat (NF) out of the box and more. As an application, we extended NF4 and are releasing quantized models for LLaMA-3 (8B/70B) and Gemma-2 (9B/27B).
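To make "non-uniformly quantized (via a lookup table)" concrete, here is a minimal pure-Python sketch of the general idea: each weight is stored as a small integer code, and dequantization is a table lookup followed by a multiply. This is an illustration only, not FLUTE's CUDA kernel or its API. The 16-entry table below is hypothetical and is not the NF4 table, and the names `TABLE`, `quantize`, and `dequantize` are assumptions made for this example.

```python
# Hypothetical 16-entry lookup table for 4-bit codes; NOT the actual NF4 values.
# Entries are spaced non-uniformly: denser near zero, sparser near -1 and +1.
TABLE = [((i - 7.5) / 7.5) ** 3 for i in range(16)]


def quantize(weights, table=TABLE):
    """Return (codes, scale): each weight is mapped to its nearest table entry."""
    # Scale by the largest magnitude so normalized weights fall in [-1, 1].
    scale = max(abs(w) for w in weights) or 1.0
    codes = [
        min(range(len(table)), key=lambda k: abs(table[k] - w / scale))
        for w in weights
    ]
    return codes, scale


def dequantize(codes, scale, table=TABLE):
    """Recover approximate weights: one table lookup and one multiply per code."""
    return [table[c] * scale for c in codes]


weights = [0.0, 0.5, -2.0, 2.0]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
```

Because the table is arbitrary, the same lookup step works for any non-uniform 4-bit format simply by swapping in different table entries. Real implementations typically pack codes into bytes and keep one scale per small group of weights rather than one per tensor.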
