Xiuyu Li (@xiuyu_l)'s Twitter Profile
Xiuyu Li

@xiuyu_l

PhD student @berkeley_ai. Research intern @Meta. Efficient deep learning algorithms & systems, LLM. Prev @Cornell.

ID:900349585305894912

http://xiuyuli.com
23-08-2017 13:30:43

67 Tweets

412 Followers

408 Following

Baifeng (@baifeng_shi)

We have lots of cool algorithms (fine-tuning, LoRA, prompt tuning...) that can adapt a pre-trained model to a downstream task, but what do they MISS?

Surprisingly, we find models trained with these algorithms have trouble focusing their attention!

Sheng Shen (@shengs1123)

A Winning Combination for Large Language Models

TL;DR: Did you find MoE models generalize worse than dense models on downstream tasks? Not anymore in the age of instruction tuning!

Surprisingly, we see the “1 + 1 > 2” effect when it comes to MoE + Instruction Tuning. [1/4]

Xiuyu Li (@xiuyu_l)

I have been waiting for this plot for a long time -- such a great reference to holistically assess the capabilities and limitations of those 'llama family' open-source models. Most importantly, it provides insights into the potential applications a self-hosted model is capable of

Baifeng (@baifeng_shi)

Thanks AK for sharing! Our new work, RPT, can help your robot learn better by masked pre-training on sensorimotor sequences.

Zhuohan Li (@zhuohan123)

🌟 Thrilled to introduce vLLM with Woosuk Kwon!

🚀 vLLM is an open-source LLM inference and serving library that accelerates HuggingFace Transformers by 24x and powers lmsys.org Vicuna and Chatbot Arena.

Github: github.com/vllm-project/v…
Blog: vllm.ai

Forrest Iandola (@fiandola)

If you'll be at #CVPR2023 on Sunday morning, come check out the LOVEU workshop! We have competitions for video understanding and generative AI video editing.
sites.google.com/view/loveucvpr…

Thanks for reading, now enjoy a video of a horse on Mars by the winning team:

Xiuyu Li (@xiuyu_l)

Check out breakthroughs in LLM quantization with our new work, which presents a sensitivity-aware non-uniform quantization scheme that outperforms existing methods at 3 & 4 bits, and our 4-bit quantized Vicuna matches FP baseline performance as evaluated by GPT-4! 🔥

Ji Lin (@jilin_14)

SmoothQuant is good for W8A8 LLM quantization, what about low-bit weight-only quantization (e.g., W4A16)? We present Activation-aware Weight Quantization (AWQ) for LLM compression and acceleration: github.com/mit-han-lab/ll… 🧵

Baifeng (@baifeng_shi)

Humans pay attention to different objects when performing different tasks. Can a vision transformer (ViT) do that as well?

In our recent work, we build a ViT with task-guided attention! 1/n

Visit our website to learn more: sites.google.com/view/absvit

lmsys.org (@lmsysorg)

Introducing Vicuna, an open-source chatbot impressing GPT-4!

🚀 Vicuna reaches 90%* quality of ChatGPT/Bard while significantly outperforming other baselines, according to GPT-4's assessment.

Blog: vicuna.lmsys.org
Demo: chat.lmsys.org

Xiuyu Li (@xiuyu_l)

Excited to share our latest research GARNET, which improves GNN robustness on large-scale graphs with millions of nodes, and can serve as a plug-and-play module for various graphs and GNN backbones.

Xiuyu Li (@xiuyu_l)

TorchSparse is really a great project to work on — if you are interested in learning more, please come and chat with us at MLSys 2022 :)

Jacob Steinhardt (@JacobSteinhardt)

My student Kayo Yin needs your help. Her visa has been unnecessarily delayed, which would prevent her from coming to UC Berkeley to start her studies. Despite bringing all required documents, the Department of State refused to process the visa and it could take months to re-process.
