Thomas Wolf (@thom_wolf)'s Twitter Profile
Thomas Wolf

@thom_wolf

Co-founder and CSO @HuggingFace - open-source and open-science

ID: 246939962

https://thomwolf.io · Joined 03-02-2011 19:33:48

3.3K Tweets

73.7K Followers

4.4K Following

Thomas Wolf (@thom_wolf)

Summary of a stable FP8 training recipe for Llama models: 1. Clip the learning rate when the 2nd-moment estimator is outdated by an incoming spike – see arxiv.org/abs/2304.13013 2. SmoothQuant: smooth activation outliers by migrating the quantization difficulty to the weights – see
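
To make the SmoothQuant step concrete, here is a minimal sketch of the scale-migration idea from the SmoothQuant paper (arxiv.org/abs/2211.10438); the calibration of per-channel activation maxima and the alpha=0.5 migration strength are assumptions, not the recipe's exact settings.

```python
import torch

def smoothquant_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Per-input-channel smoothing for a linear layer y = x @ W.T.

    act_absmax: per-channel max |activation| collected on calibration data, shape [in_features].
    weight:     layer weight, shape [out_features, in_features].
    Dividing activations by s while multiplying weights by s leaves the layer's
    output unchanged but shrinks activation outliers, which is what makes the
    activations easy to quantize. alpha=0.5 is an assumed migration strength.
    """
    w_absmax = weight.abs().amax(dim=0)  # per-input-channel weight magnitude
    s = (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
    return s

# Offline, fold the scales into the layer:
#   layer.weight.data.mul_(s)        # weights absorb the quantization difficulty
# then divide incoming activations by s at runtime: (x / s) @ W_smoothed.T
```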

Matthias Gallé (@mgalle)

Packing for a weekend I found this.
It is hard to believe that BigScience Large Model Training really happened. The first time I heard of the idea my take was "this is going to be fun... but not going to work"

Kudos to Thomas Wolf for the vision

Thomas Wolf (@thom_wolf)

3 years ago we trained a big LLM together with 1000+ people. One of the parts I'm most proud of is how it became the school for a generation of model-training engineers, as well as a strong push for sharing and open-sourcing knowledge in the field

Jim Fan (@drjimfan)

HuggingFace ships!! From the hardware DIY manual to Jupyter notebooks, you get the whole deluxe tutorial experience. Everything that moves will eventually be autonomous. We need a new generation of talents to work on bridging the world of bits with the world of atoms, and open-source

Byron Hsu (@hsu_byron)

Bram Hugging Face Zach Mueller github.com/huggingface/tr… We have been running the kernels extensively already, so they are stable. The integration PR is merged, thanks to Zach Mueller's team! Use `--use-liger-kernel` in the training arguments in the next release. The official integration news will
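
The Python-side equivalent of that flag would look roughly like this: a sketch assuming a transformers release that exposes `use_liger_kernel` in `TrainingArguments` and that `pip install liger-kernel` has been run; the model and dataset are placeholders.

```python
from transformers import Trainer, TrainingArguments

# Assumes a transformers release with the Liger integration and the
# liger-kernel package installed; model and train_ds stand in for your
# usual model and dataset objects.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    use_liger_kernel=True,  # swaps in Liger's fused Triton kernels for supported models
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```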

Thomas Wolf (@thom_wolf)

You have to understand: this is performed with a phone camera, cheap 3D-printed open-source robotic arms, and 100 training examples. That should give you a hint of the open-source robotic AI revolution right around the corner

Nous Research (@nousresearch)

What if you could use all the computing power in the world to train a shared, open source AI model?

Preliminary report: github.com/NousResearch/D…

Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet), a family of

elie (@eliebakouch)

Really cool research by Nous Research, reducing the amount of GPU communication by 1,000x to 10,000x while matching the convergence rate. Very glad they used the Hugging Face nanotron library for their experiments 🔥 Congrats Bowen Peng emozilla ari 🏳️‍⚧️ ฅ^•ﻌ•^ฅ Umer Adil 🤗
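
DisTrO's actual optimizer is described in the linked report; purely to illustrate how inter-GPU traffic can drop by orders of magnitude, here is a generic top-k gradient-sparsification sketch (explicitly not DisTrO's method; the 1% density is an arbitrary assumption).

```python
import torch
import torch.distributed as dist

def sparse_allreduce(grad: torch.Tensor, density: float = 0.01) -> torch.Tensor:
    """Average gradients across ranks while exchanging only the top-k entries.

    Generic top-k sparsification, shown only to illustrate order-of-magnitude
    communication savings; DisTrO itself uses a different, optimizer-level
    scheme described in the Nous report. Assumes torch.distributed is initialized.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * density))       # keep ~1% of entries (assumed)
    _, indices = flat.abs().topk(k)
    values = flat[indices]                        # signed values at the top-k positions
    world = dist.get_world_size()
    idx_list = [torch.empty_like(indices) for _ in range(world)]
    val_list = [torch.empty_like(values) for _ in range(world)]
    dist.all_gather(idx_list, indices)            # exchange k indices + k values
    dist.all_gather(val_list, values)             # instead of the full gradient
    out = torch.zeros_like(flat)
    for idx, val in zip(idx_list, val_list):
        out.index_add_(0, idx, val)
    return out.view_as(grad) / world
```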

Bill Yuchen Lin 🤖 (@billyuchenlin)

🚨 Introducing WildVision’s datasets for research on vision-language models (VLMs) — ideal for SFT, RLHF, and Eval. One of the first large-scale VLM alignment data collections sourced from human users.

- 💬 WildVision-Chat: Human-VLM conversations with images for VLM training
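
For anyone who wants to poke at the data, loading it should follow the usual datasets pattern. A hedged sketch: the repo id below is a guess at the naming, so check the WildVision org on the Hub for the exact dataset names.

```python
from datasets import load_dataset

# Hypothetical repo id: verify the exact dataset name on the
# WildVision org page on the Hugging Face Hub before running.
chat = load_dataset("WildVision/wildvision-chat", split="train")
print(chat[0])  # one human-VLM conversation with its attached image
```
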
Andi Marafioti (@andi_marafioti)

We released the technical report for Idefics3! 🚀
It's a tutorial on building Vision-Language Models (VLMs). Perfect for beginners and packed with advanced insights for experts.

🧠 Key Highlights:
    - Idefics 3: Hugging Face's latest state-of-the-art 8B VLM, featuring
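
A minimal inference sketch, assuming the HuggingFaceM4/Idefics3-8B-Llama3 checkpoint on the Hub and a transformers release with Idefics3 support; the image path and prompt are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"  # checkpoint published by HuggingFaceM4
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("photo.jpg")  # placeholder image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
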
ChatGLM (@chatglm)

🚀 CogVideoX-5B Video generation models Release! Bigger size, better quality, lower cost (runs on just 12GB GPU) 🎉 🔗 Code: github.com/THUDM/CogVideo 🤗 Model: huggingface.co/THUDM/CogVideo… 🌐 Try it out: huggingface.co/spaces/THUDM/C…
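
A minimal generation sketch, assuming a diffusers release that ships `CogVideoXPipeline`; the prompt, frame count, and offload choice are illustrative (offloading is one way to fit the ~12GB GPU budget mentioned above).

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Assumes a diffusers release with the CogVideoX integration.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades speed for memory to fit a ~12GB GPU

video = pipe(prompt="a panda playing guitar in a bamboo forest", num_frames=49).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```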

Cartesia (@cartesia_ai)

Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.

Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe
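
Why SSMs suit the edge: per-step compute and memory stay constant (a fixed-size state instead of a growing KV cache). The toy recurrence below illustrates the point; it is a generic textbook linear SSM, not Cartesia's architecture, and the matrices are arbitrary.

```python
import numpy as np

def ssm_step(state, x_t, A, B, C):
    """One step of a discretized linear state space model:
        state' = A @ state + B * x_t ;  y_t = C @ state'
    Per-step cost is constant regardless of sequence length, which is the
    property that makes SSMs attractive on-device. Generic textbook
    recurrence, not Cartesia's implementation.
    """
    state = A @ state + B * x_t
    return state, C @ state

d = 16                         # toy state size
A = np.eye(d) * 0.9            # stable decay dynamics (arbitrary)
B = np.ones(d) * 0.1
C = np.random.randn(d)
state = np.zeros(d)
for x_t in [0.5, -1.0, 0.25]:  # scalar input stream
    state, y = ssm_step(state, x_t, A, B, C)
```
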
Yikang Shen (@yikang_shen)

Thanks for posting our work!
(1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
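
For readers unfamiliar with WSD (warmup-stable-decay): it warms up, holds the learning rate flat for most of training, then decays sharply at the end, which makes it convenient for sweeps like these. A minimal sketch; the phase fractions are illustrative assumptions, not the paper's settings.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.05, decay_frac=0.1, min_lr=0.0):
    """Warmup-Stable-Decay: linear warmup, long plateau at peak_lr, short
    final decay to min_lr. Phase fractions here are illustrative assumptions.
    """
    warmup = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:                 # 1) linear warmup
        return peak_lr * step / max(1, warmup)
    if step < decay_start:            # 2) stable plateau
        return peak_lr
    t = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + (min_lr - peak_lr) * t   # 3) linear decay tail
```
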
Thomas Wolf (@thom_wolf)

One of the essential next steps in open-source AI 👇 The results of several months of exploration, benchmarking, and collaborations: efficient video training/storage for robotics and video models, finally out

Mohamed (@mohamedmekkouri)

While working on the 1.58-bit LLM project at Hugging Face, I played around with some kernels for Int2xInt8. I wrote my first kernel in Triton where, instead of unpacking the weights before performing the matrix multiplication, I fused the two operations and unpacked the weights on the fly
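
For intuition, here is what the unpacking step looks like in plain PyTorch; the fused Triton kernel described above performs this bit-twiddling inside the matmul's inner loop so the unpacked matrix is never materialized. The packing layout and the {0,1,2} to {-1,0,1} ternary mapping are assumptions.

```python
import torch

def unpack_int2(packed: torch.Tensor) -> torch.Tensor:
    """Unpack four 2-bit weights from each uint8 byte into int8 values.

    Plain PyTorch version of the unpacking step only; the fused Triton
    kernel does this inside the matmul tile loop instead. The layout and
    the {0,1,2} -> {-1,0,1} ternary mapping are assumptions.
    """
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8, device=packed.device)
    vals = (packed.unsqueeze(-1) >> shifts) & 0b11   # [..., n_bytes, 4]
    return vals.flatten(-2).to(torch.int8) - 1       # assumed {0,1,2} -> {-1,0,1}

# Pack some ternary weights, then run the unfused reference:
# unpack, then matmul (exactly the materialization the fused kernel avoids).
tern = torch.randint(0, 3, (64, 32, 4), dtype=torch.uint8)   # ternary codes per 2-bit slot
shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8)
packed_w = (tern << shifts).sum(-1, dtype=torch.uint8)       # [64, 32] packed bytes

w = unpack_int2(packed_w).to(torch.float32)                  # [64, 128], values in {-1,0,1}
x = torch.randn(8, 128)
y = x @ w.T                                                  # [8, 64]
```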