Thomas Wolf (@thom_wolf)'s Twitter Profile
Thomas Wolf

@thom_wolf

Co-founder and CSO @HuggingFace - open-source and open-science

ID: 246939962

https://thomwolf.io · Joined 03-02-2011 19:33:48

3.3K Tweets

73.7K Followers

4.4K Following

Thomas Wolf (@thom_wolf)

Summary of a stable FP8 training recipe for Llama models: 1. Clip the learning rate when the 2nd-moment estimator is outdated by an incoming spike – see arxiv.org/abs/2304.13013 2. SmoothQuant: smooth activation outliers by migrating the quantization difficulty to the weights – see
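
To make the SmoothQuant step concrete, here is a minimal sketch of the scale-migration idea from the SmoothQuant paper (arxiv.org/abs/2211.10438); the calibration of per-channel activation maxima and the alpha=0.5 migration strength are assumptions, not the recipe's exact settings.

```python
import torch

def smoothquant_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Per-input-channel smoothing for a linear layer y = x @ W.T.

    act_absmax: per-channel max |activation| collected on calibration data, shape [in_features].
    weight:     layer weight, shape [out_features, in_features].
    Dividing activations by s while multiplying weights by s leaves the layer's
    output unchanged but shrinks activation outliers, which is what makes the
    activations easy to quantize. alpha=0.5 is an assumed migration strength.
    """
    w_absmax = weight.abs().amax(dim=0)  # per-input-channel weight magnitude
    s = (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
    return s

# Offline, fold the scales into the layer:
#   layer.weight.data.mul_(s)        # weights absorb the quantization difficulty
# then divide incoming activations by s at runtime: (x / s) @ W_smoothed.T
```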

Matthias Gallé (@mgalle)

Packing for a weekend I found this.
It is hard to believe that BigScience Large Model Training really happened. The first time I heard of the idea my take was "this is going to be fun... but not going to work"

Kudos to Thomas Wolf for the vision

Thomas Wolf (@thom_wolf)

3 years ago we trained a big LLM together with 1000+ people. One of the parts I'm most proud of is how it became the school for a generation of model-training engineers, as well as a strong push for sharing and open-sourcing knowledge in the field

Jim Fan (@drjimfan)

HuggingFace ships!! From the hardware DIY manual to Jupyter notebooks, you get the whole deluxe tutorial experience. Everything that moves will eventually be autonomous. We need a new generation of talents to work on bridging the world of bits with the world of atoms, and open-source

Byron Hsu (@hsu_byron)

Bram Hugging Face Zach Mueller github.com/huggingface/tr… We have been running the kernels extensively already, so they are stable. The integration PR is merged, thanks to Zach Mueller's team! Use `--use-liger-kernel` in the training arguments in the next release. The official integration news will
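
The Python-side equivalent of that flag would look roughly like this: a sketch assuming a transformers release that exposes `use_liger_kernel` in `TrainingArguments` and that `pip install liger-kernel` has been run; the model and dataset are placeholders.

```python
from transformers import Trainer, TrainingArguments

# Assumes a transformers release with the Liger integration and the
# liger-kernel package installed; model and train_ds stand in for your
# usual model and dataset objects.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    use_liger_kernel=True,  # swaps in Liger's fused Triton kernels for supported models
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```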

Thomas Wolf (@thom_wolf)

You have to understand: this is performed with a phone camera, cheap 3D-printed open-source robotic arms, and 100 training examples. That should give you a hint of the open-source robotic AI revolution right around the corner

Nous Research (@nousresearch)

What if you could use all the computing power in the world to train a shared, open source AI model?

Preliminary report: github.com/NousResearch/D…

Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet), a family of

elie (@eliebakouch)

Really cool research by Nous Research, reducing the amount of GPU communication by 1,000x to 10,000x while matching the convergence rate. Very glad they used the Hugging Face nanotron library for their experiments 🔥 Congrats Bowen Peng emozilla ari 🏳️‍⚧️ ฅ^•ﻌ•^ฅ Umer Adil 🤗
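
DisTrO's actual optimizer is described in the linked report; purely to illustrate how inter-GPU traffic can drop by orders of magnitude, here is a generic top-k gradient-sparsification sketch (explicitly not DisTrO's method; the 1% density is an arbitrary assumption).

```python
import torch
import torch.distributed as dist

def sparse_allreduce(grad: torch.Tensor, density: float = 0.01) -> torch.Tensor:
    """Average gradients across ranks while exchanging only the top-k entries.

    Generic top-k sparsification, shown only to illustrate order-of-magnitude
    communication savings; DisTrO itself uses a different, optimizer-level
    scheme described in the Nous report. Assumes torch.distributed is initialized.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * density))       # keep ~1% of entries (assumed)
    _, indices = flat.abs().topk(k)
    values = flat[indices]                        # signed values at the top-k positions
    world = dist.get_world_size()
    idx_list = [torch.empty_like(indices) for _ in range(world)]
    val_list = [torch.empty_like(values) for _ in range(world)]
    dist.all_gather(idx_list, indices)            # exchange k indices + k values
    dist.all_gather(val_list, values)             # instead of the full gradient
    out = torch.zeros_like(flat)
    for idx, val in zip(idx_list, val_list):
        out.index_add_(0, idx, val)
    return out.view_as(grad) / world
```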

Bill Yuchen Lin 🤖 (@billyuchenlin)

🚨 Introducing WildVision’s datasets for research on vision-language models (VLMs) — ideal for SFT, RLHF, and Eval. One of the first large-scale VLM alignment data collections sourced from human users.

- 💬 WildVision-Chat: Human-VLM conversations with images for VLM training
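
For anyone who wants to poke at the data, loading it should follow the usual datasets pattern. A hedged sketch: the repo id below is a guess at the naming, so check the WildVision org on the Hub for the exact dataset names.

```python
from datasets import load_dataset

# Hypothetical repo id: verify the exact dataset name on the
# WildVision org page on the Hugging Face Hub before running.
chat = load_dataset("WildVision/wildvision-chat", split="train")
print(chat[0])  # one human-VLM conversation with its attached image
```
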
Andi Marafioti (@andi_marafioti)

We released the technical report for Idefics3! 🚀
It's a tutorial on building Vision-Language Models (VLMs). Perfect for beginners and packed with advanced insights for experts.

🧠 Key Highlights:
    - Idefics 3: Hugging Face's latest state-of-the-art 8B VLM, featuring
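
A minimal inference sketch, assuming the HuggingFaceM4/Idefics3-8B-Llama3 checkpoint on the Hub and a transformers release with Idefics3 support; the image path and prompt are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"  # checkpoint published by HuggingFaceM4
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("photo.jpg")  # placeholder image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
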
ChatGLM (@chatglm)

🚀 CogVideoX-5B Video generation models Release! Bigger size, better quality, lower cost (runs on just 12GB GPU) 🎉 🔗 Code: github.com/THUDM/CogVideo 🤗 Model: huggingface.co/THUDM/CogVideo… 🌐 Try it out: huggingface.co/spaces/THUDM/C…
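
A minimal generation sketch, assuming a diffusers release that ships `CogVideoXPipeline`; the prompt, frame count, and offload choice are illustrative (offloading is one way to fit the ~12GB GPU budget mentioned above).

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Assumes a diffusers release with the CogVideoX integration.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades speed for memory to fit a ~12GB GPU

video = pipe(prompt="a panda playing guitar in a bamboo forest", num_frames=49).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```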

Cartesia (@cartesia_ai)

Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.

Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe
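
Why SSMs suit the edge: per-step compute and memory stay constant (a fixed-size state instead of a growing KV cache). The toy recurrence below illustrates the point; it is a generic textbook linear SSM, not Cartesia's architecture, and the matrices are arbitrary.

```python
import numpy as np

def ssm_step(state, x_t, A, B, C):
    """One step of a discretized linear state space model:
        state' = A @ state + B * x_t ;  y_t = C @ state'
    Per-step cost is constant regardless of sequence length, which is the
    property that makes SSMs attractive on-device. Generic textbook
    recurrence, not Cartesia's implementation.
    """
    state = A @ state + B * x_t
    return state, C @ state

d = 16                         # toy state size
A = np.eye(d) * 0.9            # stable decay dynamics (arbitrary)
B = np.ones(d) * 0.1
C = np.random.randn(d)
state = np.zeros(d)
for x_t in [0.5, -1.0, 0.25]:  # scalar input stream
    state, y = ssm_step(state, x_t, A, B, C)
```
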
Yikang Shen (@yikang_shen)

Thanks for posting our work!
(1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
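
For readers unfamiliar with WSD (warmup-stable-decay): it warms up, holds the learning rate flat for most of training, then decays sharply at the end, which makes it convenient for sweeps like these. A minimal sketch; the phase fractions are illustrative assumptions, not the paper's settings.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.05, decay_frac=0.1, min_lr=0.0):
    """Warmup-Stable-Decay: linear warmup, long plateau at peak_lr, short
    final decay to min_lr. Phase fractions here are illustrative assumptions.
    """
    warmup = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:                 # 1) linear warmup
        return peak_lr * step / max(1, warmup)
    if step < decay_start:            # 2) stable plateau
        return peak_lr
    t = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + (min_lr - peak_lr) * t   # 3) linear decay tail
```
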
Thomas Wolf (@thom_wolf)

One of the essential next steps in open-source AI 👇 The results of several months of exploration, benchmarking, and collaborations: efficient video training/storage for robotics and video models, finally out

Mohamed (@mohamedmekkouri)

While working on the 1.58-bit LLM project at Hugging Face, I played around with some kernels for Int2xInt8. I wrote my first kernel in Triton where, instead of unpacking the weights before performing the matrix multiplication, I fused the two operations and unpacked the weights on the fly
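
For intuition, here is what the unpacking step looks like in plain PyTorch; the fused Triton kernel described above performs this bit-twiddling inside the matmul's inner loop so the unpacked matrix is never materialized. The packing layout and the {0,1,2} to {-1,0,1} ternary mapping are assumptions.

```python
import torch

def unpack_int2(packed: torch.Tensor) -> torch.Tensor:
    """Unpack four 2-bit weights from each uint8 byte into int8 values.

    Plain PyTorch version of the unpacking step only; the fused Triton
    kernel does this inside the matmul tile loop instead. The layout and
    the {0,1,2} -> {-1,0,1} ternary mapping are assumptions.
    """
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8, device=packed.device)
    vals = (packed.unsqueeze(-1) >> shifts) & 0b11   # [..., n_bytes, 4]
    return vals.flatten(-2).to(torch.int8) - 1       # assumed {0,1,2} -> {-1,0,1}

# Pack some ternary weights, then run the unfused reference:
# unpack, then matmul (exactly the materialization the fused kernel avoids).
tern = torch.randint(0, 3, (64, 32, 4), dtype=torch.uint8)   # ternary codes per 2-bit slot
shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8)
packed_w = (tern << shifts).sum(-1, dtype=torch.uint8)       # [64, 32] packed bytes

w = unpack_int2(packed_w).to(torch.float32)                  # [64, 128], values in {-1,0,1}
x = torch.randn(8, 128)
y = x @ w.T                                                  # [8, 64]
```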