Wei-Lin Chiang (@infwinston)'s Twitter Profile
Wei-Lin Chiang

@infwinston

CS PhD student at UC Berkeley. Building Chatbot Arena @lmsysorg

ID: 490518336

Website: https://infwinston.github.io/

Joined: 12-02-2012 16:49:32

457 Tweets

3.3K Followers

876 Following

lmsys.org (@lmsysorg)

People have been asking why GPT-4o mini ranks so high on Arena! We truly appreciate all the feedback. A few things to note: 1. Chatbot Arena measures human preference in different areas. We encourage everyone to not just look at the overall leaderboard, but also per-category

lmsys.org (@lmsysorg)

Chatbot Arena is at ICML! DM us if you’d like to chat about ✅ evals ✅ contributing to arena ✅ supporting arena ✅ …anything else!! See you at the conference center 😃

Demis Hassabis (@demishassabis)

Never seen a competitive leaderboard that I didn't like 😀 Congrats to the Gemini team on ranking no.1 🏆 with our latest improved Gemini 1.5 Pro developer preview model, which you can try on AI studio now!

Jeff Dean (@🏡) (@jeffdean)

We have an experimental updated version of Gemini 1.5 Pro that is #1 on the lmsys.org Chatbot Arena. This model is a significant improvement over earlier versions of Gemini 1.5 Pro (it cracks into 1300+ Elo score territory).

Byron Hsu (@hsu_byron)

(1/n) Training LLMs can be hindered by out-of-memory, scaling batch size, and seq length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training. github.com/linkedin/Liger…

Wei-Lin Chiang (@infwinston)

Style over substance in Chatbot Arena? Check out our latest work with Tianle (Tim) Li and @ml_angelopoulos on decoupling them with cool statistical techniques for controlling style variables!

LMSys Open Source (@lmsys_oss)

Hello, world! We're the LMSYS team. In addition to lmsys.org, this is a new account dedicated to sharing our regular open-source research and engineering updates. Stay tuned for more!

lmsys.org (@lmsysorg)

Exciting news -- we've launched a new X account for our open-source research! At LMSYS, we're dedicated to advancing open-source systems and in-depth technical work besides regular leaderboard updates. Follow @lmsys_oss to stay updated on our roadmap and ongoing developments

Santiago Zanella-Beguelin (@xefffffff)

Had lots of fun with this challenge from lmsys.org and Pliny the Liberator 🐉 (⚠️ strong language). Looking forward to what's next in this new RedTeam Arena platform, especially to the datasets that will be published on a rolling basis.

Jeff Dean (@🏡) (@jeffdean)

Check out NotebookLM! Create a notebook, upload one or more sources (e.g. PDFs of research papers, your favorite PhD thesis, a newspaper article, etc) then click on 'Generate' to create a podcast of two voices talking about the content you've uploaded. blog.google/technology/ai/…

lmsys.org (@lmsysorg)

Big shoutout to all the jailbreakers who've participated in RedTeam Arena! In just a week since launch, we’ve seen ~8,000 users take on the challenge. Now, the codebase is open-source, with data releases coming soon. Alongside Pliny the Liberator 🐉 and BASI, we're building a

lmsys.org (@lmsysorg)

No more waiting. o1 is officially on Chatbot Arena! We tested o1-preview and o1-mini with 6K+ community votes. 🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance! 🥈o1-mini: #1 in technical areas, #2 overall.
