Percy Liang (@percyliang) Twitter Tweets • TwiCopy

3 days ago

HELM MMLU v1.3.0 is out. We've added GPT-4o, Gemini 1.5 Flash, and Palmyra-X v3 - all 3 made it into the top 10.
crfm.stanford.edu/helm/mmlu/v1.3…
Click on the numbers to drill down into the predictions.

Percy Liang

4 days ago

GPT-4o tops the VHELM leaderboard.

28 likes, 6 reposts

Yann Dubois

@yanndubs

6 days ago

GPT-4o from OpenAI tops AlpacaEval

Actually, GPT-4 Preview prefers the top 3 models over itself. By now I've seen many cases of models preferring better models over themselves, and that suggests to me that some form of self-improvement (in the narrow sense) is possible!

64 likes, 7 reposts

Sang Michael Xie

@sangmichaelxie

1 week ago

The ME-FoMo #ICLR2024 workshop is tomorrow, Sat May 11 starting at 8:50AM in Vienna!

Room: Strauss 2
Schedule: sites.google.com/view/me-fomo20…

Excited for our amazing speakers: Sasha Rush (ICLR), Hanna Hajishirzi, Jacob Steinhardt, Amir Globerson, Yuandong Tian @ Paris !!

Imbue

@imbue_ai

1 week ago

🎙️ Generally Intelligent Episode 35: Percy Liang

We sat down with Percy Liang, associate professor of computer science and statistics at Stanford University, to discuss:
- how to evaluate language models robustly
- balancing plurality and consensus with AI
- the role of academia vs.

Weights & Biases

@weights_biases

1 week ago

🎙️ Join Percy Liang, Together AI co-founder and Stanford University Professor, as he discusses the challenges of evaluating LLMs.
Listen 🎧 or watch 🎥 now: lnk.to/GDkZS7O1

#GradientDissent

12 likes, 3 reposts

Percy Liang

1 week ago

HELM is now fully multimodal! In addition to language models and text-to-image models (HEIM), we now evaluate vision-language models (made possible by MMMU, VQAv2, VizWiz - thanks to the authors!). As usual, the full predictions and prompts are available on the HELM website:

21 likes, 4 reposts

rishi

@RishiBommasani

1 week ago

Transparency for foundation models is an outstanding challenge.

To make progress, the White House and G7 have recommended that foundation model developers prepare *transparency reports*.

We recently put out a paper that articulates what this should mean and its policy impact 🧵

Percy Liang