rishi (@RishiBommasani)'s Twitter Profile
rishi

@RishiBommasani

Stanford CS PhD @StanfordCRFM
@StanfordNLP @StanfordAILab @StanfordHAI

Advisers: @percyliang @jurafsky
Previous: @CornellCIS @clairecardie
#FoundationModels

ID:895659037198393344

Link: https://rishibommasani.github.io
Joined: 10-08-2017 14:52:09

6.6K Tweets

4.2K Followers

1.5K Following

Percy Liang (@percyliang)

MMLU is the standard LM evaluation but model developers (i) use different prompting strategies and (ii) often do not release prompts. 3rd-party researchers often obtain lower scores 🤯

📢 HELM MMLU uses simple, standardized prompts, resulting in fair, reproducible comparisons of…

Sayash Kapoor (@sayashk)

Excited to share that our paper introducing the REFORMS checklist is now out in Science Advances!

In it, we:
- review common errors in ML for science
- create a checklist of 32 items applicable across disciplines
- provide in-depth guidelines for each item

science.org/doi/10.1126/sc…

Russ Poldrack (@russpoldrack)

I am really excited to be part of this project led by Sayash Kapoor and Arvind Narayanan to help improve practices in machine-learning based science. science.org/doi/10.1126/sc…

Christopher Manning (@chrmanning)

🇦🇹 I’m going to be in Vienna next week. Who all do I know that’ll be there?

Students’ papers there:
Katherine Tian: arxiv.org/abs/2311.08401
Charlotte Nicks: openreview.net/forum?id=4eJDM…
Eric: arxiv.org/abs/2310.12962
Parth Sarthi: arxiv.org/abs/2401.18059

Andrew Strait (@agstrait)

A few questions.

1. I wonder if these licenses are exclusive? Can FT also license its data to other developers, or no?

2. How did they determine a value for their data? If terms are not disclosed for these deals, how can other news sites bargain/set an appropriate rate?

Arvind Narayanan (@random_walker)

On tasks like coding we can keep increasing accuracy by indefinitely increasing inference compute, so leaderboards are meaningless. The HumanEval accuracy-cost Pareto curve is entirely zero-shot models + our dead simple baseline agents.
New research w Sayash Kapoor Benedikt Ströbl 🧵

Clément Canonne (@ccanonne_)

Sara Hooker Research is social. It's important to talk to people, interact with other students, ask things, email researchers, go to conferences, talk about your work, and ask to talk about your work. It's intimidating, might feel awkward, but crucial.

rishi (@RishiBommasani)

Jelani makes an excellent point

I would go further and extend this to the graduate level with fellowships from both government (e.g. NSF, NDSEG) and industry (e.g. Google, Meta).
Marginal benefit for students/faculty is often greater elsewhere.

- signed by Cornell/Stanford alum

Stephan Xie (@stephofx)

belated but I'm excited to start a PhD at Machine Learning Dept. at Carnegie Mellon this fall as an NSF fellow!! I'm incredibly grateful to my mentors and advisors Aaron Roth, Kevin He, and Yi Xing for all their guidance and support along the way 😀

Niladri Chatterji (@niladrichat)

Has been super fun working on Llama 3! Really excited about the models yet to come!

ai.meta.com/blog/meta-llam…

Deb Raji (@rajiinio)

Sometimes I randomly think about how so incredibly *lucky* we are that Alondra was appointed OSTP Director when she was - the AI Bill of Rights came at such the right time & continues to inform everything from the EO & OMB guidelines to state and municipal legislation. Like, wow.

Stanford HAI (@StanfordHAI)

Earlier this month, we convened researchers and leaders in the EU, BRICS, Africa, global AI Safety Institutes & international organizations to navigate the complex policy ecosystem directing society's adoption of AI systems. Missed this event? Watch here: stanford.io/4bdPeOM

Percy Liang (@percyliang)

HELM Lite v1.2.0 is out!
Datasets: NarrativeQA, NaturalQA, OpenbookQA, MMLU, MATH, GSM8K, LegalBench, MedQA, WMT14
Results (we still need to add Claude 3, which requires more prompt finagling):
crfm.stanford.edu/helm/lite/v1.2…

Sasha Rush (@srush_nlp)

There is a really nice community of researchers developing transformer alternatives. Want to highlight these impressive folks.
Simran Arora (@simran_s_arora), Chunting Zhou (@violet_zct), Dan Fu (@realDanFu), and Songlin Yang (@SonglinYang4)