rishi (@RishiBommasani) Twitter Tweets • TwiCopy

rishi

@RishiBommasani


Stanford CS PhD @StanfordCRFM
@StanfordNLP @StanfordAILab @StanfordHAI

Advisers: @percyliang @jurafsky
Previous: @CornellCIS @clairecardie
#FoundationModels

ID:895659037198393344

Link: https://rishibommasani.github.io
Joined: 10-08-2017 14:52:09

6.6K Tweets

4.2K Followers

1.5K Following

Percy Liang

10 hours ago

MMLU is the standard LM evaluation but model developers (i) use different prompting strategies and (ii) often do not release prompts. 3rd-party researchers often obtain lower scores 🤯

📢 HELM MMLU uses simple, standardized prompts, resulting in fair, reproducible comparisons of…

Likes: 30

Sayash Kapoor

19 hours ago

Excited to share that our paper introducing the REFORMS checklist is now out in Science Advances!

In it, we:
- review common errors in ML for science
- create a checklist of 32 items applicable across disciplines
- provide in-depth guidelines for each item

science.org/doi/10.1126/sc…

Likes: 70

Russ Poldrack

20 hours ago

I am really excited to be part of this project led by Sayash Kapoor and Arvind Narayanan to help improve practices in machine-learning based science. science.org/doi/10.1126/sc…

Likes: 36

Christopher Manning

21 hours ago

🇦🇹 I’m going to #iclr2024 in Vienna next week. Who all do I know that’ll be there?

Students’ papers there:
Katherine Tian: arxiv.org/abs/2311.08401
Charlotte Nicks: openreview.net/forum?id=4eJDM…
Eric: arxiv.org/abs/2310.12962
Parth Sarthi: arxiv.org/abs/2401.18059

Likes: 104

Andrew Strait

2 days ago

A few questions.

1. I wonder if these licenses are exclusive? Can FT also license its data to other developers, or no?

2. How did they determine a value for their data? If terms are not disclosed for these deals, how can other news sites bargain/set an appropriate rate?

Likes: 14

Arvind Narayanan

1 day ago

On tasks like coding we can keep increasing accuracy by indefinitely increasing inference compute, so leaderboards are meaningless. The HumanEval accuracy-cost Pareto curve is entirely zero-shot models + our dead simple baseline agents.
New research w Sayash Kapoor Benedikt Ströbl 🧵

Likes: 193

Clément Canonne

2 days ago

Sara Hooker Research is social. It's important to talk to people, interact with other students, ask things, email researchers, go to conferences, talk about your work, and ask to talk about your work. It's intimidating, might feel awkward, but crucial.

Likes: 90

Vaishali

2 days ago

Yes!!!😊

Likes: 11.0K

rishi

@RishiBommasani

2 days ago

Jelani makes an excellent point

I would go further and extend this to the graduate level with fellowships from both government (e.g. NSF, NDSEG) and industry (e.g. Google, Meta).
Marginal benefit for students/faculty is often greater elsewhere.

- signed by Cornell/Stanford alum

Likes: 12

Aryaman Arora

2 days ago

in the coming weeks me and Zhengxuan Wu are giving in-person talks on ReFT at
- Demandbase (SF, 5/1)
- Stanford NLP Group Lunch (Stanford, 5/2)
- Amazon Web Services Generative AI (Santa Clara, 5/10)

Likes: 43

Stephan Xie

3 days ago

belated but I'm excited to start a PhD at Machine Learning Dept. at Carnegie Mellon this fall as an NSF fellow!! I'm incredibly grateful to my mentors and advisors Aaron Roth, Kevin He, and Yi Xing for all their guidance and support along the way 😀

Likes: 64

Niladri Chatterji

3 days ago

Has been super fun working on Llama 3! Really excited about the models yet to come!

ai.meta.com/blog/meta-llam…

Likes: 32

Deb Raji

5 days ago

Sometimes I randomly think about how so incredibly *lucky* we are that Alondra was appointed OSTP Director when she was - the AI Bill of Rights came at such the right time & continues to inform everything from the EO & OMB guidelines to state and municipal legislation. Like, wow.

Likes: 104

National Academy of Sciences

5 days ago

Congratulations Dan Boneh of Stanford University, newly inducted #NASmember! #NAS161 #computerscience

Likes: 17

Stanford HAI

5 days ago

Earlier this month, we convened researchers and leaders in the EU, BRICS, Africa, global AI Safety Institutes & international organizations to navigate the complex policy ecosystem directing society's adoption of AI systems. Missed this event? Watch here: stanford.io/4bdPeOM

Likes: 26

Percy Liang

6 days ago

HELM Lite v1.2.0 is out!
Datasets: NarrativeQA, NaturalQA, OpenbookQA, MMLU, MATH, GSM8K, LegalBench, MedQA, WMT14
Results (we still need to add Claude 3, which requires more prompt finagling):
crfm.stanford.edu/helm/lite/v1.2…

Likes: 209

Sasha Rush

1 week ago

There is a really nice community of researchers developing transformer alternatives. Want to highlight these impressive folks.
Simran Arora (@simran_s_arora), Chunting Zhou (@violet_zct), Dan Fu (@realDanFu), and Songlin Yang (@SonglinYang4)

Likes: 447