Shayne Longpre (@shayneredford)'s Twitter Profile
Shayne Longpre

@shayneredford

Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦
Prev: @Google Brain, Apple, Stanford.
Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact

ID: 3025082120

Link: http://www.shaynelongpre.com
Account created: 18-02-2015 08:27:29

1.1K Tweets

4.4K Followers

1.1K Following

Viraat Aryabumi (@viraataryabumi)

✅ paper picked up by Tanishq Mathew Abraham, Ph.D. and AK ✒️ So, to code or not? The answer, my friend, is a balance to seek, In every phase, code's benefits we speak, Delve into our paper for insights unique! 🖋️ Read at arxiv.org/abs/2408.10914

Digital Economy @JWI_Berlin (@jwi_digi_econ)

📢 Don't miss these two fascinating #PLAMADISO talks TOMORROW (online)! 🚀🚀🚀 🌐1st talk: Shayne Longpre (Massachusetts Institute of Technology (MIT)) at 2.00pm 🌐2nd talk: S. Puntoni (The Wharton School) at 3.30pm Register NOW & spread the word, thx!🙏 👉More information & free registration here: plamadiso.weizenbaum-institut.de/events/

Avijit Ghosh (@evijitghosh)

Truly open-source AI should include not just model weights but also training data, code, and thorough documentation. Open Source Initiative has a new definition of Open-Source AI, and I got to talk to MIT Technology Review about it! technologyreview.com/2024/08/22/109…

Arvind Narayanan (@random_walker)

Happening a week from today! Great to see the level of interest so far. Register for Zoom link. Here's the info if you want to share it on your company Slack or in your networks. Website: sites.google.com/princeton.edu/… Registration: bit.ly/agents-workshop Poster:

Sayash Kapoor (@sayashk)

Hundreds of people have signed up for our workshop on Useful and Reliable AI Agents! Learn how to create agents that are accurate, reliable, and cheap from the developers of SWE-Bench, LangChain, DSPy, and more. RSVP: bit.ly/agents-workshop

Twelve Labs (twelvelabs.io) (@twelve_labs)

In the 57th session of #MultimodalWeekly, we have three exciting presentations - two on video captions and one on training data for foundation models.

Eric Topol (@erictopol)

The problem with #AI training sets uncovered by an audit of >1,800: "a crisis of misattribution" >70% license omission rates >50% license error rates nature.com/articles/s4225… nature.com/articles/s4225… Nature Machine Intelligence Sara Hooker Shayne Longpre Robert Mahari dataprovenance.org

EnricoShippole (@enricoshippole)

Happy to announce that our research paper, A large-scale audit of dataset licensing and attribution in AI, was just published in Nature Machine Intelligence. Data provenance is becoming ever increasingly important as sources shut down open access to information.

IEEE Spectrum (@ieeespectrum)

With more websites using robots.txt to restrict crawler bots, AI companies may soon be hurting for training data. A Q&A with Shayne Longpre of the Data Provenance Initiative. spectrum.ieee.org/web-crawling?s…

Sara Hooker (@sarahookr)

Very proud to share that our work has been accepted to Nature Machine Intelligence 🔥 This work provides a critical audit -- highlights a crisis in the misattribution of datasets driving many recent breakthroughs. This also marked the beginning of the data provenance initiative. ✨

Digital Economy @JWI_Berlin (@jwi_digi_econ)

📢Are you interested in the consequences of data restrictions on AI development? 🤔 👉Don’t miss the #PLAMADISO talk “Consent in Crisis: The Rapid Decline of the #AI #DataCommons” by Shayne Longpre 📊🤖 🎥 Watch it now! youtube.com/watch?v=F0iGBl… 🙏 Watch, like, & subscribe!

Niklas Muennighoff (@muennighoff)

Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
- 1B active, 7B total params for 5T tokens
- Best small LLM & matches more costly ones like Gemma, Llama
- Open Model/Data/Code/Logs + lots of analysis & experiments

📜arxiv.org/abs/2409.02060
🧵1/9

Weiyan Shi (@shi_weiyan)

🎉Honored to be named as 35 Innovators Under 35 by MIT Technology Review 🙏 Very grateful to the amazing mentors and friends along the journey!!! 🚀Excited to keep exploring "how to persuade humans for social good and persuade AI for safety"! Onward!! #tr35 technologyreview.com/innovator/weiy…