Itay Itzhak (@itay_itzhak_) Twitter Tweets • TwiDoom

Moran Mizrahi

8 months ago

🚀 Excited to share our latest paper about the sensitivity of LLMs to prompts! arxiv.org/abs/2401.00595 Our work may partly explain why some models seem less accurate than their formal evaluation may suggest. 🧐 Guy Kaplan, Dan H.M 🎗, Rotem Dror, Hyadata Lab (Dafna Shahaf), Gabriel Stanovsky

Likes: 82 · Replies: 3 · Reposts: 26

Itay Itzhak

@itay_itzhak_

5 months ago

GPT-4 passed the official exams used to license medical specialists

Likes: 4 · Replies: 0 · Reposts: 0

Adi Simhi

@adisimhi

5 months ago

Excited to share our new paper: Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs. Check out our work here: arxiv.org/abs/2404.09971 This work was done with Jonathan Herzig, Idan Szpektor, and Yonatan Belinkov.

Likes: 18 · Replies: 2 · Reposts: 7

Asaf Yehudai

@asafyehudai

5 months ago

🎉 Excited to share that our paper "When LLMs are Unfit, Use FastFit: Fast and Effective Text Classification with Many Classes" was accepted to NAACL HLT 2024! 🚀 SOTA results ⚡ Fast training & inference 🎯 High accuracy 📄 Paper: arxiv.org/abs/2404.12365 💻 Package: github.com/IBM/fastfit

Likes: 78 · Replies: 6 · Reposts: 21

Adi Simhi

@adisimhi

5 months ago

LLMs are often said to "hallucinate", "confabulate", or produce untruthful responses, which led to much work trying to mitigate such behavior. But what does it mean for an LM to hallucinate? And how can we effectively intervene in model internals to combat hallucinations?

Likes: 114 · Replies: 8 · Reposts: 16

Michael Toker

@michael_toker

5 months ago

What if we could visualize language models’ computation process - with images? Introducing Diffusion Lens: a method for peeking into the internals of the text encoders of text-to-image pipelines. arxiv.org/abs/2403.05846 Demo: huggingface.co/spaces/tokeron… [1/9]

Likes: 39 · Replies: 1 · Reposts: 11

Zachary Bamberger

@zacharybamberg1

4 months ago

Paper release 🧵: We (Yonatan Belinkov, Chaim Baskin, Ofek Glick and I) are proud to introduce DEPTH: Discourse Education through Pre-Training Hierarchically. Code: github.com/zbambergerNLP/… Paper: arxiv.org/abs/2405.07788

Likes: 24 · Replies: 2 · Reposts: 9

Hadas Orgad

@orgadhadas

4 months ago

Our paper Diffusion Lens got accepted to the #ACL2024 main conference! 🌴⭐️ Visualize LLMs' computation process with our live demo >> huggingface.co/spaces/tokeron… For a quick TL;DR, check out Michael Toker's thread or the project website: tokeron.github.io/DiffusionLensW…

Likes: 31 · Replies: 2 · Reposts: 6

Dana Arad 🎗️

@dana_arad4

4 months ago

Excited to share Diffusion Lens got accepted to #ACL2024 main conference! 🎉 Check out our demo: huggingface.co/spaces/tokeron… and paper: tokeron.github.io/DiffusionLensW…

Likes: 24 · Replies: 0 · Reposts: 6

Gabriel Stanovsky

@gabistanovsky

2 months ago

Check out SEAM🤐 : a challenging LLM benchmark for multi-doc tasks and a stochastic approach to evaluation which addresses the brittleness of few-shot evaluation. Evaluate your model at seam-benchmark.github.io

Likes: 17 · Replies: 0 · Reposts: 4

Shachar Don-Yehiya

@shachar_don

2 months ago

Human feedback is critical for language model development 💬, but collecting it is costly 🤑 We find that users naturally include feedback when interacting with chat models, and we can automatically extract it! arxiv.org/abs/2407.10944 With Leshem Choshen 🤖🤗, Omri Abend 🧵👇

Likes: 75 · Replies: 5 · Reposts: 22

Ori Yoran

@oriyoran

2 months ago

Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4 based agent gets just 25% accuracy! assistantbench.github.io

Likes: 160 · Replies: 6 · Reposts: 47

David Bau

@davidbau

2 months ago

Time to study #llama3 405b, but gosh it's big! Please retweet: if you have a great experiment but not enough GPU, here is an opportunity to apply for shared #NDIF research resources. Deadline July 30: ndif.us/405b.html You'll help NDIF test, we'll help you run 405b

Likes: 124 · Replies: 2 · Reposts: 38

HUJI NLP

@nlphuji

a month ago

🚀 Lots of exciting work from the HUJI NLP group and our collaborators at #ACL2024! Come talk to us about it! Moran Mizrahi Gili Lior @ ACL 2024 Itay Itzhak Gabriel Stanovsky

Likes: 20 · Replies: 0 · Reposts: 3

Itay Itzhak

@itay_itzhak_

a month ago

Are you in #Bangkok? Come to hear about our #TACL paper on emergent cognitive biases in LLMs! Let's chat at:
- Convention Center A1 (Poster) – Today, 16:00
- Lotus Suite 5-7 (Talk) – Tuesday, 11:15
Looking forward to geeking out on biases, LLMs, and more! 🚀 #ACL2024 #NLProc

Likes: 30 · Replies: 0 · Reposts: 3

Itay Itzhak

@itay_itzhak_

a month ago

Had a blast presenting at #ACL2024! Thanks to everyone who joined the discussions on LM biases. Stay tuned for more on the origins of biases - we've got exciting work coming! 🔍 #NLProc

Likes: 24 · Replies: 0 · Reposts: 0

Dana Arad 🎗️

@dana_arad4

25 days ago

Excited about pursuing a graduate degree in AI or Machine Learning? In just two weeks, you have a chance to hear about the research happening in our lab and chat with students and faculty 👩🏻‍🎓

Likes: 8 · Replies: 1 · Reposts: 3