@nasimborazjani : 🚨 OpenAI's new o1 model scores only 38.2% in correctness on our new benchmark of combinatorial problems, SearchBench (arxiv.org/abs/2406.12172), while 57.1% is possible with GPT-4 and A* MSMT prompting! 🚨

Nasim Borazjanizadeh

@nasimborazjani

ID: 1796236801826549760

30-05-2024 17:46:53

7 Tweets

36 Followers

12 Following

6 days ago

🚨 OpenAI's new o1 model scores only 38.2% in correctness on our new benchmark of combinatorial problems, SearchBench (arxiv.org/abs/2406.12172), while 57.1% is possible with GPT-4 and A* MSMT prompting! 🚨

12 Likes

1 Reply

2 Reposts