Ofir Press (@ofirpress)'s Twitter Profile
Ofir Press

@ofirpress

I build tough benchmarks for LMs and then I get the LMs to solve them. Postdoc @Princeton. PhD from @nlpnoah @UW. Ex-visiting researcher @MetaAI & @MosaicML.

ID: 746788615951355904

Link: https://ofir.io/about · Joined: 25-06-2016 19:34:15

1.1K Tweets

10.1K Followers

3.3K Following


The progress on SWE-bench is nuts. I think my prediction of 2 systems surpassing 35% pass@1 on the full test set by Aug 1 will come true. When we launched in October, nobody wanted to work on the dataset because it was considered "too hard" or "impossible". Acc was 1.96% then.