Zirui "Colin" Wang (@zwcolin) 's Twitter Profile
Zirui "Colin" Wang

@zwcolin

cs @princeton_nlp @princetonPLI | prev @HDSIUCSD @CogSciUCSD, @CarnegieMellon. synergize model understanding & generation; multimodality; He/Him.

ID: 2986434572

Link: http://ziruiw.net | Joined: 17-01-2015 04:18:40

76 Tweets

446 Followers

375 Following

Ethan Mollick (@emollick)'s Twitter Profile Photo

We are seeing the first practical benchmarks for AI vision: 1) CharXiv, a challenging real-life chart-reading benchmark, shows humans get 80% right; Claude 3.5, the best LLM, gets 60%. 2) Chatbot Arena compares which AI vision answers humans prefer. GPT-4o wins this one.

Sadhika Malladi (@sadhikamalladi)'s Twitter Profile Photo

My new blog post argues from first principles how length normalization in preference learning objectives (e.g., SimPO) can facilitate learning from model-annotated preference data. Check it out! cs.princeton.edu/~smalladi/blog…
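For context, here is a minimal sketch of the length-normalized SimPO objective the post refers to. This is not the blog post's code; the beta and gamma defaults are illustrative, and the per-token log-probs are assumed to be precomputed.

```python
import numpy as np

def simpo_loss(logps_chosen, logps_rejected, beta=2.0, gamma=0.5):
    """SimPO loss for one (chosen, rejected) preference pair.

    logps_chosen / logps_rejected: arrays of per-token log-probs of each
    response under the policy model. The implicit reward is the *average*
    per-token log-prob (sum divided by length), so a longer response is
    not rewarded merely for having more tokens.
    """
    r_chosen = beta * np.sum(logps_chosen) / len(logps_chosen)
    r_rejected = beta * np.sum(logps_rejected) / len(logps_rejected)
    margin = r_chosen - r_rejected - gamma  # gamma: target reward margin
    return np.log1p(np.exp(-margin))        # equals -log(sigmoid(margin))

# Usage: a short, confident chosen response beats a longer, less likely one.
loss = simpo_loss(np.array([-0.2, -0.1, -0.3]), np.array([-0.9, -1.1, -1.0]))
```

Without the division by length, summed log-probs would systematically favor shorter rejected responses over longer chosen ones (or vice versa), which is the bias length normalization is meant to remove.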

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

InternVL2 offers the strongest open-weight multimodal LLMs, but I'm surprised that it's under the radar. Regardless of model size, its strongest open-weight model outperforms GPT-4V Turbo, GPT-4o mini, and Gemini 1.5 Flash. Congrats to the team!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Just finished responding to the authors' rebuttals for all papers in my batch that had one. I hope these timely responses give people more time/rounds for healthy and meaningful discussion of their papers! 👀 #NeurIPS

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Great paper, and it shares lots of insights with what we found in CharXiv: 1. Small models (e.g., Phi-3) beat larger ones (e.g., LLaVA 34B) at chart reasoning. 2. Open-weight models struggle with descriptive questions. 3. Models cannot even correctly count the number of ticks.

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

It's my first time submitting a paper to the NeurIPS D&B track, but we haven't received any feedback from any reviewer two weeks after the rebuttal period ended, and tomorrow is the last day of the discussion period. Is this common? Did you get feedback on your D&B track rebuttal?