Zirui "Colin" Wang (@zwcolin) 's Twitter Profile
Zirui "Colin" Wang

@zwcolin

cs @princeton_nlp @princetonPLI | prev @HDSIUCSD @CogSciUCSD, @CarnegieMellon. synergize model understanding & generation; multimodality; He/Him.

ID: 2986434572

Link: http://ziruiw.net | Joined: 17-01-2015 04:18:40

76 Tweets

446 Followers

375 Following

Ethan Mollick (@emollick)'s Twitter Profile Photo

We are seeing the first practical benchmarks for AI vision: 1) CharXiv, a challenging real-life chart-reading benchmark, shows humans get 80% right; Claude 3.5, the best LLM, gets 60%. 2) Chatbot Arena compares which AI vision answers humans prefer. GPT-4o wins this one.

Sadhika Malladi (@sadhikamalladi)'s Twitter Profile Photo

My new blog post argues from first principles how length normalization in preference learning objectives (e.g., SimPO) can facilitate learning from model-annotated preference data. Check it out! cs.princeton.edu/~smalladi/blog…
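For context, here is a minimal sketch of the length-normalized SimPO objective the post refers to. This is not the blog post's code; the beta and gamma defaults are illustrative, and the per-token log-probs are assumed to be precomputed.

```python
import numpy as np

def simpo_loss(logps_chosen, logps_rejected, beta=2.0, gamma=0.5):
    """SimPO loss for one (chosen, rejected) preference pair.

    logps_chosen / logps_rejected: arrays of per-token log-probs of each
    response under the policy model. The implicit reward is the *average*
    per-token log-prob (sum divided by length), so a longer response is
    not rewarded merely for having more tokens.
    """
    r_chosen = beta * np.sum(logps_chosen) / len(logps_chosen)
    r_rejected = beta * np.sum(logps_rejected) / len(logps_rejected)
    margin = r_chosen - r_rejected - gamma  # gamma: target reward margin
    return np.log1p(np.exp(-margin))        # equals -log(sigmoid(margin))

# Usage: a short, confident chosen response beats a longer, less likely one.
loss = simpo_loss(np.array([-0.2, -0.1, -0.3]), np.array([-0.9, -1.1, -1.0]))
```

Without the division by length, summed log-probs would systematically favor shorter rejected responses over longer chosen ones (or vice versa), which is the bias length normalization is meant to remove.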

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

InternVL2 offers the strongest open-weight multimodal LLMs, but I'm surprised that it's under the radar. Regardless of model size, its strongest open-weight model outperforms GPT-4V Turbo, GPT-4o mini, and Gemini 1.5 Flash. Congrats to the team!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Just finished responding to the authors' rebuttals for all papers in my batch that had one. I hope these timely responses give people more time/rounds for healthy and meaningful discussion of their papers! 👀 #NeurIPS

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Great paper, and it shares lots of insights with what we found in CharXiv: 1. Small models (e.g., Phi-3) beat larger ones (e.g., LLaVA 34B) at chart reasoning. 2. Open-weight models struggle with descriptive questions. 3. Models cannot even correctly count the number of ticks.

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

It's my first time submitting a paper to the NeurIPS D&B track, but we haven't received any feedback from any reviewer two weeks after the rebuttal period ended, and tomorrow is the last day of the discussion period. Is this common? Did you get feedback on your D&B track rebuttal?