Haotian Liu (@imhaotian)'s Twitter Profile
Haotian Liu

@imhaotian

building intelligence @xAI, creator of #LLaVA, prev. @MSFTResearch @UWMadison

ID:2267475408

Link: https://hliu.cc/ · Joined: 29-12-2013 14:33:07

142 Tweets

6.3K Followers

407 Following

Chunyuan Li (@ChunyuanLi):

🔍Large multimodal models (LMMs) like LLaVA are surprisingly good at zero-shot OCR tasks in the wild, without any OCR data in training. 🤔We compare 5 open-source LMMs on 18 text-recognition datasets to uncover the hidden mystery of OCR in LMMs.
Paper: arxiv.org/abs/2305.07895

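The zero-shot evaluation described here can be sketched as a simple loop over (image, ground-truth) pairs. The `generate(image, prompt)` interface, the fixed prompt, and the exact-match metric below are illustrative assumptions, not the paper's actual code.

```python
# Sketch of a zero-shot OCR evaluation loop for a large multimodal model (LMM).
# Any model exposing generate(image, prompt) could stand in for `lmm`.

def exact_match(prediction: str, target: str) -> bool:
    """Case- and whitespace-insensitive exact match, a common OCR metric."""
    return prediction.strip().lower() == target.strip().lower()

def evaluate_ocr(lmm, dataset):
    """Score an LMM on a text-recognition dataset with one fixed zero-shot prompt."""
    prompt = "What text is written in this image?"
    correct = 0
    for image, ground_truth in dataset:
        prediction = lmm.generate(image, prompt)
        correct += exact_match(prediction, ground_truth)
    return correct / len(dataset)

# Usage with a trivial stub model in place of a real LMM:
class EchoModel:
    def generate(self, image, prompt):
        return image["text"]  # pretend the model reads the image perfectly

dataset = [({"text": "STOP"}, "stop"), ({"text": "Exit"}, "EXIT")]
print(evaluate_ocr(EchoModel(), dataset))  # → 1.0
```

Because the prompt is fixed and no OCR training data is involved, this measures purely zero-shot text recognition.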
Jim Fan (@DrJimFan):

A big problem in multimodal chatbots is the lack of naturally occurring data. It isn't common for people to interleave images and text in a long chat.

Clever trick:
1. Start from a paired-modality dataset (e.g. image captions)
2. Use GPT-4 as a smart data augmenter to *simulate* dialogue

🧵

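The two-step trick above amounts to a prompt-construction step: feed the caption to a text-only model and ask it to role-play both sides of a conversation. The prompt wording and the `call_gpt4` callable below are illustrative assumptions, not LLaVA's exact data pipeline.

```python
# Sketch: turn one paired (image, caption) example into a multi-turn dialogue
# by asking a text-only GPT-4 to simulate both user and assistant.
# `call_gpt4` is a hypothetical stand-in for any chat-completion API.

def build_augmentation_prompt(caption: str) -> str:
    """Ask the text-only LLM to invent a dialogue grounded in the caption."""
    return (
        "You are given a description of an image:\n"
        f"  {caption}\n"
        "Write a multi-turn conversation between a user asking about the image "
        "and an assistant answering, using only facts from the description."
    )

def simulate_dialogue(caption: str, call_gpt4) -> str:
    """One caption in, one synthetic (image, dialogue) training example out."""
    return call_gpt4(build_augmentation_prompt(caption))

# Usage with a stub in place of the real API:
fake_api = lambda prompt: "User: What is in the image?\nAssistant: A dog on a beach."
print(simulate_dialogue("A dog running on a beach.", fake_api))
```

The key design point is that GPT-4 never sees the image, only the caption, so the simulated dialogue stays grounded in the paired text.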
lmsys.org (@lmsysorg):

The open-source clone of GPT-4 is almost here!

LLaVA combines a vision encoder and Vicuna, enabling it to see, recognize, talk about, and reason about images in a different way.
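The "vision encoder + Vicuna" combination can be sketched as a three-stage pipeline: encode the image into patch features, project them into the language model's embedding space, and feed them alongside the text tokens. The dimensions, the random stand-in encoder, and prepending image tokens (rather than inserting at a placeholder) are simplifying assumptions for illustration, not LLaVA's real weights or code.

```python
import numpy as np

# Sketch of a LLaVA-style pipeline: a frozen vision encoder produces image
# features, a learned linear projection maps them into the language model's
# embedding space, and the projected image tokens join the text tokens.

rng = np.random.default_rng(0)
VISION_DIM, LLM_DIM = 1024, 4096  # assumed sizes for illustration

def vision_encoder(image) -> np.ndarray:
    """Stand-in for a ViT: returns a sequence of patch features."""
    num_patches = 16
    return rng.standard_normal((num_patches, VISION_DIM))

# Trainable linear projection (the connector between the two models).
W_proj = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.01

def build_llm_input(image, text_embeddings: np.ndarray) -> np.ndarray:
    """Project image features and concatenate them with text embeddings."""
    image_tokens = vision_encoder(image) @ W_proj      # (16, LLM_DIM)
    return np.concatenate([image_tokens, text_embeddings], axis=0)

text_embeddings = rng.standard_normal((8, LLM_DIM))    # 8 text tokens
llm_input = build_llm_input(None, text_embeddings)
print(llm_input.shape)  # → (24, 4096)
```

Only the projection needs to learn anything new in the first stage: the vision encoder and the language model each keep doing what they were pretrained to do.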
