Haotian Liu (@imhaotian)'s Twitter Profile
Haotian Liu

@imhaotian

building intelligence @xAI, creator of #LLaVA, prev. @MSFTResearch @UWMadison

ID:2267475408

Link: https://hliu.cc/ · Joined: 29-12-2013 14:33:07

142 Tweets

6.3K Followers

407 Following

Chunyuan Li (@ChunyuanLi):

🔍Large multimodal models (LMMs) like LLaVA are surprisingly good at zero-shot OCR tasks in the wild, without any OCR data in training. 🤔We compare 5 open-source LMMs on 18 text-recognition datasets to uncover the hidden mystery of OCR in LMMs.
Paper: arxiv.org/abs/2305.07895

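The zero-shot evaluation described here can be sketched as a simple loop over (image, ground-truth) pairs. The `generate(image, prompt)` interface, the fixed prompt, and the exact-match metric below are illustrative assumptions, not the paper's actual code.

```python
# Sketch of a zero-shot OCR evaluation loop for a large multimodal model (LMM).
# Any model exposing generate(image, prompt) could stand in for `lmm`.

def exact_match(prediction: str, target: str) -> bool:
    """Case- and whitespace-insensitive exact match, a common OCR metric."""
    return prediction.strip().lower() == target.strip().lower()

def evaluate_ocr(lmm, dataset):
    """Score an LMM on a text-recognition dataset with one fixed zero-shot prompt."""
    prompt = "What text is written in this image?"
    correct = 0
    for image, ground_truth in dataset:
        prediction = lmm.generate(image, prompt)
        correct += exact_match(prediction, ground_truth)
    return correct / len(dataset)

# Usage with a trivial stub model in place of a real LMM:
class EchoModel:
    def generate(self, image, prompt):
        return image["text"]  # pretend the model reads the image perfectly

dataset = [({"text": "STOP"}, "stop"), ({"text": "Exit"}, "EXIT")]
print(evaluate_ocr(EchoModel(), dataset))  # → 1.0
```

Because the prompt is fixed and no OCR training data is involved, this measures purely zero-shot text recognition.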
Jim Fan (@DrJimFan):

A big problem in multimodal chatbots is the lack of naturally occurring data. It isn't common for people to interleave images and text in a long chat.

Clever trick:
1. Start from a paired-modality dataset (e.g. image captions)
2. Use GPT-4 as a smart data augmenter to *simulate* dialogue

🧵

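The two-step trick above amounts to a prompt-construction step: feed the caption to a text-only model and ask it to role-play both sides of a conversation. The prompt wording and the `call_gpt4` callable below are illustrative assumptions, not LLaVA's exact data pipeline.

```python
# Sketch: turn one paired (image, caption) example into a multi-turn dialogue
# by asking a text-only GPT-4 to simulate both user and assistant.
# `call_gpt4` is a hypothetical stand-in for any chat-completion API.

def build_augmentation_prompt(caption: str) -> str:
    """Ask the text-only LLM to invent a dialogue grounded in the caption."""
    return (
        "You are given a description of an image:\n"
        f"  {caption}\n"
        "Write a multi-turn conversation between a user asking about the image "
        "and an assistant answering, using only facts from the description."
    )

def simulate_dialogue(caption: str, call_gpt4) -> str:
    """One caption in, one synthetic (image, dialogue) training example out."""
    return call_gpt4(build_augmentation_prompt(caption))

# Usage with a stub in place of the real API:
fake_api = lambda prompt: "User: What is in the image?\nAssistant: A dog on a beach."
print(simulate_dialogue("A dog running on a beach.", fake_api))
```

The key design point is that GPT-4 never sees the image, only the caption, so the simulated dialogue stays grounded in the paired text.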
lmsys.org (@lmsysorg):

The open-source clone of GPT-4 is almost here!

LLaVA combines a vision encoder and Vicuna, enabling it to see, recognize, talk about, and reason about images in a different way.
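The "vision encoder + Vicuna" combination can be sketched as a three-stage pipeline: encode the image into patch features, project them into the language model's embedding space, and feed them alongside the text tokens. The dimensions, the random stand-in encoder, and prepending image tokens (rather than inserting at a placeholder) are simplifying assumptions for illustration, not LLaVA's real weights or code.

```python
import numpy as np

# Sketch of a LLaVA-style pipeline: a frozen vision encoder produces image
# features, a learned linear projection maps them into the language model's
# embedding space, and the projected image tokens join the text tokens.

rng = np.random.default_rng(0)
VISION_DIM, LLM_DIM = 1024, 4096  # assumed sizes for illustration

def vision_encoder(image) -> np.ndarray:
    """Stand-in for a ViT: returns a sequence of patch features."""
    num_patches = 16
    return rng.standard_normal((num_patches, VISION_DIM))

# Trainable linear projection (the connector between the two models).
W_proj = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.01

def build_llm_input(image, text_embeddings: np.ndarray) -> np.ndarray:
    """Project image features and concatenate them with text embeddings."""
    image_tokens = vision_encoder(image) @ W_proj      # (16, LLM_DIM)
    return np.concatenate([image_tokens, text_embeddings], axis=0)

text_embeddings = rng.standard_normal((8, LLM_DIM))    # 8 text tokens
llm_input = build_llm_input(None, text_embeddings)
print(llm_input.shape)  # → (24, 4096)
```

Only the projection needs to learn anything new in the first stage: the vision encoder and the language model each keep doing what they were pretrained to do.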
