Jim Fan (@DrJimFan)'s Twitter Profile
Jim Fan

@DrJimFan

@NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.

ID:1007413134

Link: https://jimfan.me · Joined: 12-12-2012 22:11:27

3.5K Tweets

229.2K Followers

2.9K Following

Jim Fan (@DrJimFan):

Llama-3 is closing the gap with GPT-4, but multimodal models gotta catch up. Vision capabilities of open models like LLaVA are far, far behind GPT-4V. Video models are even worse. They hallucinate all the time and fail to give detailed descriptions of complex scenes and actions.

Jim Fan (@DrJimFan):

AI winter? No. Even if GPT-5 plateaus. Robotics hasn’t even started to scale yet.

Embodied intelligence in the physical world will be a powerhouse for economic value. Friendly reminder to everyone that LLM is not all of AI. It is just one piece of a bigger puzzle.

Jim Fan (@DrJimFan):

Prediction: GPT-5 will be announced before Llama-3-400B releases. External movement defines OpenAI’s PR schedule 🤣

Ajay Mandlekar (@AjayMandlekar):

Data is the key driving force behind success in robot learning. Our upcoming RSS 2024 workshop "Data Generation for Robotics" will feature exciting speakers, timely debates, and more! Submit by May 20th.

sites.google.com/view/data-gene…

Jim Fan (@DrJimFan):

It took my brain a while to parse what's going on in this video. We are so obsessed with 'human-level' robotics that we forget it is just an artificial ceiling. Why don't we make a new species superhuman from day one? Boston Dynamics has once again reinvented itself. Gradually,

Elon Musk (@elonmusk):

Jim Fan Two sources of data scale infinitely: synthetic data, which has an "is it true?" problem, and real-world video, which does not.

Jim Fan (@DrJimFan):

Tesla FSD v13 will likely be grokking language tokens. What excites me the most about Grok-1.5V is the potential to solve edge cases in self-driving. Using language for 'chain of thought' will help the car break down a complex scenario, reason with rules and counterfactuals, and

Jim Fan (@DrJimFan):

The moat of software AI agents is not the thin wrapper layer (Devin, SWE-Agent), but the underlying LLM. Instead of benchmarking the wrapper, I think SWE-Bench is excellent for evaluating the coding LLMs themselves:

Hold the agent layer fixed and vary only the LLM backend. Provide all

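The protocol above (hold the agent scaffold fixed, vary only the LLM backend) can be sketched in a few lines. This is an illustrative toy, not a real SWE-Bench harness; the names (`run_agent`, `evaluate`, the stub backends, the task IDs) are all invented for the sketch:

```python
# Toy sketch of "fix the scaffold, swap the backend" evaluation.
# All names here are hypothetical; SWE-Bench's real harness differs.

TASKS = ["issue-101", "issue-202", "issue-303"]

def run_agent(task: str, llm) -> bool:
    """Fixed agent scaffold: same prompting/tooling logic for every backend.

    Here the scaffold just asks the backend for a patch and checks it.
    """
    return llm(task) == f"patch for {task}"

def evaluate(llm) -> float:
    """Resolve rate over the benchmark with the scaffold held constant."""
    return sum(run_agent(t, llm) for t in TASKS) / len(TASKS)

# Two stub "LLM backends" of different capability.
strong_llm = lambda task: f"patch for {task}"
weak_llm = lambda task: "I don't know"

scores = {"strong": evaluate(strong_llm), "weak": evaluate(weak_llm)}
print(scores)
```

Because the scaffold never changes between runs, any score difference reflects the backend LLM rather than the wrapper, which is the point of the argument.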
Jim Fan (@DrJimFan):

Math talking to bare metal in the purest way. Andrej Karpathy makes AI education not only accessible, but also elegant. I'm reading through the code like a work of art.

Jim Fan (@DrJimFan):

The legendary class created by Fei-Fei Li & Andrej Karpathy that introduced deep learning to a generation of students. Proud to be a TA alumnus for CS231n! I used to write the Google Cloud tutorial on how to set up GPU instances and run experiments ;)

Jim Fan (@DrJimFan):

Better manual design of the command-line tools for GPT-4 is all you need to get 12.3% on SWE-Bench. There is no magic, no model breakthrough, no justification for the extreme hype.

When GPT-5 comes, instruction following, tool use, and long context will surely be far better. None

Jim Fan (@DrJimFan):

This sakura video has a complexity of no more than 262 characters, implemented as shader code that *generates* pixels. A text2video model that achieves maximal possible compression will be able to recover this program approximately in its weights, synthesized through denoising and

Jim Fan (@DrJimFan):

Novelty is so overrated. It's an example of misaligned objective: if reviewers look for novelty, you shape your research and efforts towards that, while devaluing things that actually matter.

I used to review CVPR papers, but stopped wasting my time on so many mind-numbing papers
