Sean Kirmani (@seankirmani) Twitter Tweets • TwiDoom

If our annual physicals included a yearly full body MRI scan, we'd catch so many diseases earlier. Today a full body scan is $1350. How could we make that cost lower?

One lesson that I've internalized is that scaling across all dimensions is important for robotics. Scaling autonomous data collection, new policy conditioning techniques, and architecture research are all valuable knobs to pull to have robust, scalable robot policies.

Sean Kirmani

@seankirmani

9 months ago

Fun fact: Isaac Asimov’s Runaround is included in the citations.

Sean Kirmani

@seankirmani

8 months ago

For robotics and AR applications, there are many benefits to having 3D spatially grounded VLMs. This recent work led by Boyuan Chen adds 3D reasoning capabilities to VLMs. One cool result is that we are able to answer *quantitative* distance questions, which can then be used as a reward signal.

Sean Kirmani

@seankirmani

7 months ago

*Iterative Visual Prompting* is an effective technique to probe actionable information out of VLMs. Most exciting is that we can control a robot entirely through visual prompting! Check out more demos at pivot-prompt.github.io

AK

@_akhaliq

7 months ago

Google presents PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs. Demo: huggingface.co/spaces/pivot-p… Project page: pivot-prompt.github.io. We propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT),

Sean Kirmani

@seankirmani

7 months ago

Can you improve LLM "teachability" via continual learning? We introduce a technique called Language Model Predictive Control (LMPC) to optimize over "trajectories of conversations" to improve LLM teachability. See more at robot-teaching.github.io

Sean Kirmani

@seankirmani

7 months ago

This is amazing work! I’m really excited about the future of controllable/playable video generation.

Sean Kirmani

@seankirmani

7 months ago

This is 100% correct. If our existence proof for AGI is a human, then it’s not AGI without motor intelligence.

Sean Kirmani

@seankirmani

6 months ago

Congrats Karol and the whole founding team! Great to see more robotics companies in the world!

Sean Kirmani

@seankirmani

5 months ago

These are some of the most dexterous manipulation policies I've seen in the past year!

Rafael Rafailov

@rm_rafailov

5 months ago

We have a new preprint out - your language model is not a reward, it’s a Q function! 1. The likelihood of the preferred answer must go down - it’s a policy divergence 2. MCTS guided decoding on language is equivalent to likelihood search on DPO 3. DPO learns credit assignment

Sean Kirmani

@seankirmani

4 months ago

Introducing SIMPLER! Scalable robot policy evaluation is hard. We show that it's possible to correlate simulated evaluation performance with the real-world performance. simpler-env.github.io

Google DeepMind

@googledeepmind

4 months ago

We’ve built a range of AI systems that can: 🔵 Turn vision and language into action for robots 🔵 Navigate complex virtual 3D environments 🔵 Solve Olympiad-level math problems And more. #GoogleIO x.com/i/events/17853…

Sean Kirmani

@seankirmani

4 months ago

A good reminder of Moravec’s Paradox. So many everyday tasks are “easy” for people, but quite hard for robots and AI (picking things up from the floor, going up and down stairs, tying a shoe, etc).

Sean Kirmani

@seankirmani

2 months ago

Gemini can help robots navigate! By giving robots tours of a new space, you can come up with policies to semantically navigate around buildings. Check out the 🧵 for more details!

Demis Hassabis

@demishassabis

a month ago

Very excited about the huge potential of applying foundation models to robotics, & Gemini is perfect for this bc it’s natively multimodal. Some cool recent experiments below. If you're interested to work at the frontier of robotics, the Google DeepMind robotics team is hiring!

Sean Kirmani

@seankirmani

12 days ago

This is a very nice article by Hans Peter Brondmo about our work at Everyday Robots. My time there was one of the most formative parts of my career. My major takeaway is that robots will be “boring” soon. The recent energy in Silicon Valley makes me optimistic. wired.com/story/inside-g…
