Sean Kirmani (@seankirmani)'s Twitter Profile
Sean Kirmani

@seankirmani

Researcher at @GoogleDeepMind. Technology optimist.

ID: 1151711025093206017

Link: https://kirmani.ai
Joined: 18-07-2019 04:31:20

87 Tweets

478 Followers

505 Following

Sean Kirmani (@seankirmani)

If our annual physicals included a yearly full body MRI scan, we'd catch so many diseases earlier. Today a full body scan is $1350. How could we make that cost lower?

Sean Kirmani (@seankirmani)

One lesson that I've internalized is that scaling across all dimensions is important for robotics. Scaling autonomous data collection, new policy conditioning techniques, and architecture research are all valuable knobs to pull to have robust, scalable robot policies.

Sean Kirmani (@seankirmani)

For robotics and AR applications, there are many benefits to having 3D spatially grounded VLMs. This recent work led by Boyuan Chen adds 3D reasoning capabilities to VLMs. One cool result is that we are able to answer *quantitative* distance questions as a reward signal.

Sean Kirmani (@seankirmani)

*Iterative Visual Prompting* is an effective technique to probe actionable information out of VLMs. Most exciting is that we can control a robot entirely through visual prompting! Check out more demos at pivot-prompt.github.io

AK (@_akhaliq)

Google presents PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
demo: huggingface.co/spaces/pivot-p…
project page: pivot-prompt.github.io
propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT),

Sean Kirmani (@seankirmani)

Can you improve LLM "teachability" via continual learning? We introduce a technique called Language Model Predictive Control (LMPC) to optimize over "trajectories of conversations" to improve LLM teachability. See more at robot-teaching.github.io

Rafael Rafailov (@rm_rafailov)

We have a new preprint out - your language model is not a reward, it’s a Q function!
1. The likelihood of the preferred answer must go down - it’s a policy divergence
2. MCTS guided decoding on language is equivalent to likelihood search on DPO
3. DPO learns credit assignment

Sean Kirmani (@seankirmani)

Introducing SIMPLER! Scalable robot policy evaluation is hard. We show that it's possible to correlate simulated evaluation performance with the real-world performance. simpler-env.github.io

Google DeepMind (@googledeepmind)

We’ve built a range of AI systems that can:
🔵 Turn vision and language into action for robots
🔵 Navigate complex virtual 3D environments
🔵 Solve Olympiad-level math problems
And more. #GoogleIO x.com/i/events/17853…

Sean Kirmani (@seankirmani)

A good reminder of Moravec’s Paradox. So many everyday tasks are “easy” for people, but quite hard for robots and AI (picking things up from the floor, going up and down stairs, tying a shoe, etc.).

Sean Kirmani (@seankirmani)

Gemini can help robots navigate! By giving robots tours of a new space, you can come up with policies to semantically navigate around buildings. Check out the 🧵 for more details!

Demis Hassabis (@demishassabis)

Very excited about the huge potential of applying foundation models to robotics, & Gemini is perfect for this bc it’s natively multimodal. Some cool recent experiments below. If you're interested to work at the frontier of robotics, the Google DeepMind robotics team is hiring!

Sean Kirmani (@seankirmani)

This is a very nice article by Hans Peter Brondmo about our work at Everyday Robots. My time there was one of the most formative parts of my career. My major takeaway is that robots will be “boring” soon. The recent energy in Silicon Valley makes me optimistic. wired.com/story/inside-g…