Berkeley AI Research (@berkeley_ai)'s Twitter Profile
Berkeley AI Research

@berkeley_ai

We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.

ID:891077171673931776

http://bair.berkeley.edu/ · Joined 28-07-2017 23:25:27

772 Tweets

152.2K Followers

190 Following

Berkeley AI Research(@berkeley_ai) 's Twitter Profile Photo

Registration is now open for an exciting workshop organized by Aditi Krishnapriyan and Jennifer Listgarten at the Simons Institute for the Theory of Computing June 10th-14th in Berkeley, AI≡Science: Strengthening the Bond Between the Sciences and Artificial Intelligence. simons.berkeley.edu/workshops/aisc…

Xiuyu Li(@xiuyu_l) 's Twitter Profile Photo

Handling long context in LLMs is expensive, but can we cut the cost by learning them offline for a specific set/genre of documents?

Introducing LLoCO, our new technique that learns documents offline through context compression and in-domain finetuning using LoRA, which achieves…

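For readers unfamiliar with the in-domain finetuning half of this recipe, here is a minimal LoRA sketch using Hugging Face transformers and peft. The base model and hyperparameters are illustrative placeholders, not LLoCO's actual settings, and the context-compression step is omitted.

```python
# Minimal sketch of LoRA-based in-domain finetuning, one ingredient of the
# LLoCO recipe described above (context compression is not shown).
# Model name and hyperparameters are illustrative, not the paper's settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank update dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters are trained
```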
Jiayi Pan(@pan_jiayipan) 's Twitter Profile Photo

New paper from @Berkeley_AI on Autonomous Evaluation and Refinement of Digital Agents!

We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%.

arxiv.org/abs/2404.06474 [🧵]

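A hedged sketch of the general evaluate-then-refine loop the tweet describes: a judge model scores an agent's trajectory and the agent retries on failure. The helper names (agent, judge, run functions) are hypothetical placeholders, not the paper's API.

```python
# Illustrative evaluate-then-refine loop. The evaluator is any LLM/VLM callable
# that returns text; the agent is any callable that produces a trajectory.
def evaluate_trajectory(task, trajectory, judge):
    """Ask a judge model whether the trajectory completed the task."""
    prompt = (f"Task: {task}\nTrajectory:\n{trajectory}\n"
              "Did the agent succeed? Answer yes or no.")
    return judge(prompt).strip().lower().startswith("yes")

def refine(task, agent, judge, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        trajectory = agent(task, feedback=feedback)       # hypothetical agent call
        if evaluate_trajectory(task, trajectory, judge):  # autonomous evaluation
            return trajectory
        feedback = "The previous attempt failed; try a different approach."
    return trajectory  # best effort after max_attempts
```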
Shishir Patil(@shishirpatil_) 's Twitter Profile Photo

📢Excited to release GoEx⚡️a runtime for LLM-generated actions like code, API calls, and more. Featuring 'post-facto validation' for assessing LLM actions after execution 🔍 Key to our approach is 'undo' 🔄 and 'damage confinement' abstractions to manage unintended actions &…

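To illustrate the 'undo' and 'post-facto validation' abstractions in spirit, here is a minimal sketch; it is not the GoEx API, just one way such a runtime guard could look.

```python
# Hedged sketch: every LLM-proposed action is paired with a compensating
# reversal, and post-facto validation decides whether to keep or roll back.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReversibleAction:
    execute: Callable[[], object]   # performs the action (e.g., an API call)
    undo: Callable[[], None]        # compensating action that reverses it

def run_with_validation(action: ReversibleAction,
                        validate: Callable[[object], bool]):
    result = action.execute()
    if not validate(result):        # post-facto validation after execution
        action.undo()               # roll back to confine damage
        raise RuntimeError("Action failed validation and was undone.")
    return result
```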
Jason Hu(@onjas_buidl) 's Twitter Profile Photo

🚀 Introducing RouterBench, the first comprehensive benchmark for evaluating LLM routers! 🎉
In a collaboration between Martian and Prof. Kurt Keutzer at UC Berkeley, we've created the first holistic framework to assess LLM routing systems. 🧵1/8

To read more:…

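As a rough illustration of what evaluating an LLM router involves, here is a hedged sketch that scores a router on quality and cost over precomputed per-model results; the data layout is invented for the example, not RouterBench's schema.

```python
# Illustrative router evaluation: given per-model quality scores and costs for
# each query, measure the quality/cost trade-off of a routing policy.
def evaluate_router(queries, router, scores, costs):
    """scores[model][q] in [0, 1]; costs[model][q] in dollars (assumed layout)."""
    total_quality, total_cost = 0.0, 0.0
    for q in queries:
        model = router(q)                 # router picks a model per query
        total_quality += scores[model][q]
        total_cost += costs[model][q]
    n = len(queries)
    return total_quality / n, total_cost  # average quality and total spend
```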
Ken Goldberg(@Ken_Goldberg) 's Twitter Profile Photo

'Why don't we have better robots yet?': just posted on the @TEDTalks home page under Newest Talks (3rd row from the top) with links to @PieRobotics and @Forbes article on art by Ben Wolff @creativecellist @Berkeley_AI @UCBerkeley @Cal_Engineer TED.com
Karl Pertsch(@KarlPertsch) 's Twitter Profile Photo

Access to *diverse* training data is a major bottleneck in robot learning. We're releasing DROID, a large-scale in-the-wild manipulation dataset. 76k trajectories, 500+ scenes, multi-view stereo, language annotations etc
Check it out & download today!

💻: droid-dataset.github.io

Carlo Sferrazza(@carlo_sferrazza) 's Twitter Profile Photo

Humanoids 🤖 will do anything humans can do. But are state-of-the-art algorithms up to the challenge?

Introducing HumanoidBench, the first-of-its-kind simulated humanoid benchmark with 27 distinct whole-body tasks requiring intricate long-horizon planning and coordination.

🧵👇

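For context, simulated humanoid tasks like these are typically consumed through the standard Gymnasium reset/step loop; the sketch below uses Gymnasium's built-in Humanoid-v4 environment as a stand-in rather than an actual HumanoidBench task.

```python
# Standard reset/step loop over a simulated humanoid environment.
# Humanoid-v4 is Gymnasium's built-in MuJoCo humanoid, used here only as a
# stand-in; it is not one of the 27 HumanoidBench tasks.
import gymnasium as gym

env = gym.make("Humanoid-v4")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()   # replace with a trained whole-body policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```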
Anastasios Nikolas Angelopoulos(@ml_angelopoulos) 's Twitter Profile Photo

U give me: a bunch of unlabeled data.

I give u: AI-generated labels.

Result: a massive, but biased, val set.

We use PPI to correct the bias, giving unbiased evaluations with better precision 🚀

arxiv.org/abs/2403.07008

Experiments on GPT-4 and ResNets, using lmsys.org :)

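The core PPI idea can be written in a few lines: average the AI-generated labels on the big unlabeled set, then correct the bias with a small human-labeled set. The sketch below shows only the point estimate for a mean (the paper's confidence intervals are omitted), with synthetic data standing in for real labels.

```python
# Hedged sketch of the prediction-powered inference (PPI) point estimate for a
# mean: model predictions on a large unlabeled set, bias-corrected by a small
# labeled set. Interval construction, the main contribution, is not shown.
import numpy as np

def ppi_mean(preds_unlabeled, preds_labeled, labels):
    """preds_*: model-generated labels; labels: human ground truth."""
    naive = np.mean(preds_unlabeled)        # biased: model labels only
    rectifier = np.mean(np.asarray(labels) - np.asarray(preds_labeled))
    return naive + rectifier                # bias-corrected PPI estimate

# Synthetic example: a model whose labels are systematically shifted upward.
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.6, size=10_000).astype(float)
model_labels = np.clip(truth + rng.normal(0.1, 0.2, size=10_000), 0, 1)
print(ppi_mean(model_labels[1000:], model_labels[:1000], truth[:1000]))
```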
Berkeley AI Research(@berkeley_ai) 's Twitter Profile Photo

Looking to hire top AI talent?

We've compiled a list of the brilliant Berkeley AI Research Ph.D. Graduates of 2024 who are currently on the academic and industry job markets. (Thanks to our friends at Stanford AI Lab for the idea!)

Check it out here:
bair.berkeley.edu/blog/2024/03/1…

Katie Kang(@katie_kang_) 's Twitter Profile Photo

We know LLMs hallucinate, but what governs what they dream up? Turns out it’s all about the “unfamiliar” examples they see during finetuning

Our new paper shows that manipulating the supervision on these special examples can steer how LLMs hallucinate

arxiv.org/abs/2403.05612
🧵

Catherine Chen(@cathychen23) 's Twitter Profile Photo

Do brain representations of language depend on whether the inputs are pixels or sounds?

Our Communications Biology paper studies this question from the perspective of language timescales. We find that representations are highly similar between modalities! rdcu.be/dACh5

1/8

Boyi Li(@Boyiliee) 's Twitter Profile Photo

🚀 Thrilled to share our CVPR 2024 paper: Self-correcting LLM-controlled Diffusion Models (SLD)!

SLD can automatically edit any image or fix text-to-image misalignments across generative models - no extra training is needed.

youtube.com/watch?v=PxoOl9…

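A hedged sketch of the self-correction loop described above: generate, have an LLM plus a detector check the image against the prompt, and apply corrective edits until they agree. Every function name here is a hypothetical placeholder, not the SLD codebase.

```python
# Illustrative generate / check / correct loop in the spirit of SLD.
# generate, detect_objects, llm_check, and edit are hypothetical callables.
def self_correcting_generation(prompt, generate, detect_objects, llm_check,
                               edit, max_rounds=3):
    image = generate(prompt)                          # any off-the-shelf T2I model
    for _ in range(max_rounds):
        detections = detect_objects(image)            # open-vocabulary detector
        ok, edit_ops = llm_check(prompt, detections)  # LLM compares prompt vs. image
        if ok:
            break
        image = edit(image, edit_ops)                 # training-free corrective edits
    return image
```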
Ritwik Gupta 🇺🇦(@Ritwik_G) 's Twitter Profile Photo

From your cell phone to your TV, images and videos are now captured in 4K resolution or better. Vision methods, however, opt to downsize or crop them, losing information. We introduce xT, our framework to model large images end-to-end on contemporary GPUs! ai-climate.berkeley.edu/xt-website/

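As a rough sketch of the tile-then-fuse idea, the code below splits a large image into regions, encodes each region independently, and fuses the region features with a small transformer; module choices and shapes are illustrative, not xT's architecture.

```python
# Illustrative tile-then-fuse pipeline: per-region encoding followed by
# cross-region attention. The region encoder is assumed to map a
# (1, C, r, r) tile to a (1, d_model) feature vector.
import torch
import torch.nn as nn

def tile(image, region=256):
    """Split a (C, H, W) image into non-overlapping (C, region, region) tiles."""
    c, h, w = image.shape
    tiles = image.unfold(1, region, region).unfold(2, region, region)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, region, region)

class RegionThenSequence(nn.Module):
    def __init__(self, region_encoder: nn.Module, d_model: int):
        super().__init__()
        self.region_encoder = region_encoder          # e.g., a small ViT per tile
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=2)  # cross-region fusion

    def forward(self, image):
        feats = torch.stack([self.region_encoder(t.unsqueeze(0)).squeeze(0)
                             for t in tile(image)])           # one feature per region
        return self.context(feats.unsqueeze(0)).squeeze(0)    # fuse across regions
```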
Toru(@ToruO_O) 's Twitter Profile Photo

Achieving bimanual dexterity with RL + Sim2Real!

toruowo.github.io/bimanual-twist/

TLDR - We train two robot hands to twist bottle lids using deep RL followed by sim-to-real. A single policy trained with simple simulated bottles can generalize to drastically different real-world objects.

Nika Haghtalab(@nhaghtal) 's Twitter Profile Photo

I'm honored to be included in this amazing cohort of Schmidt Sciences AI2050 fellows.

Grateful to all the students and collaborators; this award also recognizes their efforts toward comprehensive foundations of AI and ML that account for social and strategic considerations.

Ilija Radosavovic(@ir413) 's Twitter Profile Photo

we cast real-world humanoid control as next token prediction; our approach enables joint training with youtube videos and walks in sf

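A minimal sketch of "control as next-token prediction" in general: discretize observation/action streams into one token sequence and train a causal transformer on it. Vocabulary size, model size, and tokenization are illustrative assumptions, not the paper's configuration.

```python
# Illustrative next-token model over discretized observation/action streams.
import torch
import torch.nn as nn

def discretize(x, low=-1.0, high=1.0, bins=256):
    """Map continuous values in [low, high] to integer tokens in [0, bins)."""
    x = torch.clamp((x - low) / (high - low), 0, 1 - 1e-6)
    return (x * bins).long()

class CausalTokenModel(nn.Module):
    def __init__(self, vocab=256, d_model=256, n_layers=4, n_heads=8, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):             # tokens: (B, T) interleaved obs/action tokens
        T = tokens.shape[1]
        h = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.blocks(h, mask=mask)      # causal attention over the trajectory
        return self.head(h)                # next-token logits at every step
```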
Lawrence Yunliang Chen(@Lawrence_Y_Chen) 's Twitter Profile Photo

Introducing Mirage: Zero-shot transfer of visuomotor policies to unseen robot embodiments 🤖

With Mirage, you can train a policy on one robot and deploy it on a different one that it has never seen, with no additional data or training! 🧵👇 (1/8)

🌐 robot-mirage.github.io
