Jacob Steinhardt (@jacobsteinhardt)'s Twitter Profile
Jacob Steinhardt

@jacobsteinhardt

Assistant Professor of Statistics, UC Berkeley

ID: 438570403

Joined: 16-12-2011 19:04:34

329 Tweets

7.7K Followers

69 Following

Fred Zhang (@fredzhang0)'s Twitter Profile Photo

Beating prediction markets with chatbots sounds cool. In a recent work arxiv.org/abs/2402.18563, we get somewhat close to that. As another perspective, forecasting is a great capability domain to benchmark LM reasoning, calibration, pre-training knowledge, and more. 🧵1/n
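
As a concrete aside on the "calibration" point above, forecast quality on binary questions is typically scored with the Brier score. The sketch below is purely illustrative; the forecasts, outcomes, and helper name are made up, not taken from the paper.

```python
# Hypothetical sketch: scoring probabilistic forecasts on binary questions with
# the Brier score (squared error between forecast probability and 0/1 outcome).
# The forecasts and outcomes below are made-up illustrations.
def brier_score(prob_yes: float, outcome: int) -> float:
    """Squared error between the forecast probability and the realized 0/1 outcome."""
    return (prob_yes - outcome) ** 2

forecasts = [0.9, 0.2, 0.65]  # model's P(question resolves YES)
outcomes = [1, 0, 0]          # how the questions actually resolved (1 = YES)

mean_brier = sum(brier_score(p, o) for p, o in zip(forecasts, outcomes)) / len(forecasts)
print("mean Brier score:", mean_brier)  # lower is better; always guessing 0.5 scores 0.25
```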

Shayne Longpre (@shayneredford)'s Twitter Profile Photo

Independent AI research should be valued and protected. In an open letter signed by over 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward. sites.mit.edu/ai-safe-harbor/ 1/

Frances Ding (@francesding)'s Twitter Profile Photo

Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode? In a new paper (w/ Jacob Steinhardt) we find that pLM likelihoods have a strong species bias! 1/

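As a rough illustration of the "likelihood as fitness proxy" idea above, here is a minimal sketch of scoring a protein sequence with a masked pLM. It is not the paper's code; the ESM-2 checkpoint and the mask-one-position-at-a-time pseudo-log-likelihood recipe are assumptions chosen for illustration.

```python
# Hypothetical sketch: pseudo-log-likelihood of a protein sequence under an
# ESM-style masked protein language model (checkpoint choice is an assumption).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def pseudo_log_likelihood(sequence: str) -> float:
    """Sum of log P(residue | rest of sequence), masking one position at a time."""
    enc = tokenizer(sequence, return_tensors="pt")
    input_ids = enc["input_ids"]
    total = 0.0
    for pos in range(1, input_ids.shape[1] - 1):  # skip CLS/EOS special tokens
        masked = input_ids.clone()
        true_id = masked[0, pos].item()
        masked[0, pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked, attention_mask=enc["attention_mask"]).logits
        total += torch.log_softmax(logits[0, pos], dim=-1)[true_id].item()
    return total

# Higher scores mean the model finds the sequence more "natural"; the tweet's
# point is that such scores also encode which species the sequence resembles.
print(pseudo_log_likelihood("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```
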
Danny Halawi (@dannyhalawi15)'s Twitter Profile Photo

Language models can imitate patterns in prompts. But this can lead them to reproduce inaccurate information if present in the context. Our work (arxiv.org/abs/2307.09476) shows that when given incorrect demonstrations for classification tasks, models first compute the correct

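As an illustrative sketch of the "incorrect demonstrations" setup (the task, examples, and prompt format here are assumptions, not the paper's materials), a few-shot prompt with deliberately flipped labels might be built like this:

```python
# Hypothetical sketch: build a few-shot sentiment-classification prompt whose
# demonstration labels are deliberately flipped, to test whether a model copies
# the incorrect in-context pattern. Task and examples are made up.
DEMOS = [
    ("The movie was a delight from start to finish.", "positive"),
    ("Dull plot and wooden acting throughout.", "negative"),
    ("An instant classic I'll happily rewatch.", "positive"),
]
FLIP = {"positive": "negative", "negative": "positive"}

def build_prompt(demos, query, flip_labels=False):
    """Format demonstrations (optionally with wrong labels) followed by a query."""
    lines = []
    for text, label in demos:
        shown = FLIP[label] if flip_labels else label
        lines.append(f"Review: {text}\nLabel: {shown}\n")
    lines.append(f"Review: {query}\nLabel:")
    return "\n".join(lines)

# Comparing completions on the clean vs. flipped prompt is one way to see
# whether the model follows the demonstrations or its own notion of the answer.
print(build_prompt(DEMOS, "A tedious, forgettable sequel.", flip_labels=True))
```
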
Yuhui Zhang (@zhang_yu_hui)'s Twitter Profile Photo

Super excited to share that VisDiff has been accepted to #CVPR2024 and selected as an oral (90/11,532)! We will give a 15-min presentation going through the methods and exciting applications enabled by VisDiff. See you in Seattle!

Pravesh K. Kothari (@praveshkkothari)'s Twitter Profile Photo

In a new preprint with Jarek Blasiok, Rares Buhai, and David Steurer, we show that a surprisingly simple greedy algorithm can list-decode planted cliques in the semirandom model at k ~ sqrt(n) log^2(n), which is essentially optimal up to the log^2(n) factor. This essentially resolves Jacob Steinhardt's open question.

David Bau (@davidbau)'s Twitter Profile Photo

I am delighted to officially announce the National Deep Inference Fabric project, #NDIF. ndif.us NDIF is a U.S. National Science Foundation-supported computational infrastructure project to help YOU advance the science of large-scale AI.

Yossi Gandelsman (@ygandelsman)'s Twitter Profile Photo

Mechanistic interpretability is not only a good way to understand what is going on in a model, but it is also a tool for discovering "model bugs" and exploiting them! Our new paper shows that understanding CLIP neurons enables automatic generation of semantic adversarial images:

Jiahai Feng (@feng_jiahai)'s Twitter Profile Photo

New preprint! We build on the hypothesis that language models construct latent world models of their inputs, and seek to extract latent world states as logical propositions using “propositional probes”.

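For readers unfamiliar with probing, the generic recipe is to train a small classifier on a model's hidden activations to predict some property of the input. The sketch below uses random placeholder activations and a plain logistic-regression probe; it is only a stand-in for the idea, not the paper's propositional probes.

```python
# Hypothetical sketch of a linear probe: predict whether some proposition
# (e.g. "the input asserts lives_in(Alice, Paris)") holds, from hidden states.
# Random placeholder data stands in for real LM activations and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(500, 768))       # placeholder for LM activations
proposition_true = rng.integers(0, 2, size=500)   # placeholder binary labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, proposition_true, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))  # ~chance on random data
```
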
Danny Halawi (@dannyhalawi15)'s Twitter Profile Photo

New paper! We introduce Covert Malicious Finetuning (CMFT), a method for jailbreaking language models via fine-tuning that avoids detection. We use our method to covertly jailbreak GPT-4 via the OpenAI finetuning API.

Ethan Perez (@ethanjperez)'s Twitter Profile Photo

One of the most important and well-executed papers I've read in months. They explored nearly all of the attacks and defenses I was most keen on seeing tried for getting robust finetuning APIs. I'm not sure if it's possible to make finetuning APIs robust; it would be a big deal if it were.

MIT CSAIL (@mit_csail)'s Twitter Profile Photo

As AI models become more powerful, auditing them for safety & biases is crucial — but also challenging & labor-intensive. Can we automate and scale this process? MIT CSAIL researchers introduce "MAIA," which iteratively designs experiments to explain AI systems' behavior:
