Owain Evans (@owainevans_uk) Twitter Tweets • TwiDoom

Owain Evans

@owainevans_uk

Independent AI Safety research group in Berkeley + Affiliate at UC Berkeley. Past: Oxford Uni, TruthfulQA, Reversal Curse. Prefer email to DM.

ID: 1247872005912891392

Link: https://owainevans.github.io/

Joined: 08-04-2020 13:01:26

4.4K Tweets

7.7K Followers

266 Following

Jacob Pfau

@jacob_pfau

3 months ago

Situational awareness benchmarking shows increasing performance with newer LLMs, but not on this one: ANTI-IMITATION tasks challenge LLMs that naively imitate the training distribution. To succeed, an LLM must use details of the LLM itself and its particular non-human capabilities.

Likes: 23, Replies: 2, Retweets: 3

Owain Evans

@owainevans_uk

2 months ago

Likes: 9, Replies: 1, Retweets: 0

Owain Evans

@owainevans_uk

2 months ago

Likes: 4, Replies: 0, Retweets: 0

Owain Evans

@owainevans_uk

2 months ago

LLM-robo girlfriend

Likes: 24, Replies: 1, Retweets: 2

Dima Krasheninnikov

@dmkrash

2 months ago

1/ Excited to finally tweet about our paper “Implicit meta-learning may lead LLMs to trust more reliable sources”, to appear at ICML 2024. Our results suggest that during training, LLMs better internalize text that appears useful for predicting other text (e.g. seems reliable).

Likes: 274, Replies: 5, Retweets: 47

Owain Evans

@owainevans_uk

2 months ago

Great opening to a philosophy book review. "Barrister" is roughly "trial lawyer"

Likes: 12, Replies: 0, Retweets: 1

Owain Evans

@owainevans_uk

2 months ago

"The Annunciation", Oleksandr Murashko, 1909, National Art Museum of Ukraine. Saw this in Bratislava (Slovakia), on loan from Ukraine. Not much info online about this painting.

Likes: 9, Replies: 0, Retweets: 0

Owain Evans

@owainevans_uk

2 months ago

Jan Leike on two AI alignment threat models

Likes: 116, Replies: 3, Retweets: 6

Owain Evans

@owainevans_uk

2 months ago

I'm at #ICML2024 in Vienna this week.

Likes: 6, Replies: 0, Retweets: 1

Joe Carlsmith

@jkcarlsmith

a month ago

I so enjoyed talking with Dwarkesh Patel about my essay series “Otherness and control in the age of AGI.” He engaged so deeply, asked such great questions, and aimed so directly at the core of the issues at stake. It’s a pleasure to be a part of conversations like this.

Likes: 155, Replies: 5, Retweets: 12

Michaël Trazzi

@michaeltrazzi

a month ago

I've interviewed Owain Evans about his work on AI situational awareness and out-of-context reasoning in LLMs. Owain has been publishing some of the most surprising and important Alignment papers in the past year, and I'm proud to be his first longform podcast ever