Samuel Albanie (@samuelalbanie)'s Twitter Profile
Samuel Albanie

@samuelalbanie

Researcher @GoogleDeepMind

ID: 250086323

https://www.samuelalbanie.com

Joined 10-02-2011 11:46:55

494 Tweets

3.3K Followers

503 Following

Samuel Albanie (@samuelalbanie)

Today I'll give my final lecture on data structures & algorithms at the Engineering Dept, Cambridge University 😢 But for those keen to study, the re-recorded videos, slides, and code are all available online: buff.ly/3UVX36V (the fun Red-Black Tree visualisation is based on work by Luca 🇪🇺 🇮🇹)

Samuel Albanie (@samuelalbanie)

Beartype has long been one of my favourite open-source libraries

Because:
- it's a great library
- thanks to maintainer Cecil Curry (leycec), every GitHub issue thread is a work of literature

Some classics
buff.ly/4bWCpJG
buff.ly/3TgFSvA
buff.ly/3wA4N4j
Samuel Albanie (@samuelalbanie)

A small personal update:
- Excited to join Google DeepMind 🚀
- Grateful for the wonderful humans I've had the pleasure of working with on my journey so far at the Engineering Dept and the Visual Geometry Group (VGG) ❤️

Jonathan Roberts (@jrobertsai)

Introducing SciFIBench, a scientific figure interpretation benchmark for LMMs! 
github.com/jonathan-rober…

- We evaluate 30 LMM, VLM and human baselines
- GPT-4o is much better than GPT-4V
- The mean human narrowly outperforms GPT-4o & Gemini-Pro 1.5

(1/5)
Tim Franzmeyer (@frtimlive)

📢 Introducing HelloFresh: A Dynamic LLM Benchmark of Real-World Human Editorial Actions on X Community Notes and Wikipedia Edits. Can you beat GPT4 and GeminiPro at classifying X Community Notes and Wikipedia edits? Try our demo – shown in the video below – and see what

Zac Kenton (@zackenton1)

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now?

We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it could find on its own.

Does this work? It’s complicated: 🧵👇
Samuel Albanie (@samuelalbanie)

Enjoyed this paper on LMs, world models and agent models by Zhiting Hu and Tianmin Shu

TLDR: for reasoning tasks, it’s a useful abstraction to treat LMs as simulators (“backends”) that simulate agent models and world models

arxiv.org/abs/2312.05230
Usman Anwar (@usmananwar391)

Our agenda paper on alignment and safety of LLMs just got published at TMLR: openreview.net/forum?id=oVTkO… 🥳 The revised version is also now on arxiv arxiv.org/abs/2404.09932.