Andrew Carr (e/🤸) (@andrew_n_carr)'s Twitter Profile
Andrew Carr (e/🤸)

@andrew_n_carr

Co-founder leading science @getcartwheel · AI writer @tldrnewsletter · Advisor @arcade_ai · Past: Codegen @OpenAI, Brain @GoogleAI · World-ranked Tetris player

ID: 3378986176

Link: https://getcartwheel.com · Joined: 16-07-2015 15:36:33

6.6K Tweets

16.16K Followers

3.3K Following

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
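
A minimal sketch of that intuition (ours, not from the tweet): white Gaussian noise has a flat power spectrum while image-like signals decay with frequency, so the noise floor overtakes the signal from the high-frequency end first and, at large enough noise levels, submerges the low frequencies too. Assumes numpy; all names below are illustrative.

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a square grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    counts = np.bincount(r.ravel())
    return np.bincount(r.ravel(), weights=power.ravel()) / counts

rng = np.random.default_rng(0)
# A smooth surrogate "image" with a decaying spectrum (a Brownian sheet).
img = rng.standard_normal((128, 128)).cumsum(axis=0).cumsum(axis=1)
img = (img - img.mean()) / img.std()

for sigma in (0.0, 1.0, 32.0):
    noisy = img + sigma * rng.standard_normal(img.shape)
    spec = radial_power_spectrum(noisy)
    # The flat noise floor swamps the high-frequency bin long before the
    # low-frequency bin: diffusion's "rising tide" over feature scales.
    print(f"sigma={sigma:>4}: low-freq {spec[2]:.3g}, high-freq {spec[60]:.3g}")
```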

Andrew Carr (e/🤸) (@andrew_n_carr)'s Twitter Profile Photo

Here's a quick outline of the learning rate schedule for Llama 3.1 - this is likely the simplest and most powerful "shape" for a schedule when training models.

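A sketch of that shape, linear warmup followed by cosine decay; the constants below are illustrative placeholders, not Llama 3.1's published values:

```python
import math

def lr_at(step, peak_lr=3e-4, min_lr=3e-6, warmup_steps=2_000, total_steps=100_000):
    """Linear warmup from 0 to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

for s in (0, 1_000, 2_000, 50_000, 100_000):
    print(f"step {s:>7}: lr = {lr_at(s):.2e}")
```
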
Andrew Carr (e/🤸) (@andrew_n_carr)'s Twitter Profile Photo

There are really three classes of language models:

1. frontier reasoning models for solving coding problems
2. reliable, specific models for product deployment
3. good enough and fast models for batch processing

Andrew Carr (e/🤸) (@andrew_n_carr)'s Twitter Profile Photo

When training the SAEs for GemmaScope, the team had to store 17 petabytes of activations. They found, though, that if the activations were reused within 100 days, storing them was cheaper than recomputing them.

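A back-of-envelope sketch of that trade-off; every price and the recompute figure below are made-up assumptions for illustration (the tweet only gives the 17 PB size and the ~100-day break-even):

```python
# Hypothetical numbers, chosen only to show the shape of the calculation.
PB_MONTH_USD = 20_000.0      # assumed object-storage price per petabyte-month
ACTIVATIONS_PB = 17.0        # from the tweet
RECOMPUTE_USD = 2_000_000.0  # assumed one-off cost to regenerate the activations

storage_per_day = ACTIVATIONS_PB * PB_MONTH_USD / 30.0
break_even_days = RECOMPUTE_USD / storage_per_day
print(f"storage ≈ ${storage_per_day:,.0f}/day; break-even ≈ {break_even_days:.0f} days")
# Storing wins if the activations are reused before the break-even point;
# the real ~100-day figure depends on Google's internal storage and TPU costs.
```
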
Andrew Carr (e/🤸) (@andrew_n_carr)'s Twitter Profile Photo

The irony is that the 80 hr/week hustle bros are winning right now because Claude makes each marginal hour way more productive, essentially mitigating tiredness.