Davis Blalock (@davisblalock)'s Twitter Profile
Davis Blalock

@davisblalock

Research scientist @MosaicML. @MIT PhD. I retweet high-quality threads about machine learning papers. Weekly paper summaries newsletter: https://t.co/xX7NIpsIVZ

ID: 805547773944889344

Joined: 04-12-2016 23:02:10

1.2K Tweets

11.7K Followers

158 Following

Aleksa Gordić 🍿🤖 (@gordic_aleksa)'s Twitter Profile Photo

Here's the much-anticipated talk with Davis Blalock walking us through the story behind MosaicML - one of the fastest-growing unicorns in the world, acquired for an eye-watering $1.3B.

YT: youtu.be/aLh5DxGl4iI

Davis shares the story, the progress, the technical…

Davis Blalock (@davisblalock)'s Twitter Profile Photo

Suppose a bunch of people at the start of the industrial revolution set out to ensure this new technology benefited humanity.

What could they have done that would actually make a difference?

Tom Lieberum 🔎 (@lieberum_t)'s Twitter Profile Photo

Mech interp has been very successful in tiny models, but does it scale? …Kinda!

Our new Google DeepMind paper studies how Chinchilla70B can do multiple-choice Qs, focusing on picking the correct letter. Small model techniques mostly work but it's messy!🧵arxiv.org/abs/2307.09458

MosaicML (@MosaicML)'s Twitter Profile Photo

📢 Today, we're thrilled to announce that @Databricks has completed its acquisition of MosaicML. Our teams share a common goal: to make generative AI accessible for all. We're excited to change the world together!

Read the press release and stay tuned for more updates:…

Mansheej Paul (@mansiege)'s Twitter Profile Photo

Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/
arxiv.org/abs/2306.15063

Dylan Patel (@dylan522p)'s Twitter Profile Photo

Demystifying GPT-4: The engineering tradeoffs that led OpenAI to their architecture.
GPT-4 model architecture, training infrastructure, inference infrastructure, parameter count, training dataset composition, token count, layer count, parallelism, vision
semianalysis.com/p/gpt-4-archit…

Ishan Khatri (@i_ikhatri)'s Twitter Profile Photo

🧵What is scene flow & why should you care?

Scene flow is the 3D extension of optical flow. For each point in space, we want a motion vector that describes the motion of the point/object between t and t+1. 1/n
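The idea in the thread can be sketched in a few lines: given a point matched across two frames, its scene-flow vector is just the 3D displacement. An illustrative sketch (the function name and list-of-tuples representation are mine, not from the thread):

```python
# Hypothetical sketch: scene flow as per-point 3D motion vectors.
# For points p_t (time t) matched to p_t1 (time t+1), the flow
# vector is simply the displacement p_t1 - p_t.

def scene_flow(points_t, points_t1):
    """Return one 3D motion vector per matched point pair."""
    return [
        (x1 - x0, y1 - y0, z1 - z0)
        for (x0, y0, z0), (x1, y1, z1) in zip(points_t, points_t1)
    ]

# A point moving +1 m along x between frames:
flows = scene_flow([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
print(flows)  # [(1.0, 0.0, 0.0)]
```

In practice the hard part is estimating the correspondences themselves; the subtraction is the easy step.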

Davis Blalock (@davisblalock)'s Twitter Profile Photo

4 points stood out to me:
1) Models only learn to add the number of digits they saw during training, not how to do addition in general.

2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one…
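Point 2 has a clean intuition: carries propagate from the least-significant digit upward, so emitting low digits first lets each output digit be finalized from digits already generated, matching left-to-right autoregressive decoding. A minimal sketch of that digit-reversed addition (the function name and digit-list representation are mine, not from the paper being discussed):

```python
# Illustrative sketch: why emitting the answer least-significant-digit-
# first suits autoregressive models. Carries flow from the lowest digit
# upward, so each output digit depends only on digits already produced.

def add_reversed(a_digits, b_digits):
    """Add two numbers given as digit lists, least-significant digit first."""
    out, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        carry, d = divmod(a + b + carry, 10)
        out.append(d)  # this digit is final the moment it is emitted
    if carry:
        out.append(carry)
    return out

# 47 + 85 = 132; reversed digits: [7, 4] + [5, 8] -> [2, 3, 1]
print(add_reversed([7, 4], [5, 8]))  # [2, 3, 1]
```

Generating most-significant digit first would instead require knowing all the carries in advance, i.e. effectively computing the whole sum before emitting the first token.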

Davis Blalock (@davisblalock)'s Twitter Profile Photo

Cool that they found clear, interpretable differences between CNN and ViT behavior.

ViTs are better at ignoring irrelevant patches by default. But if you *train* with irrelevant patches, the gap mostly closes.
Points to CNN vs ViT inductive biases working the way you'd expect…

Gowthami Somepalli (@gowthami_s)'s Twitter Profile Photo

📃🚨 Does your diffusion model copy from the training data? How to find such behavior? Why does it happen? Can we somehow mitigate it?

A summary of recent work on understanding training data replication in recent T2I models. A long 🧶
