Davis Blalock (@davisblalock)'s Twitter Profile
Davis Blalock

@davisblalock

Research scientist @MosaicML. @MIT PhD. I retweet high-quality threads about machine learning papers. Weekly paper summaries newsletter: https://t.co/xX7NIpsIVZ

ID: 805547773944889344

Joined: 04-12-2016 23:02:10

1.2K Tweets

11.7K Followers

158 Following

Aleksa Gordić 🍿🤖 (@gordic_aleksa)'s Twitter Profile Photo

Here's the much-anticipated talk with Davis Blalock walking us through the story behind MosaicML - one of the fastest-growing unicorns in the world, acquired for an eye-watering $1.3B.

YT: youtu.be/aLh5DxGl4iI

Davis shares the story, the progress, the technical…

Davis Blalock (@davisblalock)'s Twitter Profile Photo

Suppose a bunch of people at the start of the industrial revolution set out to ensure this new technology benefited humanity.

What could they have done that would actually make a difference?

Tom Lieberum 🔎 (@lieberum_t)'s Twitter Profile Photo

Mech interp has been very successful in tiny models, but does it scale? …Kinda!

Our new Google DeepMind paper studies how Chinchilla70B can do multiple-choice Qs, focusing on picking the correct letter. Small model techniques mostly work but it's messy!🧵arxiv.org/abs/2307.09458

MosaicML (@MosaicML)'s Twitter Profile Photo

📢 Today, we're thrilled to announce that @Databricks has completed its acquisition of MosaicML. Our teams share a common goal: to make generative AI accessible for all. We're excited to change the world together!

Read the press release and stay tuned for more updates:…

Mansheej Paul (@mansiege)'s Twitter Profile Photo

Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/
arxiv.org/abs/2306.15063

Dylan Patel (@dylan522p)'s Twitter Profile Photo

Demystifying GPT-4: The engineering tradeoffs that led OpenAI to their architecture.
GPT-4 model architecture, training infrastructure, inference infrastructure, parameter count, training dataset composition, token count, layer count, parallelism, vision
semianalysis.com/p/gpt-4-archit…

Ishan Khatri (@i_ikhatri)'s Twitter Profile Photo

🧵What is scene flow & why should you care?

Scene flow is the 3D extension of optical flow. For each point in space, we want a motion vector that describes the motion of the point/object between t and t+1. 1/n
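The idea in the thread can be sketched in a few lines: given a point matched across two frames, its scene-flow vector is just the 3D displacement. An illustrative sketch (the function name and list-of-tuples representation are mine, not from the thread):

```python
# Hypothetical sketch: scene flow as per-point 3D motion vectors.
# For points p_t (time t) matched to p_t1 (time t+1), the flow
# vector is simply the displacement p_t1 - p_t.

def scene_flow(points_t, points_t1):
    """Return one 3D motion vector per matched point pair."""
    return [
        (x1 - x0, y1 - y0, z1 - z0)
        for (x0, y0, z0), (x1, y1, z1) in zip(points_t, points_t1)
    ]

# A point moving +1 m along x between frames:
flows = scene_flow([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
print(flows)  # [(1.0, 0.0, 0.0)]
```

In practice the hard part is estimating the correspondences themselves; the subtraction is the easy step.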

Davis Blalock (@davisblalock)'s Twitter Profile Photo

4 points stood out to me:
1) Models only learn to add the number of digits they saw during training, not how to do addition in general.

2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one…
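Point 2 has a clean intuition: carries propagate from the least-significant digit upward, so emitting low digits first lets each output digit be finalized from digits already generated, matching left-to-right autoregressive decoding. A minimal sketch of that digit-reversed addition (the function name and digit-list representation are mine, not from the paper being discussed):

```python
# Illustrative sketch: why emitting the answer least-significant-digit-
# first suits autoregressive models. Carries flow from the lowest digit
# upward, so each output digit depends only on digits already produced.

def add_reversed(a_digits, b_digits):
    """Add two numbers given as digit lists, least-significant digit first."""
    out, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        carry, d = divmod(a + b + carry, 10)
        out.append(d)  # this digit is final the moment it is emitted
    if carry:
        out.append(carry)
    return out

# 47 + 85 = 132; reversed digits: [7, 4] + [5, 8] -> [2, 3, 1]
print(add_reversed([7, 4], [5, 8]))  # [2, 3, 1]
```

Generating most-significant digit first would instead require knowing all the carries in advance, i.e. effectively computing the whole sum before emitting the first token.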

Davis Blalock (@davisblalock)'s Twitter Profile Photo

Cool that they found clear, interpretable differences between CNN and ViT behavior.

ViTs are better at ignoring irrelevant patches by default. But if you *train* with irrelevant patches, the gap mostly closes.
Points to CNN vs ViT inductive biases working the way you'd expect…

Gowthami Somepalli (@gowthami_s)'s Twitter Profile Photo

📃🚨 Does your diffusion model copy from the training data? How to find such behavior? Why does it happen? Can we somehow mitigate it?

A summary of recent work on understanding training data replication in recent T2I models. A long 🧶
