Mark Tenenholtz

@marktenenholtz

Head of AI @PredeloHQ. XGBoost peddler, transformer purveyor.

ID: 1186043355710545920

Link: https://bio.link/mtenenholtz · Joined: 20-10-2019 22:18:38

6.3K Tweets

114.7K Followers

549 Following

Mark Tenenholtz (@marktenenholtz):

When training LLMs, you should spend less time reading raw text and more time inspecting tokenizer outputs.

You'd be amazed how many problems creep into tokenization. For instance, the T5 tokenizer treats curly braces as <unk>.

If your model is bad at math, how is it tokenizing…
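A minimal sketch of that inspection step, using the Hugging Face transformers API (the checkpoint name t5-small and the sample string are illustrative assumptions):

```python
from transformers import T5Tokenizer

# Inspect what the T5 tokenizer actually emits for code-like text.
tok = T5Tokenizer.from_pretrained("t5-small")

text = "def f(x): return {x: x**2}"
tokens = tok.convert_ids_to_tokens(tok(text).input_ids)
print(tokens)  # the curly braces come back as the <unk> token

# Flag any example whose tokenization contains <unk> before training on it.
has_unk = tok.unk_token_id in tok(text).input_ids
print("contains <unk>:", has_unk)
```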

Mark Tenenholtz (@marktenenholtz):

Time-series foundational models aren't working very well because we're completely misusing the attention mechanism

Mark Tenenholtz (@marktenenholtz):

The gap between small models and huge models is closing.

The gap between the best GPT-4 and GPT-3.5 Turbo is about 140 points.

The gap between Opus and Haiku is only 80!
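If these are Elo-style leaderboard ratings (an assumption; the tweet doesn't name the leaderboard), the gaps translate into expected head-to-head win rates like so:

```python
def elo_win_prob(gap: float) -> float:
    # Expected win rate of the higher-rated model under the Elo model.
    return 1 / (1 + 10 ** (-gap / 400))

print(f"{elo_win_prob(140):.2f}")  # ~0.69: the larger gap
print(f"{elo_win_prob(80):.2f}")   # ~0.61: the smaller gap
```

So an 80-point gap means the stronger model wins only about 61% of head-to-head comparisons, versus about 69% at 140 points.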

Mark Tenenholtz (@marktenenholtz):

NNs are so hard to train because even in the presence of bugs, they often just work. Sure, they're limping along with an undiagnosed injury, but your QKV multiplication that's actually a QQV multiplication somehow still kinda works?
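A hypothetical PyTorch sketch of that exact failure mode (names and shapes are illustrative, not from the tweet): scores computed as Q @ Qᵀ instead of Q @ Kᵀ still form a valid attention map, so nothing crashes and the loss still goes down.

```python
import torch
import torch.nn.functional as F

def attention(x, Wq, Wk, Wv, buggy=False):
    # Project inputs into queries, keys, and values.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # The bug: scores use Q twice (Q @ Q^T) instead of Q @ K^T.
    scores = q @ (q if buggy else k).transpose(-2, -1)
    attn = F.softmax(scores / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

d = 16
x = torch.randn(8, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

# Both versions run without error and return the same shape;
# only the learned behavior degrades, which is why the bug hides.
print(attention(x, Wq, Wk, Wv).shape)
print(attention(x, Wq, Wk, Wv, buggy=True).shape)
```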

Mark Tenenholtz (@marktenenholtz):

I've found that the wrong way to use LLMs is on your most difficult tasks, the ones where you have expert-level knowledge.

The right way is to use them to avoid wasting time on repetitive tasks so that you can focus all your brainpower on those most difficult tasks.

I don't…
