Mark Tenenholtz

@marktenenholtz

Head of AI @PredeloHQ. XGBoost peddler, transformer purveyor.

ID: 1186043355710545920

Link: https://bio.link/mtenenholtz · Joined: 20-10-2019 22:18:38

6.3K Tweets

114.7K Followers

549 Following

Mark Tenenholtz (@marktenenholtz):

When training LLMs, you should spend less time reading raw text and more time inspecting tokenizer outputs.

You'd be amazed how many problems creep into tokenization. For instance, the T5 tokenizer treats curly braces as <unk>.

If your model is bad at math, how is it tokenizing…
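A minimal sketch of that inspection step, using the Hugging Face transformers API (the checkpoint name t5-small and the sample string are illustrative assumptions):

```python
from transformers import T5Tokenizer

# Inspect what the T5 tokenizer actually emits for code-like text.
tok = T5Tokenizer.from_pretrained("t5-small")

text = "def f(x): return {x: x**2}"
tokens = tok.convert_ids_to_tokens(tok(text).input_ids)
print(tokens)  # the curly braces come back as the <unk> token

# Flag any example whose tokenization contains <unk> before training on it.
has_unk = tok.unk_token_id in tok(text).input_ids
print("contains <unk>:", has_unk)
```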

Mark Tenenholtz (@marktenenholtz):

Time-series foundational models aren't working very well because we're completely misusing the attention mechanism

Mark Tenenholtz (@marktenenholtz):

The gap between small models and huge models is closing.

The gap between the best GPT-4 and GPT-3.5 Turbo is about 140 points.

The gap between Opus and Haiku is only 80!
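If these are Elo-style leaderboard ratings (an assumption; the tweet doesn't name the leaderboard), the gaps translate into expected head-to-head win rates like so:

```python
def elo_win_prob(gap: float) -> float:
    # Expected win rate of the higher-rated model under the Elo model.
    return 1 / (1 + 10 ** (-gap / 400))

print(f"{elo_win_prob(140):.2f}")  # ~0.69: the larger gap
print(f"{elo_win_prob(80):.2f}")   # ~0.61: the smaller gap
```

So an 80-point gap means the stronger model wins only about 61% of head-to-head comparisons, versus about 69% at 140 points.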

Mark Tenenholtz (@marktenenholtz):

NNs are so hard to train because even in the presence of bugs, they often just work. Sure, they're limping along with an undiagnosed injury, but your QKV multiplication that's actually a QQV multiplication somehow still kinda works?
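A hypothetical PyTorch sketch of that exact failure mode (names and shapes are illustrative, not from the tweet): scores computed as Q @ Qᵀ instead of Q @ Kᵀ still form a valid attention map, so nothing crashes and the loss still goes down.

```python
import torch
import torch.nn.functional as F

def attention(x, Wq, Wk, Wv, buggy=False):
    # Project inputs into queries, keys, and values.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # The bug: scores use Q twice (Q @ Q^T) instead of Q @ K^T.
    scores = q @ (q if buggy else k).transpose(-2, -1)
    attn = F.softmax(scores / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

d = 16
x = torch.randn(8, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

# Both versions run without error and return the same shape;
# only the learned behavior degrades, which is why the bug hides.
print(attention(x, Wq, Wk, Wv).shape)
print(attention(x, Wq, Wk, Wv, buggy=True).shape)
```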

Mark Tenenholtz (@marktenenholtz):

I've found that the wrong way to use LLMs is on your most difficult tasks, the ones where you have expert-level knowledge.

The right way is to use them to avoid wasting time on repetitive tasks so that you can focus all your brainpower on those most difficult tasks.

I don't…
