Casper Hansen (@casper_hansen_) Twitter Tweets • TwiDoom

Casper Hansen

@casper_hansen_

+ Follow

NLP Scientist | AutoAWQ Creator | Open-Source Contributor

ID: 1163801450968997889

linkhttps://github.com/casper-hansen calendar_today20-08-2019 13:14:30

1,1K Tweet

2,2K Takipçi

210 Takip Edilen

Casper Hansen

@casper_hansen_

a month ago

This release of SWE-bench Verified feels to me like OpenAI will launch a new model that is highly capable in coding & math, a competitor to Sonnet 3.5. Only time can tell if 🍓/ Q* + larger model works like we think it will.

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare

Casper Hansen

@casper_hansen_

a month ago

sus-column-r on the arena was Grok 2. Competitive with Claude and GPT! x.ai/blog/grok-2

thumb_up_off_alt9

chat_bubble_outline0

repeat1

shareShare

Casper Hansen

@casper_hansen_

a month ago

Which are the best open SOTA datasets for finetuning and DPO that is competitive with instruct models from Meta and Mistral?

thumb_up_off_alt2

chat_bubble_outline1

repeat0

shareShare

Casper Hansen

@casper_hansen_

a month ago

Real question: is OpenDevin or alternatives actually as good as the real Devin? Not just on the benchmark, but generally?

thumb_up_off_alt1

chat_bubble_outline3

repeat0

shareShare

Casper Hansen

@casper_hansen_

23 days ago

Imagine if Antrophic releases 3.5 Opus as the original 3.5 Sonnet and the new Sonnet 3.5 is just a quantized and aligned version of the original

thumb_up_off_alt5

chat_bubble_outline0

repeat0

shareShare

Casper Hansen

@casper_hansen_

23 days ago

lmsys continues to innovate in the hard field of measuring natural language in a fair manner. still, is the benchmark saturated or do we need GPT-5 / Claude 4 before we see a noticeable difference between the current gen and the next gen we want

thumb_up_off_alt2

chat_bubble_outline1

repeat0

shareShare

Casper Hansen

@casper_hansen_

23 days ago

It’s amazing to me that so many companies have been built on the Llama model series Llama also helps bring competition to OpenAI, Google, etc. and helps bring down prices and overall improves the ecosystem Llama is just a rerun of the Linux strategy

thumb_up_off_alt11

chat_bubble_outline0

repeat1

shareShare

Casper Hansen

@casper_hansen_

21 days ago

Best repositories for effective distillation from a larger to smaller model?

thumb_up_off_alt5

chat_bubble_outline3

repeat1

shareShare

Casper Hansen

@casper_hansen_

20 days ago

Sounds to me like we need better open weight coding models.

thumb_up_off_alt10

chat_bubble_outline0

repeat0

shareShare

Casper Hansen

@casper_hansen_

20 days ago

I wonder how they split this amongst the employees? I would like to see how they scale to such a large cluster, utilizing the whole thing efficiently

thumb_up_off_alt3

chat_bubble_outline1

repeat1

shareShare

Casper Hansen

@casper_hansen_

17 days ago

The more examples I see from Grok 2 and how it aces things, the more I am impressed because it was built extremely fast. xAI team really nailed this

thumb_up_off_alt2

chat_bubble_outline1

repeat0

shareShare

Casper Hansen

@casper_hansen_

17 days ago

If the $2k/month subscription from OpenAI is true, that means they generate 100x more tokens per token in the output. Imagine this on a 3T param model🤯 That would be similar to scaling Quiet-STaR with 96 thoughts and 48 ahead tokens. Looks like performance could keep improving.

thumb_up_off_alt22

chat_bubble_outline3

repeat2

shareShare

Casper Hansen

@casper_hansen_

17 days ago

Let's call this strawberry-mini. The method is not actively generating extra tokens, but it does seem to start by generating a rationale between <thinking> tags. Could be a useful technique!

thumb_up_off_alt8

chat_bubble_outline1

repeat0

shareShare

Casper Hansen

@casper_hansen_

16 days ago

An open invitation to the cracked Triton optimization engineers: Go optimize the new AWQ Triton kernel so that are fellow AMD users can benefit massively. Tricks to optimize can be found in this post: pytorch.org/blog/accelerat…

thumb_up_off_alt8

chat_bubble_outline0

repeat0

shareShare

Casper Hansen

@casper_hansen_

16 days ago

I'm quantizing mattshumer/Reflection-Llama-3.1-70B with the AWQ algorithm. Would be great if Matt Shumer can release the instruction dataset so we can use a better calibration dataset

thumb_up_off_alt10

chat_bubble_outline1

repeat0

shareShare

Casper Hansen

@casper_hansen_

16 days ago

AWQ Reflection Llama 3.1 70B released! I have not evaluated it but assume it works out of the box huggingface.co/casperhansen/r…

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare