Casper Hansen (@casper_hansen_)'s Twitter Profile
Casper Hansen

@casper_hansen_

NLP Scientist | AutoAWQ Creator | Open-Source Contributor

ID: 1163801450968997889

Link: https://github.com/casper-hansen | Date: 20-08-2019 13:14:30

1.1K Tweets

2.2K Followers

210 Following

This release of SWE-bench Verified feels to me like OpenAI will launch a new model that is highly capable in coding & math, a competitor to Sonnet 3.5. Only time can tell if 🍓/ Q* + larger model works like we think it will.

Which are the best open SOTA datasets for finetuning and DPO that are competitive with instruct models from Meta and Mistral?

Imagine if Anthropic releases 3.5 Opus as the original 3.5 Sonnet and the new Sonnet 3.5 is just a quantized and aligned version of the original

lmsys continues to innovate in the hard field of measuring natural language in a fair manner. still, is the benchmark saturated, or do we need GPT-5 / Claude 4 before we see a noticeable difference between the current gen and the next gen we want?

It’s amazing to me that so many companies have been built on the Llama model series. Llama also helps bring competition to OpenAI, Google, etc. and helps bring down prices and overall improves the ecosystem. Llama is just a rerun of the Linux strategy.

I wonder how they split this amongst the employees? I would like to see how they scale to such a large cluster, utilizing the whole thing efficiently

The more examples I see from Grok 2 and how it aces things, the more I am impressed because it was built extremely fast. xAI team really nailed this

If the $2k/month subscription from OpenAI is true, that means they generate 100x more tokens per token in the output. Imagine this on a 3T param model🤯 That would be similar to scaling Quiet-STaR with 96 thoughts and 48 ahead tokens. Looks like performance could keep improving.

Let's call this strawberry-mini. The method is not actively generating extra tokens, but it does seem to start by generating a rationale between <thinking> tags. Could be a useful technique!

An open invitation to the cracked Triton optimization engineers: Go optimize the new AWQ Triton kernel so that our fellow AMD users can benefit massively. Tricks to optimize can be found in this post: pytorch.org/blog/accelerat…

I'm quantizing mattshumer/Reflection-Llama-3.1-70B with the AWQ algorithm. Would be great if Matt Shumer can release the instruction dataset so we can use a better calibration dataset

AWQ Reflection Llama 3.1 70B released! I have not evaluated it but assume it works out of the box: huggingface.co/casperhansen/r…