Stanislas Polu (@spolu)'s Twitter Profile
Stanislas Polu

@spolu

_co-founder+engineer(https://t.co/fCirsLjeo2), _alumni(https://t.co/8jAnpFAkp1, https://t.co/e99AaHzlA0, https://t.co/4jg6knqi2S, https://t.co/kXE6PNf8xH)

ID: 10580512

Link: https://spolu.now.sh · Joined: 26-11-2007 02:35:35

8.6K Tweets

13.9K Followers

607 Following

Stanislas Polu (@spolu)

Clear negative correlation between accuracy and reasoning gap. This goes directly against the hypothesis that larger models are more contaminated.

Best news for the largest language models in a long time!

WTF is going on with Mistral Large at 5-shot without CoT?

Stanislas Polu (@spolu)

Next week I'll run a model on all of the week's conversations to estimate (usefulness, time saved or lost in minutes), so that I can compute the # of humans saved per week by Dust users :)
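The weekly aggregate can be sketched in Python as below. This is a minimal illustration, not Dust's pipeline: the `ConversationEstimate` record, the 0-to-1 usefulness scale, and the conversion of one "human" into a 40-hour working week (2,400 minutes) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: one "human" is a 40-hour working week, expressed in minutes.
MINUTES_PER_HUMAN_WEEK = 40 * 60


@dataclass
class ConversationEstimate:
    """Hypothetical per-conversation output of the grading model."""
    usefulness: float     # assumed scale: 0.0 (useless) to 1.0 (very useful)
    minutes_delta: float  # > 0: minutes saved, < 0: minutes lost


def humans_saved_per_week(estimates: list[ConversationEstimate]) -> float:
    """Net minutes saved over the week, converted into full-time humans."""
    net_minutes = sum(e.minutes_delta for e in estimates)
    return net_minutes / MINUTES_PER_HUMAN_WEEK


def mean_usefulness(estimates: list[ConversationEstimate]) -> float:
    """Average usefulness score; 0.0 when there are no conversations."""
    if not estimates:
        return 0.0
    return sum(e.usefulness for e in estimates) / len(estimates)


week = [ConversationEstimate(usefulness=0.8, minutes_delta=12.0)] * 10_000
print(humans_saved_per_week(week))  # → 50.0
```

For example, 10,000 conversations that each save 12 minutes add up to 120,000 minutes, or 50 humans at 2,400 minutes each. Summing signed deltas lets time lost in unhelpful conversations offset time saved elsewhere.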

Stanislas Polu (@spolu)

Semantic search is powerful but bad at quantitative questions (by construction).

To circumvent that, we built Table Queries📓

Any structured data in your company (Google Sheets, Notion DBs, CSVs...) gets turned into just-in-time (JIT) in-memory SQLite DBs that models can query using SQL👨‍🏫

Stanislas Polu (@spolu)

We made two hard bets with Dust:

- A horizontal platform with access to all the SaaS tools our users rely on (Notion, Github, Slack, Drive, Intercom, ...)
- Not one Assistant, but many Assistants specialized in specific tasks.
- Capability to do semantic retrieval but also…

Stanislas Polu (@spolu)

Anybody tried making models play chess against one another using standard algebraic notation?

We know models are quite good at it. But who wins?

Mistral-Large vs Claude 2 vs Gemini 1.5 vs GPT-4
