Joey Gonzalez (@profjoeyg) Twitter Tweets • TwiDoom

Joschka Braun

6 months ago

I benchmarked Anthropic's new tool use beta API on the Berkeley function calling benchmark. Haiku beats GPT-4 Turbo in half of the scenarios. Results in 🧵 A huge thanks to Shishir Patil, Fanjia Yan, Tianjun Zhang, Joey Gonzalez & rest for providing this benchmark publicly.

Likes: 182 · Replies: 10 · Reposts: 37

lmsys.org

@lmsysorg

3 months ago

Congrats NVIDIA on the exciting 340B model release! The model was tested under the codename "june-chatbot" and is now coming out of stealth with impressive performance, surpassing Llama-3-70b across hard benchmarks like Arena-Hard-Auto. The new best open model? Come play with

Likes: 540 · Replies: 10 · Reposts: 101

Simon Willison

@simonw

3 months ago

Here's the animated LMSYS arena tool I built for the talk using Claude 3.5 Sonnet and Artifacts - inspired by Peter Gostev's visualizations.

Peter's: linkedin.com/posts/peter-go…
My tool: tools.simonwillison.net/arena-animated

Likes: 61 · Replies: 4 · Reposts: 8

Vikram Sreekanti

@vsreekanti

3 months ago

RunLLM's now on the Modin Project documentation site. Thanks to the Modin team for their help! Let us know what you think: modin.readthedocs.io/en/stable/

Likes: 9 · Replies: 1 · Reposts: 5

Vikram Sreekanti

@vsreekanti

3 months ago

Super fun to have the RunLLM team together for the last couple days! We might need a harder escape room next time 😎

Likes: 14 · Replies: 0 · Reposts: 3

Matei Zaharia

@matei_zaharia

3 months ago

Great post by Celebal Technologies on implementing RAFT on top of Databricks fine-tuning to outperform RAG: celebaltech.com/blogs/enhancin… Incidentally, the RAFT paper will be presented at the Conference on Language Modeling, so check it out if you're there.

Likes: 51 · Replies: 3 · Reposts: 13

Joey Gonzalez

@profjoeyg

2 months ago

Now this is fun research!

Likes: 5 · Replies: 0 · Reposts: 0

Lianmin Zheng

@lm_zheng

2 months ago

Torch.compile is awesome! No more headaches from hacking tricky low-level CUDA code for new attention variants. I can't wait to see them catch up with H100 and FP8 support as well.

Likes: 74 · Replies: 2 · Reposts: 5

vLLM

@vllm_project

2 months ago

🙏 Thank you NVIDIA for sponsoring vLLM development. The DGX H200 machine is marvelous! We plan to use the machine for benchmarking and performance enhancement 🏎️.

Likes: 440 · Replies: 15 · Reposts: 25

Joey Gonzalez

@profjoeyg

2 months ago

For the past year, Vikram Sreekanti and I have been writing about whatever we found interesting in our generative-AI-themed blog, Generating Conversations. However, when we look at which posts did the best, it is clear that our readers are most interested in deeper discussions on

Likes: 5 · Replies: 0 · Reposts: 4

Vikram Sreekanti

@vsreekanti

a month ago

We've been thinking a lot about how to price RunLLM recently — pricing is always hard, but AI adds some interesting wrinkles to the dynamic. We haven't completely figured it out yet, but we thought we'd share our thoughts about where things are headed: open.substack.com/pub/frontierai…

Likes: 5 · Replies: 0 · Reposts: 4

Joey Gonzalez

@profjoeyg

a month ago

What is the right pricing model for AI? Should it be a monthly fee or a flat rate per token? Do you pay extra for more knowledge? Three years ago, I was focused on serverless computing for AI and how to allocate inference engines. At the time, consumption-based pricing was

Likes: 1 · Replies: 1 · Reposts: 2

Liana

@lianapatel_

a month ago

Want to answer NL questions over your data? Introducing Table Augmented Generation (TAG)! Joint work w/ the amazing Matei Zaharia, Carlos Guestrin, Joey Gonzalez, Asim Biswal, Sid Jha, Amog Kamsetty, and Shu Liu. 📚 Paper: arxiv.org/abs/2408.14717 🛠️ Code: github.com/tag-research/t… 🧵

Likes: 197 · Replies: 7 · Reposts: 37

Joey Gonzalez

@profjoeyg

17 days ago

Back in 2023, my students working on the Gorilla project made the case for connecting LLMs to SaaS APIs. Today, everyone knows that models should be interacting with APIs. What people don't realize is that LLMs need to know more than how to call the API. They need to learn the

Likes: 2 · Replies: 0 · Reposts: 1

Joey Gonzalez

@profjoeyg

2 days ago

I am excited to announce the launch of the Video Arena. Our goal is to study video generation models and ultimately how humans prompt them. You can help us by watching entertaining generative AI videos constructed using the same prompt.

Likes: 13 · Replies: 1 · Reposts: 2

Joey Gonzalez

@profjoeyg

17 hours ago

We finally have a benchmark that evaluates stateful multi-step and multi-turn function calling!

Likes: 7 · Replies: 0 · Reposts: 1