Philip Vollet (@philipvollet)'s Twitter Profile
Philip Vollet

@philipvollet

Head of Developer Growth @weaviate_io & Open source lover tweeting about machine learning and data science projects.

ID: 421795636

Link: https://www.linkedin.com/in/philipvollet
Joined: 26-11-2011 11:40:03

17.17K Tweets

30.30K Followers

6.6K Following

Weaviate • vector database (@weaviate_io)

Are you in Austin this week for Confluent Current? Team Weaviate is delivering two talks in the expo hall, where we will showcase a demo app called Wealingo. It provides real-time, personalized language learning using Weaviate's vector database and Confluent Cloud's

Leonie (@helloiamleonie)

Different types of embeddings:

Sparse: [0, 3, 0, 1, …, 12, 0, 0]
Dense: [0.34, -3.75, -0.93, …, 1.53, 0.95]
Dense multi-vector (e.g., ColBERT): [[0.01, …, -0.03], … [-0.91, …, 0.23]]
Dense with variable dimensions (e.g., Matryoshka): 8 dimensions: [-0.03, -0.42,
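A rough illustration of those shapes (the numbers and dimensions below are made up for illustration, not taken from any real model):

```python
import numpy as np

# Sparse embedding: vocabulary-sized vector, mostly zeros (BM25/SPLADE-style).
sparse = np.zeros(30_000)
sparse[[7, 42, 10_512]] = [3.0, 1.0, 12.0]   # only a few non-zero weights

# Dense embedding: one fixed-length vector per text (e.g. 768 dimensions).
dense = np.random.randn(768)

# Dense multi-vector (ColBERT-style): one small vector per token,
# so a 20-token passage becomes a (20, 128) matrix.
multi_vector = np.random.randn(20, 128)

# Matryoshka-style embedding: the leading dimensions of one vector are
# usable on their own, so you can truncate to 8, 64, 256, ... dimensions.
full = np.random.randn(768)
truncated_8d = full[:8]

print(sparse.shape, dense.shape, multi_vector.shape, truncated_8d.shape)
```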

Victoria Slocum (@victorialslocum)

Optimizing your chunking techniques is one of the top places to improve performance in your RAG pipelines, but what’s the best one? Jina AI just released a new method called late chunking that takes the same amount of storage space as naive chunking, but solves the problem of

Google AI (@googleai)

Introducing our new whale bioacoustics model, which can identify eight distinct species, including multiple calls for two of those species. The model even includes the “Biotwang” sounds recently attributed to the Bryde’s whale. Learn more at: goo.gle/3Znukdk

Weaviate • vector database (@weaviate_io)

Late chunking is revolutionizing the way Retrieval Augmented Generation (RAG) systems retrieve information. 🧩

In naive chunking:
1. We separate the original document into chunks (e.g. sentences)
2. Each chunk is independently embedded into token-level representations
3. These
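A minimal sketch of the contrast, using a hypothetical `embed_tokens(text)` stand-in for a long-context embedding model that returns one vector per token; this illustrates the idea, not Jina AI's actual implementation:

```python
import numpy as np

def embed_tokens(text: str) -> np.ndarray:
    """Hypothetical model call: returns one vector per whitespace token."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal((len(text.split()), 384))

def naive_chunking(chunks: list[str]) -> list[np.ndarray]:
    # Each chunk is embedded on its own, so it never sees the rest of the document.
    return [embed_tokens(chunk).mean(axis=0) for chunk in chunks]

def late_chunking(document: str, chunks: list[str]) -> list[np.ndarray]:
    # Embed the whole document once, so every token vector carries document context,
    # then pool token vectors per chunk boundary afterwards ("late").
    token_vecs = embed_tokens(document)
    vectors, start = [], 0
    for chunk in chunks:
        n = len(chunk.split())
        vectors.append(token_vecs[start:start + n].mean(axis=0))
        start += n
    return vectors  # same number and size of stored vectors as naive chunking

doc = "Berlin is the capital of Germany. The city has 3.7 million inhabitants."
chunks = ["Berlin is the capital of Germany.", "The city has 3.7 million inhabitants."]
print(len(naive_chunking(chunks)), len(late_chunking(doc, chunks)))
```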

Jennifer Li (@jenniferhli)

I’ve said this before and I’ll say it again. The #1, #2, #3 deciding factor of a startup’s success is the shipping velocity. Companies have no long term moat, the only moat is a fast shipping culture.

daniel phiri (@malgamves)

incredibly excited to share that i will be speaking at the flagship ai event by dotConferences in paris next month.

check it out! 👨🏾‍🌾 dotai.io
Weaviate • vector database (@weaviate_io)

Did you know that 43% of users on retail websites go directly to the search bar and are 2-3x more likely to convert?

Yet, 64% of retail website managers have no clear plan on how to improve their search experience.

Find out how to make the most of AI search and nail your
Mark Riedl (@mark_riedl)

Making Large Language Models into World Models with Precondition and Effect Knowledge

arxiv.org/abs/2409.12278

I'm an RL guy. To me, a world model maps (state, action) -> state'

So let's make LLMs do that. Then we can build planning and reasoning algorithms on top of them.
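A toy sketch of that interface; `llm_propose_effects` is a hypothetical stand-in for an LLM prompted with precondition and effect knowledge, not the paper's method:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    facts: frozenset[str]          # e.g. {"door_locked", "has_key"}

def llm_propose_effects(state: State, action: str) -> tuple[set[str], set[str]]:
    """Hypothetical LLM call: returns (facts_added, facts_removed) for an action,
    after checking the action's preconditions against the current state."""
    if action == "unlock_door" and "has_key" in state.facts:
        return {"door_unlocked"}, {"door_locked"}
    return set(), set()            # precondition not met: no effect

def world_model(state: State, action: str) -> State:
    # world model: (state, action) -> state'
    added, removed = llm_propose_effects(state, action)
    return State(facts=(state.facts - removed) | added)

s0 = State(facts=frozenset({"door_locked", "has_key"}))
s1 = world_model(s0, "unlock_door")
print(sorted(s1.facts))
```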
abhinav (@abnux)

probably the most important essay you need to read as a designer, creator, founder for the upcoming era

“in a world of scarcity, we treasure tools.
in a world of abundance, we treasure taste.”
Weaviate • vector database (@weaviate_io)

How do you get the most out of your Retrieval-Augmented Generation (RAG) apps?

First crucial step is chunking! 

The process of breaking down large documents or texts into smaller, manageable pieces called ‘chunks’. This simple yet powerful pre-processing step is key to boosting
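As a baseline, here is a minimal fixed-size chunker with overlap (the sizes are arbitrary placeholders, not recommendations):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of `chunk_size`, overlapping by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

document = "Retrieval-Augmented Generation pipelines retrieve chunks of text. " * 20
chunks = fixed_size_chunks(document)
print(len(chunks), len(chunks[0]))
```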
Weaviate • vector database (@weaviate_io)

3. Document-Based Chunking:
This technique creates chunks based on the natural divisions within the document, such as headings or sections. It’s very effective for structured data like HTML, Markdown, or code files but it’s less useful when the data lacks clear structural
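A minimal sketch for Markdown: split on heading lines so each section becomes one chunk (a real pipeline would also handle code fences, nested headings, and maximum chunk sizes):

```python
import re

def markdown_section_chunks(markdown: str) -> list[str]:
    """Split a Markdown document into chunks at heading lines (#, ##, ...)."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]

doc = "# Intro\nWhat this covers.\n\n## Setup\nInstall steps.\n\n## Usage\nRun it."
for chunk in markdown_section_chunks(doc):
    print(repr(chunk))
```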
Weaviate • vector database (@weaviate_io)

4. Semantic Chunking
In this technique, the text is divided into meaningful units, such as sentences or paragraphs, which are then vectorized. These units are then combined into chunks based on the cosine distance between their embeddings, with a new chunk formed whenever a
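A sketch of that idea with a hypothetical `embed(sentence)` stand-in (any sentence-embedding model would do); sentences accumulate into the current chunk until the cosine distance to the previous sentence crosses a threshold:

```python
import numpy as np

def embed(sentence: str) -> np.ndarray:
    """Hypothetical embedding call; random vectors here, so splits are illustrative only."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences: list[str], threshold: float = 0.5) -> list[str]:
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        # Start a new chunk whenever the topic appears to shift, i.e. the
        # distance to the previous sentence's embedding exceeds the threshold.
        if cosine_distance(prev_vec, vec) > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks

print(semantic_chunks(["Cats purr.", "Dogs bark.", "GPUs accelerate matrix math."]))
```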
meng shao (@shao__meng)

5 chunking techniques you need to know for RAG

The article from Weaviate • vector database highlights the importance of chunking in RAG applications. It is critical for improving LLM performance, making RAG applications smarter, faster, and more efficient.

The article covers five main chunking techniques:

01 - Fixed-size chunking:
- Method: split the text into fixed-size chunks, ignoring natural break points or structure in the content.
-
Weaviate • vector database (@weaviate_io)

We introduced the WeaviateAsyncClient for Python!

• Easy integration with local environments, Weaviate Cloud, and custom setups
• Works seamlessly with FastAPI for modular web API microservices

Read more about the release: weaviate.io/blog/weaviate-…
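A rough sketch of how this might look with FastAPI, assuming the v4 async API names (`weaviate.use_async_with_local`, an awaited `connect()`/`close()`, and a `near_text` query); check the linked release post for the exact interface:

```python
# Assumed names from the weaviate-client v4 async API; verify against the release post.
from contextlib import asynccontextmanager

import weaviate
from fastapi import FastAPI

async_client = weaviate.use_async_with_local()   # or use_async_with_weaviate_cloud(...)

@asynccontextmanager
async def lifespan(app: FastAPI):
    await async_client.connect()     # the async client must be connected explicitly
    yield
    await async_client.close()

app = FastAPI(lifespan=lifespan)

@app.get("/search")
async def search(q: str):
    # Assumes an "Article" collection already exists in the Weaviate instance.
    articles = async_client.collections.get("Article")
    response = await articles.query.near_text(query=q, limit=5)
    return [obj.properties for obj in response.objects]
```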