Andrew Lamb
@andrewlamb1111
Apache {DataFusion, Arrow} PMC, Database Engineer
ID: 1326266114805002241
http://andrew.nerdnetworks.org/
10-11-2020 20:51:18
404 Tweets
2.2K Followers
51 Following
Great proposal (technically and consensus building) for improving type handling in ApacheDataFusion by github.com/notfilippo. github.com/apache/datafus…. Datadog, Inc. is lucky to have him.
Seminar Schedule:
Sep 23: ApacheDataFusion
Sep 30: Andy Grove→DataFusion Comet
Oct 07: @ParadeDB
Oct 21: Voltron Data→Theseus
Oct 28: WHERE TRUE Technologies→Exon
Nov 04: Synnada
Nov 11: InfluxData
Nov 18: GlareDB
Nov 25: Greptime
Dec 02: Databend→OpenDAL
📰 Freshly coded and ready to roll! ApacheDataFusion v41.0.0 is out github.com/apache/datafus…! With 245 commits from 69 contributors, this release brings major improvements, from performance boosts to new SQL features. Check out what's new! #OpenSource
There are many cool ApacheDataFusion integrations out there (Iceberg, Delta, HuggingFace, etc.), but we are missing a final binary to integrate them all. Here is an idea: github.com/apache/datafus…
This is a really cool example of building a special index for querying Parquet files using ApacheDataFusion. The research project makes SQL queries ~1000x faster with a relatively straightforward optimizer pass in DataFusion and a novel index: uwheel.rs/post/datafusio…
There must be something in the water🚰 near Copenhagen that makes for good database engineers. Here is a PR from someone in Malmo (across the strait) that makes STDDEV / VAR in ApacheDataFusion 10x faster (2x faster end to end queries): github.com/apache/datafus…
Xiangpeng Hao strikes again. It turns out you can both 1. implement 🇩🇪 strings in Rust and 2. have them actually improve end-to-end performance in ApacheDataFusion. It takes a lot more than a straightforward, naive implementation to do so: influxdata.com/blog/faster-qu…
Andrey Tatarinov ApacheDataFusion Andy Grove Andrew Lamb Pydantic A lot of the power lies in Rust. Imagine how easy it would be to innovate if Postgres were written in Python but had the same performance as C. The fact that DataFusion is a library for building a DB is a very potent combination.
Pekka Enberg "often see traits and generics overused" You can see such things in many languages. IMO, just because you can do something doesn't mean you should. The code should be as simple as it can be, but there are coding "wizards" who like to complicate things.
I am quite proud of the ApacheDataFusion community. I have been away for a few days and the code continues to flow nicely 💪: github.com/apache/datafus… The last few times I was away, it seemed to me like the velocity slowed down (though maybe that analysis is overly self-centered).
How does the term "DuckDB for streaming" make you feel? This week we are chatting with Matt Green and Amey | अमेय, founders of buff.ly/3XnAasn, about embedded stream processing, Apache DataFusion, why fault tolerance is hard, and also why we might not need it as much as we