Andrew Lamb (@andrewlamb1111)'s Twitter Profile
Andrew Lamb

@andrewlamb1111

Apache {DataFusion, Arrow} PMC, Database Engineer

ID: 1326266114805002241

Link: http://andrew.nerdnetworks.org/
Joined: 10-11-2020 20:51:18

404 Tweets

2.2K Followers

51 Following

Andrew Lamb (@andrewlamb1111)

Great proposal (technically and in consensus building) for improving type handling in ApacheDataFusion by github.com/notfilippo: github.com/apache/datafus… Datadog, Inc. is lucky to have him.

Synnada (@synnadahq)

📰 Freshly coded and ready to roll! ApacheDataFusion v41.0.0 is out github.com/apache/datafus…! With 245 commits from 69 contributors, this release brings major improvements, from performance boosts to new SQL features. Check out what's new! #OpenSource

Andrew Lamb (@andrewlamb1111)

There are many cool ApacheDataFusion integrations out there (Iceberg, Delta, HuggingFace, etc.) but we are missing a final binary to integrate it all. Here is an idea: github.com/apache/datafus…

Andrew Lamb (@andrewlamb1111)

This is a really cool example of building a special index for querying Parquet files using ApacheDataFusion. The research project makes SQL queries ~1000x faster with a relatively straightforward optimizer pass in DataFusion and a novel index: uwheel.rs/post/datafusio…

Andrew Lamb (@andrewlamb1111)

There must be something in the water 🚰 near Copenhagen that makes for good database engineers. Here is a PR from someone in Malmo (across the strait) that makes STDDEV / VAR in ApacheDataFusion 10x faster (2x faster end-to-end queries): github.com/apache/datafus…

Andrew Lamb (@andrewlamb1111)

Xiangpeng Hao strikes again. Turns out that 1. you can implement 🇩🇪 strings in Rust, and 2. they actually improve end-to-end performance in ApacheDataFusion. It takes a lot more than a straightforward naive implementation to do so: influxdata.com/blog/faster-qu…

Samuel Colvin (@samuel_colvin)

Andrey Tatarinov ApacheDataFusion Andy Grove Andrew Lamb Pydantic A lot of the power lies in Rust. Imagine how easy it would be to innovate if Postgres was written in Python, but with the same performance as C. The fact that DataFusion is a library to build a DB with is a very potent combination.

Michal Piotrowski 🦀 (@practicalrs)

Pekka Enberg "often see traits and generics overused" You can see such things in many languages. IMO the fact that you can do something doesn't mean that you should do it. The code should be as simple as it can be. But there are coding "wizards" who like to complicate things.

Andrew Lamb (@andrewlamb1111)

I am quite proud of the ApacheDataFusion community. I have been away for a few days and the code continues to flow nicely 💪: github.com/apache/datafus… The last few times I was away it seemed to me like the velocity slowed down (though maybe that analysis is overly self-centered).

Andrew Lamb (@andrewlamb1111)

Part 2 of how to use StringView / German Strings to *actually* make queries faster: comparisons, inlining, gc, and buffer size tuning. influxdata.com/blog/faster-qu… Again, all due to our incredible intern Xiangpeng Hao.

Kostas Pardalis (@kostaspardalis)

How does the term "DuckDB for streaming" make you feel? This week we are chatting with Matt Green and Amey | अमेय, founders of buff.ly/3XnAasn, about embedded stream processing, Apache DataFusion, why fault-tolerance is hard and also why we might not need it as much as we