Jacob Pettingill (@jacobpettingill)'s Twitter Profile
Jacob Pettingill

@jacobpettingill

Lead Data Engineer at @ParticlHQ

ID: 708377574653579264

Joined: 11-03-2016 19:42:29

141 Tweets

292 Followers

645 Following

We tried AWS CDK, but because we were a young team, it was easier to use the AWS console. Instead, we created AWS Lambda functions individually and then deployed zip files to them. Now we deploy Docker images on hundreds of different Lambda functions for our data jobs,

Communicating my thoughts has always been one of my biggest weaknesses. It takes so much energy to align with other humans. But it's so worth it.

We cut our nightly processing costs in half by moving fully to vector operations. Anytime we could avoid a loop and instead generate a flag or a mask, we did it. We run fully serverless (AWS Lambda) so every second counts.
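As a rough illustration of the loop-versus-mask idea, here is a minimal sketch using NumPy. The column names, the sample prices, and the "price dropped" rule are all made up for the example and are not taken from the actual pipeline.

```python
import numpy as np


def price_drop_flags_loop(prices, previous_prices):
    """Loop baseline: one Python-level comparison per row."""
    flags = []
    for price, prev in zip(prices, previous_prices):
        flags.append(bool(price < prev))
    return flags


def price_drop_flags_vectorized(prices, previous_prices):
    """Vectorized version: a single array comparison returns a boolean mask."""
    return np.asarray(prices) < np.asarray(previous_prices)


# Hypothetical sample data (not real pricing data).
prices = np.array([10.0, 8.5, 12.0, 7.0])
previous_prices = np.array([10.0, 9.0, 11.0, 9.5])

mask = price_drop_flags_vectorized(prices, previous_prices)

# Apply the mask instead of branching inside a loop:
# the discount is the price difference where the price dropped, 0.0 elsewhere.
discount = np.where(mask, previous_prices - prices, 0.0)
```

Both functions produce the same flags; the vectorized one pushes the per-row work into compiled array code, which is usually where the runtime savings come from.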

Is anyone out there going without unique keys on their tables to keep up ingestion speed? We are getting hammered by duplicate checking on our inserts.

unique keys vs no enforced uniqueness

Getting 545 million rows with no key vs 170 million rows with keys (3.2x). Not sure how the gap widens as more and more data gets into the table (I imagine no keys dominates more and more).

The biggest downside would be having to manage deduping.
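
To make the trade-off concrete, here is a minimal sketch of "ingest without a unique key, dedupe afterwards". It uses an in-memory SQLite database purely so the example is self-contained; the table name, columns, and rows are hypothetical, and the database behind the numbers above is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical staging table with NO unique constraint,
# so inserts never pay for a duplicate check.
conn.execute("CREATE TABLE staging (sku TEXT, day TEXT, price REAL)")

rows = [
    ("A", "2024-01-01", 10.0),
    ("A", "2024-01-01", 10.0),  # duplicate
    ("B", "2024-01-01", 5.0),
    ("A", "2024-01-02", 9.0),
    ("B", "2024-01-01", 5.0),   # duplicate
]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# Dedupe as a separate batch step: keep the first-inserted row
# for each (sku, day) pair and delete the rest.
conn.execute(
    """
    DELETE FROM staging
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM staging GROUP BY sku, day
    )
    """
)
conn.commit()

deduped = conn.execute(
    "SELECT sku, day, price FROM staging ORDER BY sku, day"
).fetchall()
```

The cost of uniqueness moves from every insert to a periodic cleanup job, which is the management overhead mentioned above.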

We tried a single table design on DynamoDB and it flopped. It did deliver on speed and scalability like it promised, but when you are a data company, you really need the power of SQL.

One of our latest additions in the last year has been in-store tracking. This data scales pretty fast. Basically, (# of days) * (# of variants) * (# of stores) = (# of rows). Our historical Lululemon data is dwarfed by our in-store data.
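
A quick back-of-the-envelope version of that formula, as a sketch. The input figures below are invented for illustration and are not actual counts.

```python
def estimated_rows(days: int, variants: int, stores: int) -> int:
    """Row count when every variant is tracked in every store on every day."""
    return days * variants * stores


# Hypothetical inputs: one year, 10,000 variants, 500 stores.
rows_per_year = estimated_rows(days=365, variants=10_000, stores=500)
print(f"{rows_per_year:,} rows")
```

Because the three factors multiply, doubling any one of them doubles the table, which is why this kind of data outgrows a per-variant history so quickly.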