Jacob Pettingill (@jacobpettingill) Twitter Tweets • TwiDoom

Jacob Pettingill

9 days ago

We tried AWS CDK, but because we were a young team, it was easier to use the AWS console. Instead of using CDK, we created AWS Lambda functions individually and then deployed zip files to them. Now we deploy Docker images to hundreds of different Lambda functions for our data jobs,

8 days ago

Communicating my thoughts has always been one of my biggest weaknesses. It takes so much energy to align with other humans. But it's so worth it.

8 days ago

We got a 3x speed increase by swapping two columns in our sort key.
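A rough sketch of why column order in a sort key can matter this much. The tweet does not name the database, the columns, or the query, so everything here is an assumption: a hypothetical table of `(store_id, day)` rows stored in fixed-size blocks, and a query that filters on a single day. The helper `blocks_touched` is made up for illustration and only counts how many blocks contain a matching row.

```python
from itertools import product


def blocks_touched(rows, key, predicate, block_size=100):
    """Sort rows by `key`, cut them into blocks, and count the blocks
    that contain at least one row matching `predicate`."""
    ordered = sorted(rows, key=key)
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    return sum(1 for block in blocks if any(predicate(row) for row in block))


# Hypothetical data: 100 stores x 100 days = 10,000 rows of (store_id, day).
rows = list(product(range(100), range(100)))


def is_day_42(row):
    return row[1] == 42


# Sort key (store_id, day): every block holds one store and all of its days,
# so a filter on one day has to read every block.
store_first = blocks_touched(rows, key=lambda r: (r[0], r[1]), predicate=is_day_42)

# Sort key (day, store_id): all rows for one day sit next to each other.
day_first = blocks_touched(rows, key=lambda r: (r[1], r[0]), predicate=is_day_42)

print(store_first, day_first)
```

In this toy setup the filter column leads the sort key in the second ordering, so far fewer blocks need to be read. Real engines differ in the details, and the actual speedup depends on the workload.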

7 days ago

Rebuilding things from scratch is what we all want to do. It feels like we can do "better" than our old selves or others.

6 days ago

We cut our nightly processing costs in half by moving fully to vector operations. Anytime we could avoid a loop and instead generate a flag or a mask, we did it. We run fully serverless (AWS Lambda) so every second counts.
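A minimal sketch of the loop-versus-mask idea, assuming NumPy arrays. The tweet does not say which library or columns are involved, so the `prices` and `list_prices` arrays and both function names are hypothetical.

```python
import numpy as np


def discount_flags_loop(prices, list_prices):
    """Row-by-row version: one Python-level comparison per element."""
    flags = []
    for price, list_price in zip(prices, list_prices):
        flags.append(bool(price < list_price))
    return flags


def discount_flags_mask(prices, list_prices):
    """Vectorized version: a single comparison yields a boolean mask."""
    return prices < list_prices


# Hypothetical sample data.
prices = np.array([10.0, 8.5, 12.0, 7.0])
list_prices = np.array([10.0, 10.0, 12.0, 9.0])

mask = discount_flags_mask(prices, list_prices)

# Use the mask instead of an if-statement inside a loop:
# the discount amount where the item is discounted, 0.0 elsewhere.
total_discount = np.where(mask, list_prices - prices, 0.0).sum()

print(mask.tolist(), total_discount)
```

Both functions produce the same flags. The vectorized one pushes the per-row work into compiled code, which is where the time savings on large arrays usually come from.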

6 days ago

We're pulling in about 350 million rows a day. Not PB scale but still a good amount.

6 days ago

Is anyone out there skipping unique keys on their tables to improve ingestion speed? We are getting hammered by duplicate checking on our inserts.

5 days ago

Unique keys vs. no enforced uniqueness: we are getting 545 million rows with no key vs. 170 million rows with keys (3.2x). Not sure how the gap changes as more and more data gets into the table (I imagine no keys dominates more and more). The biggest thing would be having to manage deduping.
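One way the "manage deduping" step could look, sketched with Python's built-in `sqlite3` module. This is an assumption for illustration only: the tweets do not name the database, and the `prices` table, its columns, and the sample rows are hypothetical. The table has no unique constraint, so inserts are never checked, and duplicates are removed in a separate pass afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No PRIMARY KEY or UNIQUE constraint: inserts skip duplicate checking.
conn.execute("CREATE TABLE prices (variant_id INTEGER, day TEXT, price REAL)")

# Hypothetical batch that contains repeated (variant_id, day) pairs.
batch = [
    (1, "2024-01-01", 9.99),
    (1, "2024-01-01", 9.99),
    (2, "2024-01-01", 4.50),
    (2, "2024-01-02", 4.50),
    (2, "2024-01-02", 4.50),
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", batch)
loaded = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

# Separate dedup pass: keep the first-inserted row per logical key.
conn.execute(
    """
    DELETE FROM prices
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM prices GROUP BY variant_id, day
    )
    """
)
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

print(loaded, remaining)
```

The trade-off is the one the tweet points at: ingestion no longer pays for a uniqueness check on every insert, but something has to run the cleanup pass and readers have to tolerate duplicates until it does.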

5 days ago

We tried a single-table design on DynamoDB and it flopped. It did deliver on speed and scalability like it promised, but when you are a data company, you really need the power of SQL.

5 days ago

SQL > pandas

5 days ago

One of our latest additions in the last year has been in-store tracking. This data scales pretty fast. Basically, (# of days) * (# of variants) * (# of stores) = (# of rows). Our historical Lululemon data is dwarfed by our in-store data.
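The row-count formula can be written as a small function to see how quickly the product grows. The input numbers below are invented purely for illustration; the tweet gives no actual counts for days, variants, or stores.

```python
def in_store_rows(days: int, variants: int, stores: int) -> int:
    """Rows produced when every variant is tracked in every store every day."""
    return days * variants * stores


# Hypothetical inputs, not figures from the source.
one_year = in_store_rows(days=365, variants=1_000, stores=50)

# Doubling the store count doubles the rows; the factors multiply.
more_stores = in_store_rows(days=365, variants=1_000, stores=100)

print(f"{one_year:,} rows vs {more_stores:,} rows")
```

Because the three factors multiply, growth in any one of them scales the whole table, which is consistent with the in-store data outgrowing the historical data.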
