tl;dr:The inflection point was when the Postgres VACUUM process began to stall consistently, preventing the database from reclaiming disk space. Garrett discusses 3 key decisions the team made and lessons learned: (1) shard earlier. (2) Aim for a zero-downtime migration. (3) Introduce a combined primary key instead of a separate partition key.