tl;dr: Dropbox had a data validation problem, and this post discusses how they implemented a new quality check system in their big data pipelines that achieves a “balance of simplicity and coverage - providing good quality data, without being needlessly difficult or expensive to maintain.”