tl;dr:8 best practices shared including: (1) Alerts are treated as code i.e. go through code reviews, generated from existing modules. (2) Use percentiles over averages to get a higher quality signal. (3) Use playbooks to document each alert so there is corresponding documentation that explains what is broken and how to investigate and fix it.