tl;dr:"There are so many brilliant posts on GPT-3, demonstrating what it can do, pondering its consequences, and visualizing how it works. Even with all of these out there, it still took a crawl through several papers and blogs before I was confident that I had grasped the architecture. So the goal for this page is humble but simple: help others build as detailed an understanding of the GPT-3 architecture as possible."