/GPT

Let's Reproduce GPT-2 (124M)

- Andrej Karpathy tl;dr: “We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations.” 
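
For orientation, the "(124M)" in the title is the smallest GPT-2 model size; its published hyperparameters (12 layers, 12 heads, a 768-wide residual stream, 1024-token context, 50257-token vocabulary) are enough to recover that count. A minimal sketch, with illustrative field names rather than the video's exact code:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # GPT-2 "small" hyperparameters, as released by OpenAI
    block_size: int = 1024   # maximum context length
    vocab_size: int = 50257  # BPE vocabulary size
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding / residual-stream width

cfg = GPTConfig()
# Token + position embeddings, plus per-block attention (4 * n_embd^2)
# and MLP (8 * n_embd^2) weights; biases and layer norms omitted.
params = (cfg.vocab_size + cfg.block_size) * cfg.n_embd \
    + cfg.n_layer * (4 * cfg.n_embd ** 2 + 8 * cfg.n_embd ** 2)
print(f"~{params / 1e6:.0f}M parameters")  # ~124M
```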

featured in #523


Let's Build The GPT Tokenizer

- Andrej Karpathy tl;dr: “In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.”
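
At its core the tokenizer is byte-level byte-pair encoding: start from raw UTF-8 bytes and repeatedly merge the most frequent adjacent pair into a new token. A compact sketch of that training loop (illustrative, not the lecture's exact code):

```python
from collections import Counter

def most_common_pair(ids):
    """Most frequent adjacent pair of token ids in the sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn merge rules on top of the 256 raw byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> merged token id
    for new_id in range(256, 256 + num_merges):
        if len(ids) < 2:
            break
        pair = most_common_pair(ids)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges

print(train_bpe("hello hello hello", 3))
```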

featured in #491


ChatGPT Plugins: Build Your Own In Python!

- James Briggs tl;dr: OpenAI launched plugins for ChatGPT, which anyone can build. James demonstrates how to build one using the chatgpt-retrieval-plugin template.
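
A ChatGPT plugin is essentially a small web service plus a manifest (ai-plugin.json) and an OpenAPI spec that tell ChatGPT how to call it; the retrieval-plugin template wires that up around a vector store. A deliberately stripped-down, hypothetical version of its /query endpoint (FastAPI used for illustration; the field names here are assumptions, not the template's exact schema):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    query: str
    top_k: int = 3

class QueryRequest(BaseModel):
    queries: list[Query]

@app.post("/query")
def query_docs(request: QueryRequest):
    # The real template searches a vector store (Pinecone, Weaviate, etc.)
    # of the user's documents; this stub just echoes a placeholder result.
    return {
        "results": [
            {"query": q.query, "results": [{"text": "stub document", "score": 1.0}]}
            for q in request.queries
        ]
    }
```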

featured in #402


What Is ChatGPT Doing … and Why Does It Work?

- Stephen Wolfram tl;dr: "My purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why it is that it can do so well in producing what we might consider to be meaningful text. I should say at the outset that I’m going to focus on the big picture of what’s going on—and while I’ll mention some engineering details, I won’t get deeply into them."
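
The loop Wolfram keeps returning to is simple: score every candidate next token, convert the scores into probabilities, sample one (with a "temperature" controlling how adventurous the choice is), append it, and repeat. A toy illustration of that sampling step (not Wolfram's code):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8):
    """Convert raw scores into probabilities and sample one token.

    Lower temperature sharpens the distribution (more predictable text);
    higher temperature flattens it (more surprising text).
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Toy scores for the word after "The cat sat on the"
vocab = ["mat", "roof", "keyboard", "moon"]
logits = [3.0, 1.5, 1.0, -1.0]
print(vocab[sample_next_token(logits)])  # usually "mat", occasionally something else
```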

featured in #390


GPT In 60 Lines Of NumPy

- Jay Mody tl;dr: "In this post, we'll implement a GPT from scratch in just 60 lines of numpy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text."
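
The heart of such an implementation is a handful of NumPy functions for softmax and masked (causal) attention. A sketch in the same spirit, not the post's actual 60 lines:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out):
    """Single-head causal self-attention; x is [seq_len, d_model]."""
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    mask = (1 - np.tri(x.shape[0])) * -1e10        # block attention to future tokens
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    return softmax(scores) @ v @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))            # 5 tokens, d_model = 8
w_qkv = rng.normal(size=(8, 24))       # projects to concatenated Q, K, V
w_out = rng.normal(size=(8, 8))
print(causal_self_attention(x, w_qkv, w_out).shape)  # (5, 8)
```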

featured in #389


GPT Is Only Half Of The AI Language Revolution

- Jason Phillips tl;dr: In this post, Slite engineer Jason Phillips examines AI breakthroughs like GPT, exploring their potential for categorizing, filtering, and processing data. He argues that real-world applications will rely more on processing existing content than on generating new content.

featured in #387


Let's Build GPT: From Scratch, In Code, Spelled Out

- Andrej Karpathy tl;dr: "We build a GPT, following the paper 'Attention Is All You Need' and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT."
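
The lecture builds up to stacking pre-norm transformer blocks: attention, then an MLP, each wrapped in a residual connection. A condensed PyTorch sketch of one such block (using nn.MultiheadAttention for brevity where the lecture writes the attention math out by hand; sizes are illustrative):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."""

    def __init__(self, n_embd=64, n_head=4, block_size=128, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )
        # Causal mask: True marks positions a token is NOT allowed to attend to.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):                          # x: [batch, seq_len, n_embd]
        t = x.size(1)
        a = self.ln1(x)
        a, _ = self.attn(a, a, a, attn_mask=self.causal_mask[:t, :t], need_weights=False)
        x = x + a                                  # residual around attention
        x = x + self.mlp(self.ln2(x))              # residual around the MLP
        return x

x = torch.randn(2, 16, 64)                         # batch of 2, 16 tokens each
print(Block()(x).shape)                            # torch.Size([2, 16, 64])
```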

featured in #382


How GPT3 Works - Visualizations And Animations

- Jay Alammar tl;dr: "The dataset of 300 billion tokens of text is used to generate training examples for the model. For example, these are three training examples generated from [a single sentence in the post's animation]. You can see how you can slide a window across all the text and make lots of examples."
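
The sliding-window idea is easy to make concrete: every position in the token stream yields one training example whose target is simply the next token. A small illustration (not Alammar's code):

```python
def make_training_examples(tokens, context_size):
    """Slide a window over the sequence; the model sees `context`
    and learns to predict the token immediately after it."""
    return [
        (tokens[i : i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]

tokens = "the quick brown fox jumps over the lazy dog".split()
for context, target in make_training_examples(tokens, 3)[:3]:
    print(context, "->", target)
# ['the', 'quick', 'brown'] -> fox
# ['quick', 'brown', 'fox'] -> jumps
# ['brown', 'fox', 'jumps'] -> over
```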

featured in #378


The GPT-3 Architecture, On A Napkin

- Daniel Dugas tl;dr: "There are so many brilliant posts on GPT-3, demonstrating what it can do, pondering its consequences, visualizing how it works. With all these out there, it still took a crawl through several papers and blogs before I was confident that I had grasped the architecture. So the goal for this page is humble, but simple: help others build an as detailed as possible understanding of the GPT-3 architecture."
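
As a companion to the napkin, the published GPT-3 dimensions can be followed through one forward pass as tensor shapes; roughly, that much width and depth is what adds up to the famous ~175B parameters. A shape-only sketch (no weights allocated):

```python
# GPT-3 dimensions from the paper: 96 layers, d_model = 12288,
# 96 heads of size 128, 2048-token context, 50257-token BPE vocabulary.
n_ctx, d_model, n_head, d_head, n_layer, vocab = 2048, 12288, 96, 128, 96, 50257

shapes = {
    "token ids":          (n_ctx,),                # input sequence of BPE ids
    "embeddings":         (n_ctx, d_model),        # token + position embeddings
    "Q/K/V (per head)":   (n_ctx, d_head),         # computed in each of the 96 layers
    "attention scores":   (n_head, n_ctx, n_ctx),  # each position attends to earlier ones
    "MLP hidden":         (n_ctx, 4 * d_model),    # 4x expansion inside every block
    "logits":             (n_ctx, vocab),          # a next-token distribution per position
}
for name, shape in shapes.items():
    print(f"{name:18} {shape}")
```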

featured in #375


Building A Virtual Machine Inside ChatGPT

- Jonas Degrave tl;dr: The author shows how to "build a virtual machine, inside an assistant chatbot, on the alt-internet, from a virtual machine, within ChatGPT's imagination."

featured in #372