tl;dr:"The dataset of 300 billion tokens of text is used to generate training examples for the model. For example, these are three training examples generated from the one sentence at the top. You can see how you can slide a window across all the text and make lots of examples."