3

Neural Network

How billions of parameters predict the next token


You type “The capital of France is” into ChatGPT and hit enter. What happens now? Your tokens from the previous chapter get fed into a neural network — a giant web of billions of numbers that were carefully tuned during training. Each token passes through dozens of layers, and at every layer the model essentially asks itself: “Which of the words I’ve seen so far matter most for figuring out what comes next?” This process is called attention.
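The attention step described above can be sketched in a few lines of NumPy. This is a minimal, illustrative version of scaled dot-product self-attention with made-up toy numbers, not the full machinery of a real model (no learned projection matrices, no multiple heads):

```python
import numpy as np

def softmax(x):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Each token's query is compared against every token's key;
    # the resulting scores become weights over the value vectors.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)  # each row sums to 1
    return weights @ V, weights

# Toy example: 5 tokens (think "The capital of France is"),
# each represented by a random 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
output, weights = attention(x, x, x)  # self-attention: Q, K, V all from x

print(weights.shape)  # (5, 5): one weight for every (token, token) pair
```

Row 4 of `weights` answers exactly the question in the text: how much should the model, while processing the last token, attend to each earlier token?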

How big is this network?

GPT-2: 1.5B parameters
GPT-4: ~1.8T parameters
LLaMA 3: 405B parameters

Each parameter is a single decimal number (like 0.0237). Billions of these tiny dials, tuned during training, encode everything the model knows.
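To see where those billions of dials come from, you can count the parameters in a single dense layer: one weight per input-output connection plus one bias per output. The hidden width below is an illustrative figure, not any specific model's exact configuration:

```python
# Every input-output connection is one learned number (one "dial"),
# plus one bias per output unit.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Illustrative hidden width, roughly GPT-3 scale.
d_model = 12288

# One feed-forward expansion layer alone:
print(dense_params(d_model, 4 * d_model))  # 604028928 — ~0.6B parameters
```

Stack dozens of layers like this, each with several such matrices, and the counts in the table above stop looking surprising.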

Input (Tokens) → Process (Neural Network) → Output (Probabilities)

For example, when the model is processing “capital”, it learns to pay extra attention to “France” — because the country determines which capital we’re talking about. After passing through all these layers, the network spits out a probability for every possible next token in its vocabulary — roughly 100,000 options. For our prompt, it might assign 92% to “Paris”, 3% to “the”, and tiny fractions to everything else. The model never “knows” the answer — it just predicts which token is most likely to come next, one token at a time.
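The final step, turning the network's raw scores into a probability for every token, is done with a softmax. Here is a minimal sketch using invented logits for a handful of candidate tokens (the real model scores all ~100,000 vocabulary entries at once):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to sum to 1.
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical raw scores the network might emit for a few candidate
# next tokens after "The capital of France is".
vocab  = ["Paris", "the", "located", "Lyon"]
logits = np.array([6.0, 2.6, 1.0, 0.5])

probs = softmax(logits)
for tok, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{tok:>8}: {p:.1%}")
```

"Paris" dominates not because the model knows geography, but because its score, once normalized, takes most of the probability mass, exactly the one-token-at-a-time prediction described above.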

🎯 Your turn — try it out!


Pick the Most Likely Next Token (Round 1/4)

Prompt:

"The capital of France is" ___

Which token is most likely to come next?