In 2019, OpenAI's GPT-2 was considered dangerously powerful with 1.5 billion parameters trained on ~10 billion tokens (Radford et al., 2019). Today's frontier models have hundreds of billions of parameters trained on trillions of tokens — over 1,000x more data.
But size isn't everything. After pre-training, you get a base model — essentially sophisticated autocomplete. It just continues whatever text you give it. Type "What is the capital of France?" and it might continue with "What is the capital of Germany?" because questions tend to appear in lists. It doesn't answer you — it continues you.
📝 Base Model = Autocomplete: continues your text.
💬 Chat Model = Assistant: answers your question.
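The "continues you" behavior can be sketched with a toy next-word model. This is a hypothetical bigram sampler trained on a tiny corpus of question lists, not a real LLM; it only illustrates the mechanic of continuation. All names here (`table`, `continue_text`) are made up for this sketch.

```python
import random
from collections import defaultdict

# Toy stand-in for a base model: a bigram table built from a tiny corpus.
# (Illustrative only; real base models are transformers trained on trillions of tokens.)
corpus = (
    "What is the capital of France ? "
    "What is the capital of Germany ? "
    "What is the capital of Spain ?"
).split()

# Count which word follows which in the training text.
table = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    table[a].append(b)

def continue_text(prompt, n=6, seed=0):
    """Continuation: repeatedly append a word that followed the last word in training."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n):
        followers = table.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)

# The "model" does not answer with Paris; it continues with more question-shaped text,
# because in its training data a question mark is followed by another question.
print(continue_text("What is the capital of France ?"))
```

Because "?" is followed by "What" everywhere in the corpus, the continuation is another "What is the capital of..." question, which is exactly the base-model failure mode described above.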
So how does a base model become a helpful assistant? Through a three-stage training pipeline. Think of it like training a doctor: first they go to school (pre-training on internet text), then they do a residency with hands-on practice (supervised fine-tuning on conversations), and finally they learn from patient feedback (RLHF — we'll dive deep into this in a later chapter).
📚 Pre-training: read the internet.
💬 SFT: practice conversations.
👍 RLHF: learn preferences.
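To make the SFT stage concrete, here is a sketch of what one supervised fine-tuning example might look like: a short conversation rather than raw web text, with the loss typically computed only on the assistant's reply. The field names (`messages`, `role`, `content`) follow a common convention but are an assumption for illustration, not any specific dataset's schema.

```python
# One hypothetical SFT training example: a conversation, not raw internet text.
# (Field names are illustrative; real datasets vary.)
sft_example = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# During SFT the conversation is flattened into one token stream, and the loss
# is usually masked so only the assistant's tokens are trained on: the model
# learns to *answer* the question, not to continue it with more questions.
prompt = sft_example["messages"][0]["content"]
target = sft_example["messages"][1]["content"]
print(prompt, "->", target)
```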
The model also needs a special format — called a chat template — so it knows where your messages end and its responses begin. Special marker tokens tell the model who's speaking:
System: You are a friendly assistant.
User: Hello, how are you?
Assistant: I'm doing well! How can I help?
These markers are invisible to you but critical for the model to distinguish roles.
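Under the hood, a chat template is just a deterministic function from a list of messages to a single string with marker tokens inserted. The sketch below uses ChatML-style markers (`<|im_start|>` / `<|im_end|>`, used by several open models); the exact tokens differ from model to model, so treat these as illustrative rather than universal.

```python
# Minimal chat-template sketch using ChatML-style markers.
# (Real models each define their own special tokens; these are illustrative.)
def apply_chat_template(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply next.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]
print(apply_chat_template(messages))
```

The trailing open `assistant` turn is the key trick: the model's "answer" is simply its continuation of a string that ends mid-conversation, right where the assistant is supposed to speak.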
Your turn — try it out!
Order the Pipeline
Drag these stages into the correct order: