In 2019, OpenAI's GPT-2 was considered dangerously powerful with 1.5 billion parameters trained on ~10 billion tokens (Radford et al., 2019). Today's frontier models have hundreds of billions of parameters trained on trillions of tokens — over 1,000x more data.
But size isn't everything. After pre-training, you get a base model — essentially sophisticated autocomplete. It just continues whatever text you give it. Type "What is the capital of France?" and it might continue with "What is the capital of Germany?" because questions tend to appear in lists. It doesn't answer you — it continues you.
📝 Base Model = Autocomplete: continues your text.
💬 Chat Model = Assistant: answers your question.
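The "continues you" behavior can be sketched with a toy next-word model. This is a hypothetical bigram sampler trained on a tiny corpus of question lists, not a real LLM; it only illustrates the mechanic of continuation. All names here (`table`, `continue_text`) are made up for this sketch.

```python
import random
from collections import defaultdict

# Toy stand-in for a base model: a bigram table built from a tiny corpus.
# (Illustrative only; real base models are transformers trained on trillions of tokens.)
corpus = (
    "What is the capital of France ? "
    "What is the capital of Germany ? "
    "What is the capital of Spain ?"
).split()

# Count which word follows which in the training text.
table = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    table[a].append(b)

def continue_text(prompt, n=6, seed=0):
    """Continuation: repeatedly append a word that followed the last word in training."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n):
        followers = table.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)

# The "model" does not answer with Paris; it continues with more question-shaped text,
# because in its training data a question mark is followed by another question.
print(continue_text("What is the capital of France ?"))
```

Because "?" is followed by "What" everywhere in the corpus, the continuation is another "What is the capital of..." question, which is exactly the base-model failure mode described above.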
So how does a base model become a helpful assistant? Through a three-stage training pipeline. Think of it like training a doctor: first they go to school (pre-training on internet text), then they do a residency with hands-on practice (supervised fine-tuning on conversations), and finally they learn from patient feedback (RLHF — we'll dive deep into this in a later chapter).
📚 Pre-training: read the internet.
💬 SFT: practice conversations.
👍 RLHF: learn preferences.
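To make the SFT stage concrete, here is a sketch of what one supervised fine-tuning example might look like: a short conversation rather than raw web text, with the loss typically computed only on the assistant's reply. The field names (`messages`, `role`, `content`) follow a common convention but are an assumption for illustration, not any specific dataset's schema.

```python
# One hypothetical SFT training example: a conversation, not raw internet text.
# (Field names are illustrative; real datasets vary.)
sft_example = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# During SFT the conversation is flattened into one token stream, and the loss
# is usually masked so only the assistant's tokens are trained on: the model
# learns to *answer* the question, not to continue it with more questions.
prompt = sft_example["messages"][0]["content"]
target = sft_example["messages"][1]["content"]
print(prompt, "->", target)
```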
The model also needs a special format — called a chat template — so it knows where your messages end and its responses begin. Special marker tokens tell the model who's speaking:
System: You are a friendly assistant.
User: Hello, how are you?
Assistant: I'm doing well! How can I help?
These markers are invisible to you but critical for the model to distinguish roles.
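Under the hood, a chat template is just a deterministic function from a list of messages to a single string with marker tokens inserted. The sketch below uses ChatML-style markers (`<|im_start|>` / `<|im_end|>`, used by several open models); the exact tokens differ from model to model, so treat these as illustrative rather than universal.

```python
# Minimal chat-template sketch using ChatML-style markers.
# (Real models each define their own special tokens; these are illustrative.)
def apply_chat_template(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply next.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]
print(apply_chat_template(messages))
```

The trailing open `assistant` turn is the key trick: the model's "answer" is simply its continuation of a string that ends mid-conversation, right where the assistant is supposed to speak.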
Your turn — try it out!
Order the Pipeline
Drag these stages into the correct order: