How AI Learns to Be Helpful
Today's Plan
You've explored how LLMs predict the next word, how temperature affects their output, and how embeddings represent meaning. But there's a gap between a model that predicts text and a model that helps you. Today you'll discover how that gap gets closed: through a multi-stage training process that bakes in human preferences about what "good" AI behavior looks like.
You'll work with a partner to identify how model behavior changes at each training stage, then practice the preference ranking that shapes modern AI assistants.
Explore: Match the Training Stage
This activity involves working with a partner.
The Three Training Stages
Modern AI assistants aren't trained all at once. They go through three distinct stages, and each stage changes how the model behaves:
- Pretraining (base model): The model reads billions of words from the internet and learns to predict the next token. It learns language, facts, and patterns, but it has no concept of "answering a question" or "being helpful."
- Instruction tuning: The model is trained on examples of questions paired with good answers. It learns to follow directions and produce the kind of output a user would expect.
- RLHF (Reinforcement Learning from Human Feedback): Human raters compare pairs of model outputs and pick the "better" one. The model learns to produce responses that humans prefer: warmer, more careful, more nuanced.
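One way to see the difference between the stages is to look at what a single training example contains at each one. The sketch below uses made-up example data (the prompts and responses are illustrative, not drawn from any real training set), but the shape of each record reflects how the stages differ:

```python
# Stage 1: pretraining. Just raw text -- the model's only job is to
# predict the next token, so there is no prompt/response structure.
pretraining_example = "The mitochondria is the powerhouse of the"

# Stage 2: instruction tuning. A prompt paired with a good answer,
# teaching the model to respond rather than merely continue text.
instruction_example = {
    "prompt": "What is photosynthesis?",
    "response": "Photosynthesis is the process plants use to turn "
                "sunlight, water, and carbon dioxide into energy.",
}

# Stage 3: RLHF. A prompt plus two candidate responses and a human
# rater's judgment about which one is better.
preference_example = {
    "prompt": "What is the meaning of life?",
    "response_a": "A short, dismissive one-liner.",
    "response_b": "A thoughtful answer that acknowledges different views.",
    "preferred": "b",  # chosen by a human rater
}
```

Notice that only the first stage needs nothing but text; the later stages require humans to write answers or express preferences, which is why they use far less data than pretraining.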
Match the Response to the Training Stage
The tool below generates a real response from each training stage for the same prompt. You'll see one response and guess which stage produced it. After guessing, all three responses are revealed so you can compare. Try a few rounds with your partner.
Training Stage Matcher
“What is the meaning of life?”
Discussion: What Did You Notice?
Explore: Be the Preference Rater
This activity involves working with a partner.
Be the Preference Rater
Companies like OpenAI and Anthropic train their AI assistants using human feedback. Real people read pairs of model outputs and decide which one is "better." The model then learns to produce more outputs like the preferred ones.
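Under the hood, those human choices are commonly turned into a training signal via a pairwise (Bradley-Terry style) loss: a reward model scores each response, and the loss is small when the human-preferred response scores higher. The function below is a simplified one-pair sketch of that idea, with hypothetical scores standing in for a real reward model's outputs:

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(preferred - rejected)).

    Small when the reward model already rates the human-preferred
    response higher; large when it rates the rejected one higher.
    """
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores for one comparison.
loss_when_model_agrees = pairwise_preference_loss(2.0, 0.5)   # small loss
loss_when_model_disagrees = pairwise_preference_loss(0.5, 2.0)  # large loss
```

Training nudges the reward model's parameters to shrink this loss across many rated pairs, so your clicks in the activity below are exactly the kind of data that shapes the model's sense of "better."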
Now it's your turn. The tool below generates two responses to the same prompt, each optimized for a different value. Pick which response you think is better, then see what value each was optimized for. Try several rounds with your partner and discuss where you agree and disagree.
Preference Rater
“Can you write my essay about climate change for me?”
Discussion: What Is 'Good' AI?
Generate Questions
This activity involves working with a partner.
What Are You Curious About?
You've now seen how models change through training, and you've experienced the preference ranking process yourself. Based on what you observed and discussed, what questions do you have about how AI gets trained?
Enter at least 3 questions below.
Question Review
Investigate
This activity involves working with a partner.
Investigate
Discuss this question with your partner. Use what you observed in the examples and the ranking exercise to reason about possible answers. You can also use other resources if helpful, but focus on building your own understanding.
Share Out
Feedback