Training Stage Matcher

See how pretraining, instruction tuning, and RLHF each change the same model's behavior. Guess which training stage produced a response, then see all three side-by-side.

Pedagogical Goals

  • Make the three-stage training pipeline concrete by showing real behavioral differences
  • Let students discover that the base model is not broken, just trained for a different task (text prediction)
  • Highlight the subtle but important difference between instruction tuning (follows directions) and RLHF (has preferences and values)
  • Show that the warmth, safety awareness, and conversational style of modern AI assistants are the result of human choices during training

How It Works

The tool sends the same prompt to GPT three times with different system prompts that simulate each training stage. The base-model prompt instructs the model to continue text like an autocomplete system; the instruction-tuned prompt makes it answer directly, without warmth or judgment; the RLHF version uses no override at all, since the deployed model's default behavior already reflects RLHF training. The student sees one response, guesses which stage produced it, and then all three are revealed for comparison.
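The per-stage setup described above can be sketched as a small configuration table. This is a hypothetical illustration, not the tool's actual code: the prompt wording, type names, and field names are all assumptions.

```typescript
// Illustrative per-stage configuration. Prompt text is a paraphrase of the
// behavior described above, not the tool's real system prompts.
type Stage = "base" | "instruction-tuned" | "rlhf";

interface StageConfig {
  stage: Stage;
  systemPrompt: string; // empty string = no override, use the model's default behavior
  temperature: number;
}

const STAGES: StageConfig[] = [
  {
    stage: "base",
    systemPrompt:
      "Continue the user's text like an autocomplete system. " +
      "Do not answer questions or address the user directly.",
    temperature: 0.9,
  },
  {
    stage: "instruction-tuned",
    systemPrompt:
      "Answer the request directly and literally, without warmth or judgment.",
    temperature: 0.3,
  },
  {
    stage: "rlhf",
    systemPrompt: "", // the deployed model's default behavior already reflects RLHF
    temperature: 0.7,
  },
];
```

The empty system prompt for the RLHF row reflects the point made above: modern assistant behavior is itself the product of RLHF, so no simulation is needed.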

How It Was Built

Built as a client component that calls a dedicated API endpoint. The server runs three parallel LLM calls with different system prompts and temperature settings (0.9 for base, 0.3 for instruction-tuned, 0.7 for RLHF), then returns the responses in a shuffled order. The client manages the guess-and-reveal flow with a simple state machine.
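A minimal sketch of the server-side flow described above: three parallel completions with per-stage system prompts and temperatures, shuffled before returning so the client cannot infer the stage from position. `callModel` is a stand-in for the real LLM client, and all names here are assumptions rather than the actual implementation.

```typescript
interface StageResponse {
  stage: string;
  text: string;
}

// Placeholder for the real chat-completions call.
async function callModel(
  systemPrompt: string,
  userPrompt: string,
  temperature: number,
): Promise<string> {
  return `(t=${temperature}) response to: ${userPrompt}`;
}

// Fisher-Yates shuffle over a copy, leaving the input untouched.
function shuffle<T>(items: T[]): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// One round: fire all three calls in parallel, then return them shuffled.
async function generateRound(prompt: string): Promise<StageResponse[]> {
  const stages = [
    { stage: "base", system: "Continue the text like autocomplete.", temperature: 0.9 },
    { stage: "instruction-tuned", system: "Answer directly, no warmth.", temperature: 0.3 },
    { stage: "rlhf", system: "", temperature: 0.7 },
  ];
  const texts = await Promise.all(
    stages.map((s) => callModel(s.system, prompt, s.temperature)),
  );
  const responses = stages.map((s, i) => ({ stage: s.stage, text: texts[i] }));
  return shuffle(responses);
}
```

On the client side, the guess-and-reveal flow described above needs only two phases (e.g. `"guessing"` and `"revealed"`), advancing to the second once the student commits a guess.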