Per-Message Context Window Control for Chat
Implement the following plan:
Per-Message Context Window Control for Chat
Goal
Add a slider to the chat UI that controls how many prior messages are included when sending a message to the model. This enables students to experiment with context window sizes and observe how models behave with limited history.
Design
- Control: HTML range slider with label showing current value
- Scope: Per-message (not persisted to DB) — the slider value is sent with each API request
- Range: 0 to 50, plus an "All" option (represented as `null`)
- Default: "All" (slider at max position, no truncation)
- Display: "Context: All" or "Context: 4 messages" shown inline
- Placement: Rendered alongside the model selector above the textarea in ChatInput
When contextWindow is a number, only the last N messages from conversation history are sent to the model. The system prompt is always included regardless of window size. 0 means only the current user message is sent with no history.
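That windowing rule can be sketched as a pure function (illustrative names like `applyContextWindow` are mine, not the actual route code):

```typescript
// Sketch of the windowing rule: keep the last N history messages,
// then append the current user message. null means "All".
type ChatMessage = { role: "user" | "assistant"; content: string };

function applyContextWindow(
  history: ChatMessage[],
  current: ChatMessage,
  contextWindow: number | null,
): ChatMessage[] {
  if (contextWindow === null) {
    return [...history, current]; // "All": send the full history
  }
  // slice(-0) would return the whole array, so treat 0 explicitly:
  // only the current user message, no history.
  const kept = contextWindow === 0 ? [] : history.slice(-contextWindow);
  return [...kept, current];
}
```

For example, with three prior messages and a window of 2, only the last two plus the current message are sent.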
Files to Change
1. New component: apps/web/app/chat/components/context-window-control.tsx
A compact slider control:
- Native HTML `<input type="range">` (no need for a new UI primitive)
- Range: min=0, max=50, with an extra tick at 51 representing "All"
- Label updates reactively: "Context: All" or "Context: N messages" (or "Context: 0 messages" for no history)
- Props: `value: number | null`, `onChange: (value: number | null) => void`, `disabled?: boolean`
- Data-testids: `context-window-control`, `context-window-slider`, `context-window-label`
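The position-to-value mapping and label text could be factored as pure helpers along these lines (helper names are hypothetical, and the singular form for exactly 1 is an assumption beyond the spec):

```typescript
// Hypothetical helpers for the control: slider position 51 (one past
// max=50) maps to null, meaning "All".
const ALL_POSITION = 51;

function sliderToValue(position: number): number | null {
  return position >= ALL_POSITION ? null : position;
}

function contextLabel(value: number | null): string {
  if (value === null) return "Context: All";
  // Singular for exactly 1 is an assumption; the plan only specifies
  // "Context: N messages" and "Context: 0 messages".
  return `Context: ${value} message${value === 1 ? "" : "s"}`;
}
```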
2. Update apps/web/app/chat/client.tsx
- Add state: `const [contextWindow, setContextWindow] = useState<number | null>(null)`
- Pass `contextWindow` in the API request body (for every message, not just new conversations)
- Render `<ContextWindowControl>` alongside the `<ModelSelector>` — both always visible in a flex row above the textarea
- The context window control should always be visible (not just for new conversations), since it's per-message
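One reason `null` (rather than `undefined`) works well for "All": `null` survives `JSON.stringify`, so the field is sent explicitly on every request. A sketch, with field names assumed from the plan rather than verified against the actual client:

```typescript
// Assumed request-body shape; the real client likely includes more fields.
interface ChatRequestBody {
  message: string;
  model: string;
  // null = "All". An undefined value would be dropped by JSON.stringify,
  // so the server could not distinguish "All" from "field omitted".
  contextWindow: number | null;
}

function buildRequestBody(
  message: string,
  model: string,
  contextWindow: number | null,
): ChatRequestBody {
  return { message, model, contextWindow };
}
```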
3. Update apps/web/app/chat/components/chat-input.tsx
- Rename the `modelSelector` prop to `controls` (or add a second prop) to accept the combined model selector + context window controls
- Minor: the controls container should use flex row layout for side-by-side display
4. Update apps/api/src/routes/chat.ts
- Add `contextWindow: z.number().int().min(0).nullish()` to `chatRequestSchema`
- After building `chatMessages` from history (line ~100-103), truncate:

  ```typescript
  if (data.contextWindow !== null && data.contextWindow !== undefined) {
    // Keep only the last N messages from history, then append the current message.
    const historyMessages = chatMessages.slice(0, -1); // all but current
    const currentMessage = chatMessages[chatMessages.length - 1];
    // Note: slice(-0) returns the whole array, so 0 needs explicit handling
    // to mean "no history at all".
    const truncated =
      data.contextWindow === 0 ? [] : historyMessages.slice(-data.contextWindow);
    chatMessages = [...truncated, currentMessage];
  }
  ```

- This applies to both streaming and non-streaming paths since `chatMessages` is built once before branching
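The `nullish()` schema accepts `null`, `undefined`, or a non-negative integer. A hand-rolled check mirroring it, for illustration only (the route itself uses Zod):

```typescript
// Mirrors z.number().int().min(0).nullish() without the Zod dependency.
function isValidContextWindow(v: unknown): boolean {
  if (v === null || v === undefined) return true;
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}
```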
5. No database changes
The context window is per-message and not persisted. It's a transient UI control.
Verification
```bash
# TypeScript check
bun run tsc

# Lint
bun run lint

# Manual verification: open chat page, adjust slider, send messages
# with different context window sizes, observe model behavior changes
```
I'll implement this plan step by step. Let me start by reading the relevant files to understand the current code structure.
Now let me also look at the model selector component for reference on styling.
Good. I have all the context I need. Let me now create the task list and implement all the changes.
Now let me implement. Starting with the new component.
Now update ChatInput to rename modelSelector to controls.
Now update the ChatClient to add contextWindow state, pass it in requests, and render both controls.
Now update the fetch call to include contextWindow:
Update the useCallback dependency array to include contextWindow:
Now update the ChatInput rendering to use controls prop and include both the model selector and context window control:
Now update the API chat route to accept and use contextWindow.
Now add the truncation logic after building chatMessages and before building the system prompt. The truncation should happen after the user message is pushed, right before the agent loading:
Now let me run the type check and lint.
Both pass cleanly.
All done. Here's a summary of the changes:
New file:
`apps/web/app/chat/components/context-window-control.tsx` — Compact slider component with range 0–51 (where 51 = "All"). Displays "Context: All" or "Context: N messages". Uses `data-testid` attributes for testing.
Modified files:
- `apps/web/app/chat/components/chat-input.tsx` — Renamed the `modelSelector` prop to `controls`, changed the container to flex row layout with `gap-4` for side-by-side display.
- `apps/web/app/chat/client.tsx` — Added `contextWindow` state (default `null` = all), imported and rendered `ContextWindowControl` always visible alongside the model selector, included `contextWindow` in every API request body.
- `apps/api/src/routes/chat.ts` — Added `contextWindow: z.number().int().min(0).nullish()` to the schema. After building `chatMessages`, truncates history to the last N messages when `contextWindow` is set, keeping only the current user message plus N prior messages. The system prompt is always included (it's passed separately to the LLM).
Nice! Let's make 52 the default now for chat.
Done. The slider now ranges from 0 to 52 messages, with the next tick (53) representing "All".
No, sorry! I meant the 52 model.
Reverted the slider max back to 50 and changed the default model to gpt-5.2.
Nice. Now, that raises a question though. How do we handle context overflow in the chat?
Like I know that, when using Claude, it will compact history at some point. Should we do that or just history gradually walk off the edge?
Given the educational purpose of this tool — students learning how context windows work — I'd lean toward letting history walk off the edge (truncation) rather than compaction. A few reasons:
- Pedagogical value: Students should experience what happens when the model "forgets" earlier messages. Compaction would hide that effect, which is exactly what you want them to observe.
- Simplicity: Compaction requires an extra LLM call to summarize, adds latency, and introduces complexity around when to trigger it and how to store the summary.
- The slider already handles this: The context window control you just added is essentially manual truncation. Students can see the difference between sending 5 messages vs 50 vs all.
The remaining question is what to do as a safety net when someone leaves the slider on "All" and the conversation gets long enough to exceed the model's token limit. Options there:
- Silent truncation server-side: Automatically drop oldest messages to fit. Simple, but the student might not realize it happened.
- Error with a message: Let the API error and tell the student to reduce context. More visible, but worse UX.
- Do nothing: Azure OpenAI will return a 400 error, which currently shows as "[Error: Failed to generate response]".
Right now you're effectively doing option 3. Would you want to add automatic server-side truncation as a fallback, or is the current behavior (error if too long) fine for now?
Right. The message limit is a bit deceptive in a way, since this is always done by tokens. Do we have a way of computing the tokenization for a message? We might want this later for explanatory purposes, but also to use it to figure out when to drop messages. We might also want to have dropped messages marked in the UI somehow.
Currently there's no way to count tokens before sending a request. The codebase tracks token usage after API responses — `promptTokens`, `completionTokens`, and `totalTokens` are extracted from the API response and stored in message metadata — but there's no local tokenizer installed.
For pre-request token counting (which you'd need for both UI display and server-side truncation), you'd need a library like:
- `tiktoken` — OpenAI's official tokenizer (Rust/WASM, most accurate for GPT models)
- `gpt-tokenizer` — Pure JS port, no WASM dependency, easier to bundle
- `js-tiktoken` — Lighter JS wrapper around tiktoken
Any of these would let you count tokens per message, which enables:
- UI display: Show token count next to each message or as a running total
- Smart truncation: Drop oldest messages when total tokens approach the model's limit, rather than using a fixed message count
- Dropped message indicators: Mark messages in the UI that were excluded from the current request
The message-count slider you just built is a good teaching tool on its own, but token-based truncation would be needed for a proper safety net since a single long message could blow the context window regardless of message count.
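A sketch of what token-based truncation might look like, using a rough chars/4 estimate as a stand-in for a real tokenizer (with one of the libraries above, `estimateTokens` would be replaced by an actual encode-and-count call; the function names here are illustrative):

```typescript
type Msg = { role: string; content: string };

// Crude heuristic (~4 characters per token for English text); a real
// implementation would count with a tokenizer library instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk newest-to-oldest, keeping messages while the running total fits
// the budget; everything older "walks off the edge".
function fitToBudget(history: Msg[], budgetTokens: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (total + cost > budgetTokens) break;
    kept.unshift(history[i]);
    total += cost;
  }
  return kept;
}
```

Returning the kept list (rather than mutating in place) also makes it easy to compute which messages were dropped, which is what a "dropped message" indicator in the UI would need.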
Want me to plan out adding token counting, or is this more of a future consideration?
I want to avoid the 400 errors, so I think at minimum we need this on the backend. It should do per-message counting, add this to the database, and expose the information to the frontend. At some point we might also want to add some API endpoint to the backend to explore tokenization, returning counts and token boundaries, but I don't think we need that yet.
Good context. Let me design this properly.
Let me verify a couple of specifics before finalizing the plan.
Now let me check the latest gpt-tokenizer version for the dependency pin.
I have all the context I need. Let me write the plan.