E2E Tests for Conversational Assessment Lifecycle
Implement the following plan:
E2E Tests for Conversational Assessment Lifecycle
Context
We want to verify the full student assessment experience works end-to-end through the browser: navigating to an assessment, starting it, chatting through all portions, seeing a score, and chatting post-assessment. GPT generates student responses via @repo/llm (calling Azure OpenAI from the test process — not our app's API). The test should only run when explicitly requested since it's slow (multiple LLM round-trips).
Design
Test file: testing/e2e/tests/assessment-lifecycle.spec.ts
One test exercising the full lifecycle entirely through the UI:
- Create test user: via the testUsers fixture (test infrastructure, not app API)
- Navigate to assessment: page.goto("/assessments/turing-test"), wait for assessment-detail-page
- Click "Begin Assessment": click start-assessment-button, wait for navigation to the attempt page (assessment-chat-page visible) and the opening assistant message to appear
- Chat loop: for each turn:
  - Read the last assistant message text from the DOM (find all message bubbles in assessment-messages, get the last non-user one's text)
  - Append to a local ChatMessage[] history, call chat() from @repo/llm with a simple student system prompt to generate a response
  - Fill chat-input-textarea with the generated text, click chat-send-button
  - Wait for the response cycle: evaluating-indicator appears then disappears, then streaming-message appears then disappears (or a new permanent message appears)
  - Check if the assessment-boundary element appeared (signals ---ASSESSMENT_COMPLETE---)
  - If complete → break. Otherwise repeat. Cap at 12 turns.
- Verify evaluation UI: after completion:
  - assessment-evaluation is visible
  - overall-grade contains a valid grade (E, S, N, or U)
  - overall-feedback is visible
  - portion-grades is visible
- Post-assessment chat: type a follow-up message ("Thanks for the feedback!"), send it, verify a new assistant message appears (debrief mode works). Chat input should still be enabled.
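The chat loop in the design can be sketched roughly as follows. This is only an illustration, not the actual spec file: the `LifecycleDriver` interface and its method names are hypothetical stand-ins for the Playwright page interactions and the `chat()` call, and the local `ChatMessage` type is an assumed shape rather than the one exported by @repo/llm.

```typescript
// Assumed message shape; the real ChatMessage type lives in @repo/llm.
type ChatMessage = { role: "assistant" | "user"; content: string };

// Hypothetical abstraction over the browser page and the student LLM,
// so the loop logic can be read (and exercised) without a browser.
interface LifecycleDriver {
  readLastAssistantMessage(): Promise<string>;
  generateStudentReply(history: ChatMessage[]): Promise<string>;
  sendMessage(text: string): Promise<void>;
  waitForResponseCycle(): Promise<void>;
  isAssessmentComplete(): Promise<boolean>;
}

const MAX_TURNS = 12; // cap taken from the plan

async function runChatLoop(
  driver: LifecycleDriver,
  maxTurns: number = MAX_TURNS,
): Promise<{ turns: number; completed: boolean; history: ChatMessage[] }> {
  const history: ChatMessage[] = [];
  for (let turn = 1; turn <= maxTurns; turn++) {
    // Record what the interviewer just said.
    const assistantText = await driver.readLastAssistantMessage();
    history.push({ role: "assistant", content: assistantText });

    // Generate and record the student's reply, then send it through the UI.
    const reply = await driver.generateStudentReply(history);
    history.push({ role: "user", content: reply });
    await driver.sendMessage(reply);
    await driver.waitForResponseCycle();

    // Stop as soon as the completion boundary shows up.
    if (await driver.isAssessmentComplete()) {
      return { turns: turn, completed: true, history };
    }
  }
  // The cap was reached without seeing the boundary.
  return { turns: maxTurns, completed: false, history };
}
```

Keeping the loop behind an interface like this is a design choice made for the sketch: it lets a fake driver exercise the turn cap and the break condition without any LLM calls.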
Student message generation
Import chat from @repo/llm (Playwright tests run in Node.js, monorepo imports work). Simple system prompt:
You are a knowledgeable college student taking a conversational assessment.
Answer questions thoroughly and thoughtfully. When asked for opinions,
take clear positions and support them with specific reasoning and examples.
When challenged with counterarguments, engage substantively — don't just agree.
Keep responses to 2-3 paragraphs.
Use gpt-4o for the student (cheap, fast). Maintain a ChatMessage[] array: push each assistant message read from the DOM and each generated student message.
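One way to hand that history to the student model is sketched below. It assumes the history is stored from the app's point of view (interviewer as "assistant", student as "user") and that roles are flipped before the call, so the student model sees the interviewer's messages as its input. The plan does not say whether the real test does this; `toStudentView` and both message types are hypothetical.

```typescript
// Assumed shapes; the real ChatMessage type comes from @repo/llm.
type TranscriptMessage = { role: "assistant" | "user"; content: string };
type LlmMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds the message list for the student model: system prompt first,
// then the transcript with roles flipped (interviewer becomes "user",
// earlier student replies become "assistant").
function toStudentView(
  history: TranscriptMessage[],
  systemPrompt: string,
): LlmMessage[] {
  const flipped = history.map((m): LlmMessage => ({
    role: m.role === "assistant" ? "user" : "assistant",
    content: m.content,
  }));
  return [{ role: "system", content: systemPrompt }, ...flipped];
}
```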
Waiting for responses
After clicking send, the UI shows:
- evaluating-indicator (evaluator running), which may be brief
- streaming-message (interviewer streaming), which then disappears when complete, replaced by a permanent message
Wait strategy:
- Wait for the send button to become disabled (message processing started)
- Then wait for the send button to become enabled again (full cycle complete)
- Then check for new messages and assessment-boundary
This is simpler and more robust than watching the evaluating/streaming indicators.
Playwright config
Add assessment-lifecycle project without dependencies (like meeting tests — request-only, not in default suite):
{
  name: "assessment-lifecycle",
  testMatch: /assessment-lifecycle\.spec\.ts/,
  use: { ...devices["Desktop Chrome"] },
  // No dependencies (request-only)
}
Timeout
test.setTimeout(300_000) — 5 minutes, since each turn involves evaluator + interviewer LLM calls, plus student message generation.
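As a quick sanity check on that number, the per-turn budget can be worked out from the figures in the plan. The sketch below assumes the worst case of all 12 capped turns plus the single post-assessment message; the helper name is made up for illustration.

```typescript
const TEST_TIMEOUT_MS = 300_000; // 5 minutes, from the plan
const MAX_TURNS = 12; // turn cap, from the plan
const POST_ASSESSMENT_TURNS = 1; // the single follow-up message

// Average time available per turn if every turn is used.
function perTurnBudgetMs(timeoutMs: number, turns: number): number {
  return Math.floor(timeoutMs / turns);
}

const worstCaseBudgetMs = perTurnBudgetMs(
  TEST_TIMEOUT_MS,
  MAX_TURNS + POST_ASSESSMENT_TURNS,
);
// 300_000 / 13 rounds down to 23_076 ms, roughly 23 seconds per turn
// for the evaluator, interviewer, and student-generation calls combined.
```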
LLM required
This test requires Azure OpenAI credentials to be configured. It will fail (not skip) if the LLM is unavailable — that's intentional, since the point is verifying the full pipeline works.
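A fail-fast check along these lines would make a missing configuration show up as a clear error instead of a skip. This is a sketch only: the variable names below are hypothetical, since the plan does not say which environment variables @repo/llm reads, and `assertLlmConfigured` is not part of the actual test.

```typescript
// Hypothetical variable names; substitute whatever @repo/llm actually reads.
const REQUIRED_VARS = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"];

// Throws (rather than skipping) when any required variable is unset or blank.
function assertLlmConfigured(
  env: Record<string, string | undefined>,
  names: string[] = REQUIRED_VARS,
): void {
  const missing = names.filter((name) => !(env[name] ?? "").trim());
  if (missing.length > 0) {
    throw new Error(`Missing LLM configuration: ${missing.join(", ")}`);
  }
}
```

In a Node-based test this could be called with `process.env` at the top of the test body, so the failure happens before any browser work starts.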
Files to modify
- Create testing/e2e/tests/assessment-lifecycle.spec.ts
- Edit testing/e2e/playwright.config.ts: add assessment-lifecycle project (request-only)
Verification
cd testing/e2e && CI=true bunx playwright test --project=assessment-lifecycle
If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at: ~/.claude/projects/-Users-challen-www-usingandunderstanding-ai/a5270952-fa94-42b6-be22-ae56d633cfb2.jsonl
I'll start by reading the relevant files to understand the existing patterns.
Let me read the assessment evaluation component and the auth fixture to understand the test infrastructure.
Now let me check the @repo/llm package to understand the chat function.
Now I have all the context I need. Let me create the test file and update the config.
Now add the project to the Playwright config.
Let me verify the config looks right and run a quick typecheck.
Those are just Playwright ambient type issues from running tsc directly on the file (not through the project tsconfig). Let me check if there's a tsconfig for the e2e tests and verify using the project-level check instead.
These are all pre-existing errors in the e2e test directory — the tsconfig doesn't include @types/node. No new errors from my file. The test file is syntactically correct and will work with Playwright's runner (which provides its own type resolution).
Let me also verify the @repo/llm import will resolve correctly in the e2e test context.
The e2e package doesn't have @repo/llm as a dependency. I need to add it for the import to resolve.
Good. Now let me also verify the test will work with the chat-send-button. Looking at the ChatInput component, I see the send button is disabled when the disabled prop is true OR when there's no content. After sending, the content is cleared, so the button is disabled both by the disabled prop (processing) and by the empty content. When the response completes, disabled goes back to false, but the button stays disabled because the textarea is empty. Let me reconsider the wait strategy.
Looking at the AssessmentChatClient, when isProcessing || isStreaming is true, the ChatInput gets disabled={true}. After the full cycle, both go false. But the send button is disabled={disabled || !content.trim()} — so after responding it's still disabled because content is empty.
I need a different wait strategy. Let me look at what changes when a response completes — a new message appears in the DOM.
I realize the wait strategy using the send button won't work — after sending, the content is cleared so the button stays disabled even after processing completes (disabled={disabled || !content.trim()}). I need to use the textarea instead, which is disabled only by the processing state.
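The difference between the two signals can be modeled with a small sketch. The two predicates mirror the expressions quoted above (`disabled || !content.trim()` for the button, `isProcessing || isStreaming` for the textarea); the `UiState` type and the phase names are made up for illustration.

```typescript
// Hypothetical snapshot of the relevant client state.
type UiState = { isProcessing: boolean; isStreaming: boolean; content: string };

// The disabled prop that ChatInput receives from AssessmentChatClient.
function isTextareaDisabled(s: UiState): boolean {
  return s.isProcessing || s.isStreaming;
}

// The send button is additionally disabled whenever the input is empty.
function isSendButtonDisabled(s: UiState): boolean {
  return isTextareaDisabled(s) || !s.content.trim();
}

// One response cycle, phase by phase.
const typed: UiState = { isProcessing: false, isStreaming: false, content: "My answer" };
const evaluating: UiState = { isProcessing: true, isStreaming: false, content: "" };
const streaming: UiState = { isProcessing: false, isStreaming: true, content: "" };
const done: UiState = { isProcessing: false, isStreaming: false, content: "" };

// After the cycle completes, the textarea is enabled again but the send
// button is still disabled, so only the textarea marks the end of a cycle.
const textareaReenabled = !isTextareaDisabled(done);
const sendButtonStillDisabled = isSendButtonDisabled(done);
```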
Now let me verify the complete state of both files.
Lint passes. Let me also check for the unused users variable on line 16.
Here's a summary of what was created/modified:
Created: testing/e2e/tests/assessment-lifecycle.spec.ts
- Single test exercising the full assessment lifecycle through the UI
- Creates a test user, navigates to turing-test, clicks "Begin Assessment"
- Chat loop: reads assistant messages from the DOM, generates student responses via chat() from @repo/llm (gpt-4o), sends them through the UI
- Waits for each response cycle by watching the textarea disabled/enabled state (the textarea is disabled during isProcessing || isStreaming, enabled after)
- Breaks when the assessment-boundary element appears (signals ---ASSESSMENT_COMPLETE---)
- Verifies evaluation UI: assessment-evaluation, overall-grade (matches E/S/N/U), overall-feedback, portion-grades
- Tests post-assessment debrief chat: sends a follow-up, verifies response appears
- 5-minute timeout, capped at 12 turns
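The grade check mentioned above could look something like the sketch below. It assumes the overall-grade element's text contains the grade as a standalone capital letter; the exact text format is not shown here, and `parseGrade` is a hypothetical helper, not necessarily what the test file does.

```typescript
const VALID_GRADES = ["E", "S", "N", "U"] as const;
type Grade = (typeof VALID_GRADES)[number];

// Returns the first standalone E/S/N/U found in the text, or null if none.
// Case-sensitive on purpose, so ordinary lowercase words do not match.
function parseGrade(text: string): Grade | null {
  const match = text.match(/\b([ESNU])\b/);
  return match ? (match[1] as Grade) : null;
}
```

In the Playwright test itself, the same pattern could be passed to a text assertion on the overall-grade locator.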
Modified: testing/e2e/playwright.config.ts
- Added assessment-lifecycle project as request-only (no dependencies, not in default suite)
Modified: testing/e2e/package.json
- Added @repo/llm as a workspace dev dependency
Run with:
cd testing/e2e && CI=true bunx playwright test --project=assessment-lifecycle
Your task
First, run git status, git diff, and git log --oneline -10 to understand the current state.
Then add and commit all modified files with a descriptive message. Break the changes into multiple commits if appropriate. If there are untracked files, ask if they should be added or added to the appropriate .gitignore file. The user may also want to take over at this point, so provide an abort option as well. Always check changes for unnecessary debugging statements or logging that might have been accidentally included. Prompt the user if you find any, and offer to remove each one by one. Do not use interactive git staging (git add -p) as it does not work.
The changes are clean — no debugging statements. Everything is one cohesive feature (the assessment lifecycle E2E test). I'll commit it as a single commit.
Committed as 96f733f. The commit includes:
- testing/e2e/tests/assessment-lifecycle.spec.ts: the new test file
- testing/e2e/playwright.config.ts: assessment-lifecycle project added (request-only)
- testing/e2e/package.json: @repo/llm workspace dependency added
- bun.lock: updated lockfile
Does the test pass?
Let me check if the dev server is running first, then run the test.
Dev server is running. Let me run the test.
Yes — 1 passed in 1.3 minutes. The full lifecycle works end-to-end: start assessment, chat through all portions, verify evaluation with grade/feedback, and post-assessment debrief.
Killer! Great work.
Thanks! That was a satisfying one — full round-trip through the assessment pipeline verified in the browser in under 90 seconds.
Your task
First, run git status, git diff, and git log --oneline -10 to understand the current state.
Then add and commit all modified files with a descriptive message. Break the changes into multiple commits if appropriate. If there are untracked files, ask if they should be added or added to the appropriate .gitignore file. The user may also want to take over at this point, so provide an abort option as well. Always check changes for unnecessary debugging statements or logging that might have been accidentally included. Prompt the user if you find any, and offer to remove each one by one. Do not use interactive git staging (git add -p) as it does not work.
Just a trivial auto-formatting change — the linter reformatted a multi-line expect() call. No debugging statements. Nothing interesting enough to warrant a separate commit. I'll fold it in.
Done — bf85a0a, one-line formatting fix.
Your task
First, run git status, git diff, and git log --oneline -10 to understand the current state.
Then add and commit all modified files with a descriptive message. Break the changes into multiple commits if appropriate. If there are untracked files, ask if they should be added or added to the appropriate .gitignore file. The user may also want to take over at this point, so provide an abort option as well. Always check changes for unnecessary debugging statements or logging that might have been accidentally included. Prompt the user if you find any, and offer to remove each one by one. Do not use interactive git staging (git add -p) as it does not work.
Pure version bump across all packages — 2026.2.17 → 2026.2.18. No debugging, no untracked files.
Done — 0915905, version bump to 2026.2.18.
Can we create a nicer 404 page? One that includes the site logo?
Good — the "logo" is the terminal-style text treatment (> CS 199 UAI Spring 2026), not an image. Let me write the plan.