Plan: Automated K8s Database Backup CronJob
Implement the following plan:
Context
The manual bun run db:backup script works, but we need automated daily backups running in the K8s cluster. Key challenges:
- rclone Box OAuth tokens expire (~60 min access, ~60 day refresh) and rclone needs to write refreshed tokens back to its config file — a read-only K8s Secret won't work
- Backup failures should send email notifications without hardcoding recipient addresses
- The backup container needs `pg_dump`, `gzip`, `rclone`, and an SMTP client
New/Modified Files
| File | Purpose |
|---|---|
| `k8s/db-backup/Dockerfile` | Alpine image with `pg_dump`, `gzip`, `rclone`, `msmtp` |
| `k8s/db-backup/backup.sh` | Shell script that dumps both databases to Box |
| `k8s/db-backup.yaml` | CronJob manifest + PVC for rclone config |
| `k8s/rclone-setup.sh` | One-time script to copy local rclone.conf into the K8s PVC |
| `k8s/deploy.sh` | Add CronJob manifest to deployment flow |
| `content/mail-lists.ts` | Add backup mail list forwarding to [email] |
Task 1: Docker Image (k8s/db-backup/Dockerfile)
Lightweight Alpine image with all required tools:
```dockerfile
FROM alpine:3.21
RUN apk add --no-cache postgresql16-client gzip rclone msmtp bash
COPY backup.sh /backup.sh
RUN chmod +x /backup.sh
CMD ["/backup.sh"]
```
- `postgresql16-client` provides `pg_dump` (matches the pg16 server image)
- `msmtp` is a minimal SMTP client for failure notifications
- Image pushed to `docker.io/usingandunderstandingai/db-backup:latest`
Task 2: Backup Script (k8s/db-backup/backup.sh)
Connects directly to postgres in-cluster (no kubectl needed). On failure, sends notification via the mail service.
Flow:
- For each env (`prod`, `dev`): `pg_dump -h postgres -U postgres -d $DB_NAME | gzip | rclone rcat box:usingandunderstanding-backups/{date}/(unknown).sql.gz`
  - `PGPASSWORD` comes from the `DB_PASSWORD` env var (injected from `shared-secrets`)
  - `--rclone-config` points to the PVC mount at `/rclone/rclone.conf`
- On any failure, send a notification email via `msmtp` to [email] through the internal mail service ([redacted])
- Exit with an appropriate code so K8s tracks job success/failure
msmtp config (inline in backup.sh):

```
account default
host mail-service
port 2525
from [email]
```
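Putting the flow together, a minimal sketch of what `backup.sh` could look like: the `postgres` host, `/rclone/rclone.conf` path, and Box bucket come from the plan, while the database names, the `MAIL_TO` default, the function names, and the `RUN_BACKUP` guard are illustrative assumptions.

```shell
#!/bin/bash
set -uo pipefail

# Illustrative defaults — real values come from the CronJob env / Secret
MAIL_TO="${MAIL_TO:-backup@example.invalid}"   # hypothetical; real recipient is the backup mail list
DATE="$(date -u +%Y-%m-%d)"
FAILED=0

backup_one() {
  # Dump one database, compress, and stream straight to Box via rclone
  local env="$1" db="$2"
  PGPASSWORD="$DB_PASSWORD" pg_dump -h postgres -U postgres -d "$db" \
    | gzip \
    | rclone --config /rclone/rclone.conf rcat \
        "box:usingandunderstanding-backups/${DATE}/${env}.sql.gz"
}

notify_failure() {
  # Send a failure notice through the in-cluster mail service via msmtp
  printf 'Subject: Database backup FAILED (%s)\n\nSee CronJob logs.\n' "$DATE" \
    | msmtp "$MAIL_TO"
}

main() {
  backup_one prod app_prod || FAILED=1   # database names are assumptions
  backup_one dev  app_dev  || FAILED=1
  if [ "$FAILED" -ne 0 ]; then
    notify_failure
    exit 1   # non-zero exit so K8s marks the Job failed
  fi
}

# Guard (illustrative) so the functions can be sourced without running a backup
if [ "${RUN_BACKUP:-0}" = "1" ]; then
  main
fi
```

The pipeline never touches local disk, so the container needs no scratch volume: `rclone rcat` streams stdin directly to the remote.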
Task 3: K8s Manifests (k8s/db-backup.yaml)
PVC — small persistent volume for rclone config (so refreshed OAuth tokens survive):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rclone-config
  namespace: usingandunderstandingai
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: rook-ceph-block-doublereplicated-discard
  resources:
    requests:
      storage: 1Mi
```
CronJob:
- Schedule: `0 8 * * *` (2 AM Central = 8:00 UTC, since the cluster likely uses UTC)
- `concurrencyPolicy: Forbid`
- `successfulJobsHistoryLimit: 3`, `failedJobsHistoryLimit: 3`
- Container mounts: `rclone-config` PVC at `/rclone` (read-write for token refresh)
- Env vars: `DB_PASSWORD` from `shared-secrets`
- `restartPolicy: OnFailure`
- `imagePullSecrets: docker-registry`
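Those settings could translate into a manifest shaped roughly like this — the image name and namespace come from the plan, while the exact field nesting and the `secretKeyRef` key name are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
  namespace: usingandunderstandingai
spec:
  schedule: "0 8 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: docker-registry
          containers:
            - name: db-backup
              image: docker.io/usingandunderstandingai/db-backup:latest
              env:
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: shared-secrets
                      key: DB_PASSWORD   # key name is an assumption
              volumeMounts:
                - name: rclone-config
                  mountPath: /rclone
          volumes:
            - name: rclone-config
              persistentVolumeClaim:
                claimName: rclone-config
```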
Task 4: rclone Setup Script (k8s/rclone-setup.sh)
One-time script to initialize the rclone PVC from local ~/.config/rclone/rclone.conf:
- Ensure the PVC exists (apply `db-backup.yaml` first)
- Run a temporary Alpine pod with the PVC mounted
- `kubectl cp` the local rclone.conf into the pod at `/rclone/rclone.conf`
- Delete the temporary pod
This only needs to run once. If the PVC is lost, re-run the script.
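A sketch of those steps as a script — the namespace and PVC name come from the plan; the temporary pod name, the sleep timeout, and the `RUN_SETUP` guard are illustrative:

```shell
#!/bin/bash
set -euo pipefail

NAMESPACE="usingandunderstandingai"
POD="rclone-setup"   # temporary pod name — illustrative
CONF="${HOME}/.config/rclone/rclone.conf"

setup_rclone_pvc() {
  # 1. Make sure the PVC manifest has been applied
  kubectl apply -f "$(dirname "$0")/db-backup.yaml"

  # 2. Start a throwaway pod with the PVC mounted at /rclone
  kubectl run "$POD" -n "$NAMESPACE" --image=alpine:3.21 --restart=Never \
    --overrides='{"spec":{"containers":[{"name":"rclone-setup","image":"alpine:3.21","command":["sleep","300"],"volumeMounts":[{"name":"rclone-config","mountPath":"/rclone"}]}],"volumes":[{"name":"rclone-config","persistentVolumeClaim":{"claimName":"rclone-config"}}]}}'
  kubectl wait -n "$NAMESPACE" --for=condition=Ready "pod/$POD"

  # 3. Copy the local config into the PVC
  kubectl cp "$CONF" "$NAMESPACE/$POD:/rclone/rclone.conf"

  # 4. Clean up — the PVC keeps the file after the pod is gone
  kubectl delete pod -n "$NAMESPACE" "$POD"
}

# Guard (illustrative) so the file can be sourced without touching a cluster
if [ "${RUN_SETUP:-0}" = "1" ]; then
  setup_rclone_pvc
fi
```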
Task 5: Mail List (content/mail-lists.ts)
Add backup list so notification recipients are configurable without touching K8s manifests:
```ts
backup: {
  address: "backup",
  displayName: "Backup Notifications",
  description: "Database backup failure notifications",
  recipients: { emails: ["[email]"] },
  allowedSenders: ["[email]"],
},
```
Works on both usingandunderstanding.ai and dev.usingandunderstanding.ai since mail-lists.ts is shared.
Task 6: Integration
- Add `kubectl apply -f "$SCRIPT_DIR/db-backup.yaml"` to `k8s/deploy.sh` after postgres is ready
- Add the image build/push to `scripts/deploy.ts`, or document it as a manual step (the image changes rarely)
- Add `db:backup:build` and `db:backup:push` scripts to the root `package.json` for manual image management
- Document the CronJob in `CLAUDE.md`
Verification
- Build and push the image: `bun run db:backup:build && bun run db:backup:push`
- Run the rclone setup: `./k8s/rclone-setup.sh`
- Apply the CronJob: `kubectl apply -f k8s/db-backup.yaml`
- Trigger a manual run: `kubectl create job --from=cronjob/db-backup db-backup-test -n usingandunderstandingai`
- Check the logs: `kubectl logs job/db-backup-test -n usingandunderstandingai`
- Verify files appear in Box under `usingandunderstanding-backups/`
- Test the failure notification by temporarily using a wrong postgres hostname and confirming the email arrives
If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at: ~/.claude/projects/-Users-challen-www-usingandunderstanding-ai/72399be9-a1f6-4e4d-b3a5-94603a50127d.jsonl
Let me start by reading the existing files I need to understand.
Now I have a good understanding of all the pieces. Let me check how the mail service processes incoming mail to understand the allowed senders / list resolution flow.
Now let me implement everything. I'll start with the files that have no dependencies on each other.
Now update deploy.sh to include the db-backup manifest.
Now update the step numbers in the comments that follow.
Now add build/push scripts to package.json.
Now update CLAUDE.md with the backup documentation.
I need to find the "Sync and Backup" section in CLAUDE.md to update it.
Now let me verify everything looks right by checking the modified files.
Let me also check whether the mail deployment includes a dev-mail variant. deploy.sh doesn't apply mail.yaml or dev-mail.yaml currently — that must be handled separately. That's fine; it's existing behavior unrelated to our changes.
Let me also verify the lint/build passes for the mail-lists.ts change.
Good, no type errors. Let me verify all files look correct.
All done. Here's a summary of everything created and modified:
New files:
- `k8s/db-backup/Dockerfile` — Alpine image with `pg_dump`, `gzip`, `rclone`, `msmtp`, `bash`
- `k8s/db-backup/backup.sh` — dumps both `prod` and `dev` databases, pipes through `gzip` to `rclone rcat` on Box, sends a failure email via `msmtp` to [email] through the internal mail service
- `k8s/db-backup.yaml` — PVC for rclone config (1Mi, `rook-ceph-block-doublereplicated-discard`) + CronJob running daily at 2 AM Central (8:00 UTC)
- `k8s/rclone-setup.sh` — one-time script that spins up a temp pod, copies the local `~/.config/rclone/rclone.conf` into the PVC, then cleans up
Modified files:
- `content/mail-lists.ts` — added the `backup` mail list (sends to [email], allows [email] as sender)
- `k8s/deploy.sh` — added Step 4 to apply `db-backup.yaml` after postgres is ready
- `package.json` — added `db:backup:build` and `db:backup:push` scripts
- `CLAUDE.md` — documented the automated backup CronJob, build/push commands, and rclone setup
Deployment steps:
1. `bun run db:backup:build && bun run db:backup:push`
2. `./k8s/rclone-setup.sh`
3. `kubectl apply -f k8s/db-backup.yaml`
4. Test: `kubectl create job --from=cronjob/db-backup db-backup-test -n usingandunderstandingai`
Will this deploy if I run deploy:dev?
No. deploy:dev runs scripts/deploy.ts, which only builds/pushes the web, api, and mail images and restarts their deployments. It never runs kubectl apply on any manifests.
The db-backup CronJob is only applied by k8s/deploy.sh (bun run k8s:deploy), which is the initial cluster setup script — not something you run routinely.
For the first deploy you'd need to do it manually:
```sh
bun run db:backup:build               # build the image
bun run db:backup:push                # push to Docker Hub
./k8s/rclone-setup.sh                 # copy rclone.conf into PVC
kubectl apply -f k8s/db-backup.yaml   # create the PVC + CronJob
```
After that, the CronJob runs on its own. You'd only need to rebuild/push the image if backup.sh or the Dockerfile changes. Want me to add the db-backup image to scripts/deploy.ts so it gets built and pushed during regular deploys, or is manual-only fine since it changes rarely?
Manual is fine. And so this is a cron container run by K8s?
Yes, exactly. It's a K8s CronJob — K8s starts a container from the db-backup image once daily at 2 AM Central, it runs backup.sh, and the container exits. K8s tracks success/failure and keeps the last 3 of each in history.
The container connects directly to the postgres service inside the cluster (no port-forwarding needed), dumps both databases through gzip into rclone rcat to Box, and sends an email via the mail service if anything fails.
Nice! I probably need to rebuild the dev and prod containers, right, to pick up the mail fix?
Yes — the backup mail list in content/mail-lists.ts is baked into the mail image at build time. Until you rebuild and redeploy, the mail service won't recognize [email] and the failure notifications won't route.
Your task
First, run git status, git diff, and git log --oneline -10 to understand the current state.
Then add and commit all modified files with a descriptive message. Break the changes into multiple commits if appropriate. If there are untracked files, ask if they should be added or added to the appropriate .gitignore file. The user may also want to take over at this point, so provide an abort option as well. Always check changes for unnecessary debugging statements or logging that might have been accidentally included. Prompt the user if you find any, and offer to remove each one by one. Do not use interactive git staging (git add -p) as it does not work.
I'll check the current state and prepare commits.
I can see:
Modified files:
- Version bump (2026.2.11 → 2026.2.12) across all packages
- `content/mail-lists.ts` — added `backup` mail list
- `k8s/deploy.sh` — added db-backup CronJob deployment step
- `package.json` — added `db:sync`, `db:backup`, `db:backup:build`, `db:backup:push` scripts
- `CLAUDE.md` — documented sync/backup functionality
- `testing/e2e/tests/meeting-2026-02-17.spec.ts` — formatting changes (prettier)
Untracked files:
- `k8s/db-backup.yaml` — CronJob manifest
- `k8s/db-backup/` — Docker image (Dockerfile + backup.sh)
- `k8s/rclone-setup.sh` — rclone PVC initialization script
- `scripts/db-backup.ts` — local backup script
- `scripts/db-sync.ts` — database sync script
All changes look good — no debugging statements or unnecessary logging. The prettier formatting changes in the test file are cosmetic.
Should I:
1. Add all untracked files (db-backup K8s resources, backup/sync scripts) and commit everything together
2. Skip the scripts (`scripts/db-backup.ts`, `scripts/db-sync.ts`) and only commit the K8s backup implementation
3. Let you take over and handle the commits manually
4. Abort and don't commit anything
1
Committed 5f58eba on db_backups — 34 files, clean working tree.
Your task
First, run git status, git diff, and git log --oneline -10 to understand the current state.
Then add and commit all modified files with a descriptive message. Break the changes into multiple commits if appropriate. If there are untracked files, ask if they should be added or added to the appropriate .gitignore file. The user may also want to take over at this point, so provide an abort option as well. Always check changes for unnecessary debugging statements or logging that might have been accidentally included. Prompt the user if you find any, and offer to remove each one by one. Do not use interactive git staging (git add -p) as it does not work.
I'll check the current state.
The only changes are version bumps from 2026.2.12 → 2026.2.13 across all package.json files. No debugging statements or unnecessary logging. This is a clean version bump.
✅ Committed 20c49fd — version bump to 2026.2.13, clean working tree.
I'm wondering if we can create an /instructors page? This would be something I could send to other instructors that would highlight the great and very interesting work we're doing. It should probably provide an overview of the syllabus (probably at the bottom), a summary of the course, links to sample activities and discussion of the activity framework, and examples or links to some of the more interesting bits like conversational assessment and discussion.
Let me explore the codebase to understand the existing page patterns, content structure, and what materials are available to showcase.
This is a completely different task. Let me explore the codebase to understand pages, routing, content structure, and what interesting content exists to showcase.
Let me read a few key files to finalize my understanding before designing the plan.
Now let me check a couple more things — the assessments public mode and blog post titles.
Good — assessments support a ?public=true mode for demos. Now let me check what meeting pages look like publicly and if there are any specific interesting meetings to link.
Now I have a thorough understanding. Let me ask a couple of clarifying questions before writing the plan.
Good. Now let me write the plan.