Skip to content

中文版本 →

Related lectures: Lecture 11. Make the agent's runtime observable · Lecture 12. Clean handoff at the end of every session Template files: templates/

Project 06. Build a Complete Agent Harness (Capstone)

What You Do

This is the capstone project. Assemble everything learned in the first five projects, run a full benchmark, then do a cleanup pass to verify quality is maintainable.

Use a fixed multi-feature task set covering the complete product slice: document import, indexing, citation-based Q&A, runtime observability, and readable restartable repo state. First run with weak harness baseline, then with your strongest harness, then a cleanup and re-run. Finally, do a harness ablation experiment — remove one component at a time and see which ones actually matter.

Tools

  • Claude Code or Codex
  • Git
  • Node.js + Electron
  • Quality document template
  • Evaluator rubric
  • All harness components accumulated from the first five projects

Harness Mechanism

Complete harness: all mechanisms + observability + ablation study

Use the Checked-In Project

Repository path: projects/project-06/

DirectoryWhat it containsWhat to compare
starter/Mostly complete product code with intentionally weak harness surface: basic AGENTS.md, no feature_list.json, no session-handoff.md, no clean-state checklist.Manual weak-harness baseline observations. The starter intentionally does not include benchmark scripts.
solution/Full harness surface: AGENTS.md, CLAUDE.md, feature_list.json, init.sh, session-handoff.md, clean-state-checklist.md, quality/evaluator docs, benchmark and cleanup scripts.Run projects/project-06/solution/scripts/benchmark.sh and projects/project-06/solution/scripts/cleanup-scanner.sh, then compare quality-document evidence.

Unlike earlier projects, the capstone starter is not mostly missing product features. The main gap is the operating harness around the app.