Related lectures: Lecture 09. Stop agents from declaring victory early · Lecture 10. Only a full-pipeline run counts as real verification
Template files: templates/
Project 05. Make the Agent Verify Its Own Work
What You Do
Implement role separation: a generator that implements, an evaluator that reviews, and optionally a planner. Run the project three times (single role, generator plus evaluator, then planner plus generator plus evaluator) to measure the effect of each added role.
Choose a substantive feature upgrade (multi-turn conversation, citation panel redesign, or document filtering) and keep it consistent across all runs.
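The role separation described above can be sketched as a small loop. This is a minimal illustration, not the project's harness: `Roles`, `runVariant`, `passScore`, and `maxRounds` are hypothetical names, and the stub roles stand in for real agent calls.

```typescript
// Hypothetical role interfaces; none of these names come from the project templates.
type Review = { score: number; defects: string[] };

interface Roles {
  plan?: (feature: string) => string; // optional planner
  generate: (spec: string, feedback: string[]) => string;
  evaluate: (spec: string, work: string) => Review;
}

// Generate, review, and revise until the evaluator passes the work or rounds run out.
function runVariant(feature: string, roles: Roles, passScore = 4, maxRounds = 3) {
  // Without a planner, the raw feature description serves as the spec.
  const spec = roles.plan ? roles.plan(feature) : feature;
  let feedback: string[] = [];
  let work = "";
  let review: Review = { score: 0, defects: [] };
  let rounds = 0;
  while (rounds < maxRounds) {
    rounds += 1;
    work = roles.generate(spec, feedback);
    review = roles.evaluate(spec, work);
    if (review.score >= passScore && review.defects.length === 0) break;
    feedback = review.defects; // defects drive the next revision
  }
  return { spec, work, review, rounds };
}

// Stub roles standing in for real agent calls (assumed behavior, for illustration only).
const stubRoles: Roles = {
  plan: (feature) => `contract: ${feature} must persist history across turns`,
  generate: (_spec, feedback) => (feedback.length === 0 ? "draft" : "draft+fixes"),
  evaluate: (_spec, work) =>
    work === "draft"
      ? { score: 2, defects: ["history lost after second turn"] }
      : { score: 5, defects: [] },
};

const result = runVariant("multi-turn conversation", stubRoles);
console.log(result.rounds, result.review.score);
```

In this sketch, the three runs differ only in which roles are separate agents: the single-role run would back all three functions with one agent, while the later runs give the evaluator and then the planner their own context.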
Tools
- Claude Code or Codex
- Git
- Node.js + Electron
Harness Mechanism
Self-verification + grounded Q&A + evidence-based completion
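One way to picture evidence-based completion is a gate that refuses a "done" claim unless it carries proof of a successful full-pipeline run. The sketch below is an assumption about how such a gate could look; `Evidence`, `CompletionClaim`, `canMarkDone`, and the `npm test` command are illustrative and do not come from the project templates.

```typescript
// Hypothetical evidence record; field names are illustrative only.
type Evidence = { command: string; exitCode: number; coversFullPipeline: boolean };

type CompletionClaim = { task: string; evidence: Evidence[] };

// Accept a "done" claim only when every recorded run succeeded
// and at least one of them exercised the full pipeline.
function canMarkDone(claim: CompletionClaim): { ok: boolean; reason: string } {
  if (claim.evidence.length === 0) {
    return { ok: false, reason: "no evidence attached" };
  }
  const failed = claim.evidence.filter((e) => e.exitCode !== 0);
  if (failed.length > 0) {
    return { ok: false, reason: `failed: ${failed.map((e) => e.command).join(", ")}` };
  }
  if (!claim.evidence.some((e) => e.coversFullPipeline)) {
    return { ok: false, reason: "no full-pipeline run" };
  }
  return { ok: true, reason: "verified" };
}

// A passing unit test alone is not enough in this sketch.
const partialClaim: CompletionClaim = {
  task: "multi-turn history",
  evidence: [{ command: "npm test", exitCode: 0, coversFullPipeline: false }],
};
console.log(canMarkDone(partialClaim));
```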
Use the Checked-In Project
Repository path: projects/project-05/
| Directory | What it contains | What to compare |
|---|---|---|
| starter/ | Project 04-based app before the conversation-history upgrade. | Starting point if you want to rerun the three variants yourself. |
| solution/single-role/ | One agent plans, implements, and self-reviews. | evaluator-rubric.md score 1.6/5 and listed defects. |
| solution/gen-eval/ | Generator plus evaluator with revision evidence. | evaluator-rubric.md score 3.3/5 and revision notes. |
| solution/plan-gen-eval/ | Planner plus generator plus evaluator. | sprint-contract.md, evaluator-rubric.md score 4.9/5. |
The checked-in feature is multi-turn Q&A conversation history. Keep that feature constant across all three variants so the only variable is role separation.
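Because the feature is held constant, the difference between consecutive rubric scores can be read as the gain from each added role. The following sketch computes those differences from the scores listed in the table; the `Row`, `Gain`, and `gains` names are hypothetical helpers, not part of the repository.

```typescript
type Row = { variant: string; score: number };
type Gain = { from: string; to: string; gain: number };

// Rubric scores as listed in the table above (out of 5).
const scores: Row[] = [
  { variant: "single-role", score: 1.6 },
  { variant: "gen-eval", score: 3.3 },
  { variant: "plan-gen-eval", score: 4.9 },
];

// Gain from each variant to the next, rounded to one decimal
// so floating-point noise does not leak into the comparison.
function gains(rows: Row[]): Gain[] {
  const out: Gain[] = [];
  let prev: Row | undefined;
  for (const row of rows) {
    if (prev) {
      out.push({
        from: prev.variant,
        to: row.variant,
        gain: Math.round((row.score - prev.score) * 10) / 10,
      });
    }
    prev = row;
  }
  return out;
}

for (const g of gains(scores)) {
  console.log(`${g.from} -> ${g.to}: +${g.gain}`);
}
```

With the checked-in scores, adding the evaluator accounts for a gain of 1.7 points and adding the planner for a further 1.6.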