Related lectures: Lecture 09. Stop agents from declaring victory early · Lecture 10. Only a full-pipeline run counts as real verification
Template files: templates/

Project 05. Make the Agent Verify Its Own Work

What You Do

Implement role separation: a generator that implements, an evaluator that reviews, and optionally a planner. Run the same task three times (single role, generator plus evaluator, then planner plus generator plus evaluator) to measure the effect of each added role.

Choose a substantive feature upgrade (multi-turn conversation, citation panel redesign, or document filtering) and keep it consistent across all runs.
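As a rough illustration of what role separation means in practice, the generator and evaluator can be sketched as two independent functions joined by a revision loop. Everything here is hypothetical and not taken from the project templates: `runGenEval`, the `Review` shape, and both stubs are invented for the example, and a real harness would call the coding agent where the stubs are.

```typescript
// Hypothetical types and stubs; a real harness would call the coding agent here.
type Review = { pass: boolean; defects: string[] };
type Generator = (task: string, feedback: string[]) => string;
type Evaluator = (task: string, output: string) => Review;

// Run generate -> evaluate -> revise until the evaluator passes or rounds run out.
function runGenEval(
  task: string,
  generate: Generator,
  evaluate: Evaluator,
  maxRounds: number = 3,
): { output: string; review: Review; rounds: number } {
  let feedback: string[] = [];
  let output = "";
  let review: Review = { pass: false, defects: ["not run"] };
  let rounds = 0;
  while (rounds < maxRounds && !review.pass) {
    rounds += 1;
    output = generate(task, feedback);
    review = evaluate(task, output);
    feedback = review.defects; // the evaluator's defects drive the next revision
  }
  return { output, review, rounds };
}

// Stub generator: only adds tests after the evaluator asks for them.
const stubGenerate: Generator = (task, feedback) =>
  feedback.indexOf("missing tests") !== -1 ? `${task} + tests` : task;

// Stub evaluator: completion requires evidence (here, the word "tests").
const stubEvaluate: Evaluator = (_task, output) =>
  output.indexOf("tests") !== -1
    ? { pass: true, defects: [] }
    : { pass: false, defects: ["missing tests"] };

const result = runGenEval("conversation history", stubGenerate, stubEvaluate);
console.log(result.rounds, result.review.pass, result.output);
```

The point of the sketch is that the generator never decides on its own that the work is done; only the evaluator's review ends the loop. A planner, if added, would run once before the loop and hand both roles the same acceptance criteria.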

Tools

Claude Code or Codex
Git
Node.js + Electron

Harness Mechanism

Self-verification + grounded Q&A + evidence-based completion

Use the Checked-In Project

Repository path: projects/project-05/

Directory	What it contains	What to compare
`starter/`	Project 04-based app before the conversation-history upgrade.	Starting point if you want to rerun the three variants yourself.
`solution/single-role/`	One agent plans, implements, and self-reviews.	`evaluator-rubric.md` score 1.6/5 and listed defects.
`solution/gen-eval/`	Generator plus evaluator with revision evidence.	`evaluator-rubric.md` score 3.3/5 and revision notes.
`solution/plan-gen-eval/`	Planner plus generator plus evaluator.	`sprint-contract.md`, `evaluator-rubric.md` score 4.9/5.

The checked-in feature is multi-turn Q&A conversation history. Keep that feature constant across all three variants so the only variable is role separation.
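One simple way to read the three rubric scores is as the gain from each added role. The sketch below only does arithmetic on the scores listed in the table above; the `roleGains` helper and the data layout are assumptions for illustration, not part of the checked-in project.

```typescript
// Scores copied from the evaluator-rubric.md entries listed in the table above.
const rubricScores: { variant: string; score: number }[] = [
  { variant: "single-role", score: 1.6 },
  { variant: "gen-eval", score: 3.3 },
  { variant: "plan-gen-eval", score: 4.9 },
];

// Gain for each added role: a variant's score minus the previous variant's score,
// rounded to one decimal place to hide floating-point noise.
function roleGains(
  scores: { variant: string; score: number }[],
): { added: string; gain: number }[] {
  return scores.slice(1).map((cur, i) => ({
    added: cur.variant,
    gain: Math.round((cur.score - scores[i].score) * 10) / 10,
  }));
}

const gains = roleGains(rubricScores);
for (const g of gains) {
  console.log(`${g.added}: +${g.gain}`);
}
// gen-eval: +1.7
// plan-gen-eval: +1.6
```

Because the feature is held constant, these differences can be attributed to the added role rather than to a change in task difficulty, though a single run per variant is a small sample.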
