Case Study 03 / 05 · AI War Story · Forward POV

Self-Review

Asked Claude to evaluate its own work. It omitted the two worst mistakes.

May 2026 · Documented · Published as case study

Problem

AI tools speed up structural work and fill gaps with defaults drawn from their training data. I wanted to know what happens when you ask one to report on its own mistakes.

What I did

I seeded Claude Design with real design system artifacts and a deliberately ambiguous brief. Over 17 turns I made 6 corrections. Then I asked Claude to write its own retrospective and cross-checked it against the actual conversation transcript.

How AI was used

Claude Design built a 2,090-line button placement reference in a single pass, then generated its own retrospective. It was both the actor and the self-reviewer.

Where human judgment mattered

All six corrections came from me. When I audited the AI's own audit, two of them were silently missing from Claude's self-retrospective, and nothing in the retro flagged the gap. I only caught the omissions by going through the transcript line by line.

What the AI hid

Two corrections vanished from the self-report. First, the 44px to 40px touch-target fix, a security-adjacent accessibility change that had no turn entry at all. Second, 35 CSS variables declared at :root while 78 raw hex literals sat in the same file. The AI created that inconsistency and never mentioned it. So now I audit every AI-generated audit.
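A minimal sketch of how a variables-versus-hex count like this could be reproduced. It assumes the stylesheet is available as a plain string; `audit_css` and both regexes are hypothetical helpers for illustration, not the tooling used in this case study. The patterns are simple heuristics, so ID selectors that happen to look like hex values (for example `#fade`) would be miscounted.

```python
import re

# Heuristic: a custom-property declaration such as `--color-primary: #0a66c2`.
# The lookbehind skips BEM-style class names like `.btn--danger`.
CUSTOM_PROP_DECL = re.compile(r"(?<![\w-])(--[\w-]+)\s*:\s*([^;}]+)")

# Heuristic: 8-, 6-, 4- or 3-digit hex colors.
HEX_LITERAL = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def audit_css(css: str) -> dict:
    """Count declared custom properties and hex literals used outside them."""
    declared = {name for name, _ in CUSTOM_PROP_DECL.findall(css)}
    # Assumption: a hex value that defines a token is fine, so the
    # declarations are stripped before counting raw hex literals.
    without_decls = CUSTOM_PROP_DECL.sub("", css)
    raw_hex = HEX_LITERAL.findall(without_decls)
    return {"declared_vars": len(declared), "raw_hex_literals": len(raw_hex)}


# Made-up sample stylesheet, not taken from the generated file.
sample = """
:root { --color-primary: #0a66c2; --color-danger: #d33; }
.btn { background: var(--color-primary); border-color: #0a66c2; }
.btn--danger { background: #d33; color: #ffffff; }
"""

print(audit_css(sample))
```

On the sample this reports two declared variables and three raw hex literals. Whether hex values inside the `:root` declarations should count toward the raw total is a judgment call; this sketch excludes them.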

2/6 Errors omitted from self-report
2,090 Lines generated in one pass
33% Silent omission rate

Correction Honesty Audit

Reported: 4/6
Omitted: 2/6
CSS vars used: 35
Raw hex literals: 78

Seed

Fed Claude Design real design system artifacts with an intentionally ambiguous brief to observe default behavior.

Build — 17 turns

Claude generated a 2,090-line button placement reference. I made 6 corrections along the way.

Self-Retrospective

Asked Claude to evaluate its own work. It produced a confident retro. But it omitted two of the six corrections entirely.

Human Audit

Cross-checked retro against transcript. The 44px → 40px touch-target fix and the hex-literal problem were both silently dropped. Now I audit the audit.
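The cross-check itself reduces to a set comparison once the corrections have been logged by hand from the transcript. A minimal sketch, assuming each correction gets a short label: only the two named omissions come from this case study, the four `placeholder-*` labels are stand-ins, and `find_omissions` is a hypothetical helper.

```python
# Corrections logged manually from the transcript, in order.
# Only the last two labels come from the case study; the rest are placeholders.
transcript_corrections = [
    "placeholder-a",
    "placeholder-b",
    "placeholder-c",
    "placeholder-d",
    "touch-target 44px -> 40px",
    "raw hex literals alongside :root variables",
]

# Corrections the self-retrospective actually mentions.
retro_mentions = {
    "placeholder-a",
    "placeholder-b",
    "placeholder-c",
    "placeholder-d",
}


def find_omissions(logged: list[str], reported: set[str]) -> list[str]:
    """Return logged corrections that the retro never mentions, in order."""
    return [item for item in logged if item not in reported]


missing = find_omissions(transcript_corrections, retro_mentions)
rate = len(missing) / len(transcript_corrections)

print(f"{len(missing)}/{len(transcript_corrections)} omitted ({rate:.0%})")
for item in missing:
    print(" -", item)
```

The labeling step is still manual: deciding whether the retro "mentions" a correction means reading both documents, which is the part that needs a human.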

AI failures · QA · Honesty · War Story

Get in touch

For detailed case studies, reach out.

hello@tinasingh.app