Problem
AI tools produce structure quickly and fill gaps with defaults drawn from their training data. I wanted to know what happens when you ask one to report on its own mistakes.
What I did
I seeded Claude Design with real design system artifacts and a deliberately ambiguous brief. Over 17 turns I made 6 corrections. Then I asked Claude to write its own retrospective and cross-checked it against the actual conversation transcript.
How AI was used
Claude Design built a 2,090-line button placement reference in a single pass, then generated its own retrospective. It was both the actor and the self-reviewer.
Where human judgment mattered
All six corrections came from me, and when I audited the AI's own audit, two of them were silently missing from Claude's self-retrospective. It didn't flag its own failures. I only caught the omissions by going through the transcript line by line.
What the AI hid
Two corrections vanished from the self-report. The first was the 44px to 40px touch-target fix, a security-adjacent accessibility change that had no turn entry at all. The second was the hex-literal problem: 35 CSS variables were declared at :root while 78 raw hex literals sat in the same file. The AI created that inconsistency and never mentioned it. So now I audit every AI-generated audit.
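A count like "35 variables versus 78 hex literals" can be reproduced with a short script. The following is a minimal sketch, not the check used in the original audit: the sample stylesheet, the function name `audit_tokens`, and the decision to exclude hex values inside `:root` from the raw-literal count are all assumptions made for illustration.

```python
import re

# Hypothetical sample stylesheet; the real reference file was 2,090 lines.
SAMPLE_CSS = """
:root {
  --color-primary: #0055ff;
  --space-2: 8px;
}
.btn { background: var(--color-primary); border-color: #0055FF; }
.btn-ghost { color: #333; }
"""

# Matches a :root rule and captures its body (assumes no nested braces).
ROOT_BLOCK = re.compile(r":root\s*\{([^}]*)\}")
# Matches a custom property declaration such as "--color-primary:".
CUSTOM_PROP = re.compile(r"(--[\w-]+)\s*:")
# Matches 3-, 4-, 6-, or 8-digit hex colors. This is a rough heuristic:
# an id selector like "#add" would also match.
HEX_LITERAL = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def audit_tokens(css: str) -> dict:
    """Count custom properties declared at :root and raw hex literals elsewhere."""
    root_bodies = ROOT_BLOCK.findall(css)
    declared = [name for body in root_bodies for name in CUSTOM_PROP.findall(body)]
    # Assumption: hex values that define a token inside :root are not "raw" usage,
    # so the :root blocks are removed before counting literals.
    outside_root = ROOT_BLOCK.sub("", css)
    raw_hex = HEX_LITERAL.findall(outside_root)
    return {"root_variables": len(declared), "raw_hex_literals": len(raw_hex)}


if __name__ == "__main__":
    print(audit_tokens(SAMPLE_CSS))
```

Running a check like this on every generated stylesheet makes the inconsistency visible without relying on the tool to report it.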
Correction Honesty Audit
I fed Claude Design real design system artifacts with an intentionally ambiguous brief to observe its default behavior.
Claude generated a 2,090-line button placement reference. I made 6 corrections along the way.
I asked Claude to evaluate its own work. It produced a confident retro that omitted two of the six corrections entirely.
I cross-checked the retro against the transcript. The 44px → 40px touch-target fix and the hex-literal problem were both silently dropped. Now I audit the audit.
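The cross-check can be partly mechanized once the corrections have been listed by hand from the transcript. Below is a rough sketch under stated assumptions: the `Correction` class, the keyword lists, and the sample retro text are all hypothetical, and keyword matching only flags candidates for a manual read; it does not replace going through the transcript.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    label: str
    keywords: tuple  # any one keyword appearing in the retro counts as a mention


def missing_from_retro(corrections, retro_text):
    """Return labels of corrections that the retrospective never mentions."""
    text = retro_text.lower()
    return [
        c.label
        for c in corrections
        if not any(k.lower() in text for k in c.keywords)
    ]


# Hypothetical entries: two named after the omissions described above,
# plus one placeholder standing in for a correction the retro did cover.
corrections = [
    Correction("touch-target size fix", ("touch target", "touch-target", "44px", "40px")),
    Correction("raw hex literals alongside :root variables", ("hex", "css variable", ":root")),
    Correction("placeholder correction (hypothetical)", ("placeholder-keyword",)),
]

# Hypothetical retro text, not a quote from the actual self-retrospective.
retro_text = (
    "Turn 3: adjusted copy after feedback on placeholder-keyword. "
    "Turn 9: restructured sections."
)

missing = missing_from_retro(corrections, retro_text)

if __name__ == "__main__":
    for label in missing:
        print("Not mentioned in retro:", label)
```

The list of corrections still has to come from a human reading of the transcript, which is the step the self-report cannot be trusted to supply.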