My AI system downloaded 2 portraits of white men from the Library of Congress and put them in a VR experience about Black liberation.

I didn't catch it because the metadata was right. Names matched. Dates matched. The search query was correct. The images were wrong.

No agent flagged it. Not the Cultural Archivist that sourced them. Not the Cultural Validator that reviews for integrity. Not the Technical Mentor that runs visual QA. The system has 8 specialized agents. None of them have eyes.

I caught it because I opened the files.

This is a story about a system called Spark — 8 AI agents that run the operations of a cultural heritage technology collective. Spark manages grants, curriculum, cultural validation, portfolio tracking, and impact reporting. It handles the work of what would otherwise be a 5-person team. It costs $20/month.

And it downloaded the wrong portraits.

What changed after this failure:

Visual QA is now a protocol. After any image sourcing, the Technical Mentor takes screenshots and reviews them before images enter the build. The system asks: "Want me to take screenshots and check what's rendering?"
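
To picture what that looks like in practice, here is a sketch of the kind of markdown checklist that could live in the Technical Mentor's spec. The step wording is mine, invented for illustration, not copied from Spark's actual files:

```markdown
<!-- Illustrative sketch only: step names are hypothetical, not Spark's real spec -->
## Visual QA (runs after any image sourcing)
1. Take screenshots of every sourced image as it renders in the build
2. Compare screenshots against the brief: subject, era, provenance
3. Flag mismatches for human review; nothing enters the build until a human has looked
4. Prompt the user: "Want me to take screenshots and check what's rendering?"
```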

But the deeper lesson is this: human oversight is not a limitation — it's the design.

The system's job is to make human review efficient, not to replace it. Spark drafts emails but never sends them. The Cultural Validator reviews content but never approves it in the same pass (the Anti-Rubber-Stamp Rule). State files are updated by agents but reviewed by humans.

Every guardrail in the system exists because something went wrong and someone was paying attention.

What is Spark?

Spark is a multi-agent AI system built on Claude Code: 8 agents share the same state files (markdown, version-controlled, human-readable), maintain memory across sessions, and operate under a governance model that includes:

  • Risk levels — some actions auto-proceed, some need human sign-off

  • Calibration — conservative agents (validation) behave differently from aggressive agents (fundraising)

  • The Anti-Rubber-Stamp Rule — QA agents cannot approve something the same turn they first encounter it

  • Confidence scoring — every output comes with a score and what would increase it

The whole thing runs in a git repo. No database. No deployment pipeline. No vendor lock-in. The agent specs are markdown files a non-engineer can edit.
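
To make that concrete, here is a hedged sketch of what one agent's spec might look like as a markdown file. The headings and values are invented for illustration; they follow the governance rules above but are not copied from Spark:

```markdown
<!-- Hypothetical agent spec: field names and values are illustrative, not Spark's actual files -->
# Agent: Cultural Validator

## Calibration
Conservative. When in doubt, escalate rather than approve.

## Risk levels
- Low (formatting, internal notes): auto-proceed
- High (anything public-facing or culturally sensitive): human sign-off required

## Anti-Rubber-Stamp Rule
Never approve content in the same turn it is first encountered.

## Output format
Every review ends with a confidence score and a note on what would raise it.
```

Because it is just a text file in the repo, editing an agent's behavior is a git commit, not a code change.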

The workshop

I'm running a hands-on workshop where you build your own Spark in 3 hours. Design your agents, populate your state files with your real organizational data, and run your first session.

You leave with a working system. Not a slide deck about one.

First date: May 9th. Sign up on Luma, or send us an email if you hit the waitlist: [email protected]

Spark Dispatch is a series about building and running a multi-agent AI system for a real organization. The wins, the failures, and the rules that came from both.

— Radical Imagination
