Garver Family - I built a tool to QA my own AI prompts. Then I shipped it.

How I rewrote the prompts in my KDP automation pipeline, dropped my OpenRouter bill, and accidentally launched a SaaS in the process.

I have a side project that generates low-content books on autopilot. A discovery agent finds niches, cover-gen draws covers, an interior generator writes journals and activity books, and a packager zips it all up for upload to KDP (Kindle Direct Publishing). Lots of LLM calls. Lots of dollars on OpenRouter.

For months I assumed the costs were just "the price of building." Then I started actually reading the prompts I'd written six months earlier.

They were terrible.

The prompts I was paying real money for

Here's a sanitized example from the discovery agent — the part of the pipeline that asks an LLM to generate book niche ideas:

You're an expert in low-content books on Amazon. Give me niche ideas
for journals and activity books. Pick ones that aren't too saturated
but have real demand. Format the output however you want.

That ran every couple of hours. Three things were wrong with it:

  1. No format spec. "Format however you want" → the model picked a different shape every run, breaking my downstream parser ~30% of the time.
  2. No constraints. "Aren't too saturated but have real demand" — what does that even mean numerically? The model was guessing thresholds.
  3. No grounding. I wasn't passing in any data about my existing catalog, so it kept suggesting niches I already had books in.
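To make the first problem concrete, here is a minimal sketch of a strict downstream parser. It is not the pipeline's actual parser (that code isn't shown in this post); the field names `niche` and `rationale` are assumptions made up for illustration. The point is that a parser like this can only succeed when the prompt pins down the output shape.

```python
import json

# Hypothetical field names and types; the real pipeline's schema is not shown.
REQUIRED_FIELDS = {"niche": str, "rationale": str}


def parse_niches(raw: str) -> list[dict]:
    """Parse model output, raising ValueError if it is not the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("expected a JSON array of niche objects")

    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(item.get(field), expected_type):
                raise ValueError(
                    f"item {i}: field {field!r} is missing or not a {expected_type.__name__}"
                )
    return data
```

A prose answer such as "Here are some ideas: 1. Gratitude journals" fails at the first `json.loads` call, which is the kind of break that "format however you want" invites.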

Each failed run = wasted Sonnet tokens + a failed pipeline step + a manual fix. Probably $40-60 a month bleeding out, plus my time.

I tried fixing it manually

I'd heard the same advice everyone hears: "be specific, give examples, define the format." I knew the rules. I just couldn't bring myself to read 15 prompts in detail and rewrite each one. It felt like janitorial work I'd be doing instead of building features.

So I built a tool.

Enter FixMyPrompt

The premise is dumb-simple: paste a prompt + your goal, get a structured QA report back. Score 0-100. Severity-tagged issues with specific fixes. Three rewritten versions you can pick from — concise, detailed, structured — depending on the use case.

The first prompt I ran through it was the discovery one above. The score came back **18/100**. I laughed.

Here's what it caught:

  • Critical: "Format however you want" is the entire reason your downstream parser is breaking. Specify exact JSON shape with field names + types.
  • Critical: "Real demand" needs a number. State a search-volume or BSR threshold the model can score against.
  • Major: No catalog context. Pass in existing slugs so the model can exclude them.
  • Major: No examples. Give 2-3 sample outputs in the format you actually want.

The "structured" rewrite was 3x longer than my original (it added Constraints and Output Schema sections), but here's the thing: the actual model output got *shorter*, because the constraints removed the model's tendency to hedge and over-explain.
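As a rough illustration of what a prompt with Constraints and Output Schema sections can look like, here is a sketch of a prompt builder. This is not FixMyPrompt's actual output. The function name, the schema fields (`niche`, `slug`, `estimated_bsr`), and the default of five ideas are assumptions; the BSR threshold borrows the "BSR < 50,000 in the last 30 days" example quoted later in this post.

```python
import json


def build_discovery_prompt(
    existing_slugs: list[str],
    max_bsr: int = 50_000,  # threshold taken from the example fix in this post
    count: int = 5,         # assumed number of ideas per run
) -> str:
    """Assemble a discovery prompt with explicit constraints and an output schema."""
    # Hypothetical output shape; swap in whatever the downstream parser expects.
    schema = [{"niche": "string", "slug": "string", "estimated_bsr": "integer"}]
    lines = [
        "You are an expert in low-content books on Amazon.",
        f"Suggest {count} niche ideas for journals and activity books.",
        "",
        "Constraints:",
        f"- Only niches where comparable titles had BSR < {max_bsr} in the last 30 days.",
        "- Exclude any niche whose slug is in this list: "
        + json.dumps(sorted(existing_slugs)),
        "",
        "Output Schema:",
        "Return only a JSON array matching this shape, with no extra text:",
        json.dumps(schema),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_discovery_prompt(["dog-walking-log", "gratitude-journal"]))
```

Passing the existing catalog in as data, rather than hoping the model avoids repeats, covers the grounding issue, and the schema line gives the parser something fixed to check against.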

The numbers

I ran 12 of my pipeline prompts through it over a weekend. Aggregate numbers for a 7-day window before and a 7-day window after:

| Metric | Before | After |
|---|---|---|
| OpenRouter spend | $107.40 | $61.12 |
| Pipeline runs | 84 | 84 |
| Failed downstream parses | 26 | 3 |
| Manual interventions | 11 | 1 |

About a **43% drop** in OpenRouter spend on the same volume of work. The bigger win was the parser failures going from "almost a third" to "basically never" — that's hours of my weekend back.
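For anyone who wants to check the arithmetic, the percentages follow directly from the table. The variable names below are just labels for those figures.

```python
# Figures copied from the before/after table above.
spend_before, spend_after = 107.40, 61.12
runs = 84
fails_before, fails_after = 26, 3

spend_drop = (spend_before - spend_after) / spend_before
fail_rate_before = fails_before / runs
fail_rate_after = fails_after / runs

print(f"spend drop: {spend_drop:.1%}")  # 43.1%
print(f"parse failure rate: {fail_rate_before:.1%} -> {fail_rate_after:.1%}")  # 31.0% -> 3.6%
print(f"cost per run: ${spend_before / runs:.2f} -> ${spend_after / runs:.2f}")  # $1.28 -> $0.73
```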

Not all of those gains came from the prompt rewrite. Some came from switching to cheaper models for the easier tasks once I was confident the prompts wouldn't fall over. But that confidence was the unlock: I trusted the new prompts enough to tier down the model.
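One possible way to wire up that kind of tiering is sketched below. It is an assumption, not a description of how this pipeline does it: the tier names are placeholders, and `call_model` and `validate` are hypothetical callables you would supply yourself. The idea is to try the cheaper model first and escalate only when the output fails validation.

```python
from typing import Callable, Sequence

# Placeholder tier names, cheapest first; substitute real model identifiers.
DEFAULT_TIERS = ("cheap-model", "strong-model")


def run_with_tiers(
    prompt: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> raw output
    validate: Callable[[str], bool],        # hypothetical: True if output is usable
    tiers: Sequence[str] = DEFAULT_TIERS,
) -> tuple[str, str]:
    """Return (model, output) from the first tier whose output passes validation."""
    for model in tiers:
        output = call_model(model, prompt)
        if validate(output):
            return model, output
    raise RuntimeError("no model tier produced valid output")
```

This only pays off when the prompt is tight enough that the cheap tier passes validation most of the time; otherwise you pay for both calls.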

What surprised me

The thing I expected to be the value, "the rewritten prompt", turned out not to be the main win. Three things mattered more:

  1. The issues with fixes. "Quote 'real demand' from your prompt and replace it with 'BSR < 50,000 in the last 30 days'" is way more useful than just handing me a finished version. I learned things I'll apply to prompts I write next month.
  2. The diff view. Seeing every word that got removed in red and every word added in green makes it impossible to miss what changed.
  3. The score over time. As I tightened, my scores climbed. Felt like a code-coverage gauge for prompt quality.
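The diff view idea is easy to try on your own prompts. Here is a minimal word-level sketch using Python's standard `difflib`; it is not how FixMyPrompt renders its diff, and it marks words with `-` and `+` instead of red and green.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (tag, word) pairs: '-' removed, '+' added, ' ' unchanged."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    result: list[tuple[str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result += [(" ", w) for w in old_words[i1:i2]]
        else:
            # 'replace', 'delete', and 'insert' all reduce to removals then additions.
            result += [("-", w) for w in old_words[i1:i2]]
            result += [("+", w) for w in new_words[j1:j2]]
    return result


if __name__ == "__main__":
    for tag, word in word_diff("have real demand", "have BSR < 50,000"):
        print(tag, word)
```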

You probably need it more than I did

If you're building anything that calls an LLM in a loop, you very likely have at least one prompt costing you more than it needs to; going by my own numbers, 30-50% more would not surprise me. I know because I had a dozen of them and I'm allegedly the guy who built the QA tool.

FixMyPrompt is live now. Pay-as-you-go credits, no subscription required to start. First few reports are on me — paste a prompt, see what falls out. Worst case you get a sanity check on a prompt you've been wondering about. Best case your next OpenRouter invoice is half what last month's was.

FixMyPrompt is a product of GDM LTD. Nothing in this post is sponsored — I built it.
