Idempotent Automation: Why Reruns Shouldn't Scare You¶

Your workflow failed at step 47 of 50. Do you fix the issue and rerun from the beginning, or do you manually complete the remaining steps?

The Nervousness Test

If that question makes you nervous, your automation isn't idempotent. And that's a problem.

This post shares the journey to making reruns boring. For the full technical deep-dive, see the Idempotency Pattern Guide.

The Scenario That Started It All¶

It was a Friday afternoon. A file distribution workflow syncing CONTRIBUTING.md to 40 repositories had failed at repository 37. Rate limiting.

Three options presented themselves:

Rerun from the beginning - But would it create duplicate PRs for the 36 repos that succeeded?
Manually complete the remaining 4 - Doable, but tedious
Wait until Monday - The coward's choice

I chose option 1. And watched in horror as the workflow created 36 duplicate PRs.

The Learning Moment

That afternoon taught me more about idempotency than any documentation ever could.

The Fix Was Embarrassingly Simple¶

One conditional check before creating PRs:

EXISTING=$(gh pr list --head "$BRANCH" --json number --jq 'length')
if [ "$EXISTING" -eq 0 ]; then
  gh pr create --title "Automated update" --body "From central repo"
else
  echo "PR already exists, skipping creation"
fi

That's it. Check if it exists before creating it.

But that single fix led to a rabbit hole. What about the branch creation? The commits? The push? Each operation needed its own idempotency guard.

The Rabbit Hole¶

The more I looked, the more I found:

Branch operations needed force-reset to handle diverged state
Change detection was using git diff which doesn't see untracked files
Commits failed with "nothing to commit" on clean reruns
Pushes needed --force-with-lease to handle rebased branches

Each fix was simple. The challenge was finding them all.

The Full Breakdown

See Implementation Patterns for the five patterns that emerged from this work.

The Payoff¶

Three weeks later, the same workflow failed again. Different repository, different error. Network timeout.

This time, I clicked "Re-run jobs" and went to lunch.

When I came back, everything was green. The 39 successful repos had detected "no changes needed" and skipped. The failed repo had retried and succeeded.

No duplicates. No manual cleanup. No fear.

When It's Not Worth It¶

Not every workflow needs this treatment. A one-off migration script? Just run it carefully. A local development tool? Optimize for speed, not resilience.

The Decision Matrix helps calibrate where to invest. The short version:

High failure risk + High recovery cost = Full idempotency
Low failure risk + Low recovery cost = Don't bother

The Cliffhanger¶

There's one thing I haven't solved yet: caches.

What happens when your idempotent workflow depends on a cache that expired? Or a cache key that changed? The workflow that "worked fine on rerun" might fail spectacularly when the cache misses.

The Cache Test

If you deleted all caches and reran your workflow, would it still produce the same result?

That's the next frontier. For now, see Cache Considerations for the traps I've identified.

Start Here¶

If you're dealing with flaky reruns:

Read Pros and Cons to understand the tradeoffs
Score your workflow with the Decision Matrix
Apply the relevant Implementation Patterns
Test it by running twice

Your workflow failed at step 47. You fixed the bug, clicked rerun, and went to lunch. When you came back, everything was green. That's the power of idempotent automation.