The Queue That Deleted Itself¶
Eight workflows queued. Seven deleted themselves before execution. Zero wasted builds.
The Waste Pattern¶
The static build workflow pulled all content from shared storage, rendered output, and published the result. It ran every time someone pushed to git. It also ran on schedule. And manually.
Problem: Only one build could run at a time (mutex lock). When commits came in bursts, builds queued. Five commits in 20 minutes meant five queued workflows.
All five would eventually execute. All five would process identical content. Four were wasteful.
Time wasted per burst: 12 minutes Resources wasted: 70% of CPU/memory Developer frustration: High
The Pattern: Pre-Execution Cleanup¶
The solution runs before the main workflow step:
- Acquire mutex lock
- Delete all OTHER pending workflows
- Execute main build
Work Avoidance, Not Idempotency
This isn't about making reruns safe. It's about preventing unnecessary runs entirely. The workflow is already idempotent. We're optimizing resource usage, not correctness.
flowchart TD
A[Workflow Starts] --> B{Acquired Mutex?}
B -->|Yes| C[List Pending Workflows]
B -->|No| D[Wait in Queue]
C --> E[Identify Current Workflow]
E --> F[Delete All OTHER Pending]
F --> G[Execute Main Build]
G --> H[Release Mutex]
%% Ghostty Hardcore Theme
style A fill:#65d9ef,color:#1b1d1e
style C fill:#fd971e,color:#1b1d1e
style F fill:#f92572,color:#f8f8f3
style G fill:#a7e22e,color:#1b1d1e
Implementation¶
Workflow Structure¶
spec:
serviceAccountName: workflow-sa
activeDeadlineSeconds: 1200 # 20 minute timeout
entrypoint: main
synchronization:
mutexes:
- name: build-lock
templates:
- name: main
steps:
- - name: cleanup-pending-workflows
template: cleanup-pending-workflows
- - name: build-main
template: build-main
The cleanup step runs first. If it fails, the main build doesn't start.
Cleanup Logic¶
#!/bin/bash
# Get current workflow name
CURRENT_WORKFLOW="{{workflow.name}}"
# List all pending workflows matching our pattern
PENDING_WORKFLOWS=$(kubectl get workflows -n production -o json | \
jq -r '.items[] |
select(.metadata.name | startswith("build-workflow-")) |
select(.status.phase == "Pending") |
.metadata.name' | \
sort)
# Delete all pending workflows except current one
DELETED=0
for wf in ${PENDING_WORKFLOWS}; do
if [ "${wf}" != "${CURRENT_WORKFLOW}" ]; then
kubectl delete workflow ${wf} -n production
DELETED=$((DELETED + 1))
fi
done
echo "Deleted ${DELETED} intermediate workflows"
Key design decisions:
- kubectl, not argo CLI: Works with distroless base images
- Self-aware: Never deletes the current workflow
- Scoped by prefix: Only deletes workflows matching the pattern
- Status check: Only deletes Pending workflows (not Running or Succeeded)
RBAC Required
The ServiceAccount needs delete permission on workflows:
Production Validation¶
Test Scenario¶
Submitted 5 test workflows while 3 existing workflows were already pending. Total: 8 pending workflows.
Cleanup Execution¶
2025-12-15T21:23:48+0000 START cleanup-pending-workflows
Current workflow: build-workflow-abc123
Pending workflows found:
build-workflow-def456
build-workflow-ghi789
build-workflow-jkl012
build-workflow-mno345
build-workflow-pqr678
build-workflow-stu901
build-workflow-vwx234
Total pending workflows: 7
Deleting intermediate workflow: build-workflow-def456
workflow.argoproj.io "build-workflow-def456" deleted
Deleting intermediate workflow: build-workflow-ghi789
workflow.argoproj.io "build-workflow-ghi789" deleted
[... 5 more deletions ...]
Deleted 7 intermediate workflows
2025-12-15T21:23:50+0000 DONE cleanup-pending-workflows
Result: 7 workflows deleted in 2 seconds. Only the running workflow remained.
Metrics¶
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average queue depth | 8 workflows | 0-1 workflows | 87% reduction |
| Average wait time | 16 minutes | 0-2 minutes | 87% reduction |
| Wasteful builds per burst | 6-7 | 0 | 100% elimination |
| Resource waste | ~70% | 0% | 100% elimination |
| Cleanup overhead | N/A | 2 seconds | Negligible |
When to Use This Pattern¶
This pattern works for workflows that are:
Good Fit
- Idempotent: Produces same result regardless of run count
- Source-pulling: Fetches all data (not incremental)
- Mutex-locked: Only one instance runs at a time
- Frequently triggered: Multiple triggers in short windows
- Resource intensive: Wasteful to run unnecessarily
Example Use Cases¶
| Workflow Type | Why It Fits | Expected Savings |
|---|---|---|
| Static content builds | Pulls all source from storage | 70-90% queue reduction |
| Site generation | Rebuilds entire site from source | 60-80% queue reduction |
| Full database backups | Dumps entire database | 50-70% queue reduction |
| Container image builds | Builds from Dockerfile + context | 40-60% queue reduction |
| Deployment sync operations | Syncs to latest desired state | 60-80% queue reduction |
Anti-Patterns (Do NOT Use)
❌ Incremental workflows: Where each run processes unique data ❌ Stateful workflows: Where execution order matters ❌ Transactional workflows: Where each trigger represents discrete work ❌ Parallel workflows: Where multiple instances should run simultaneously
Implementation Journey¶
Issue 1: The Distroless Problem¶
Attempt: Used argoproj/argocli:latest for the cleanup step.
Error:
Root cause: argocli is a distroless image. No shell, no bash, no kubectl.
Fix: Switched to bash-utils:latest which includes bash, kubectl, and jq.
Issue 2: CLI Tool Selection¶
Initial approach: Use argo CLI to list and delete workflows.
Problem: Requires mounting argo CLI binary, extra configuration, complex authentication.
Solution: Use kubectl. Already available. Works directly with Kubernetes API. Simpler.
Issue 3: Self-Deletion Prevention¶
Risk: Workflow deletes itself before the main build runs.
Mitigation: Check current workflow name against each pending workflow:
The {{workflow.name}} template variable expands at runtime. The current workflow always skips itself.
The Pattern Library¶
This "work avoidance" pattern is documented in the operator manual: Queue Cleanup.
The technique page includes complete operational guidance:
- Monitoring procedures and real-time log access
- Metrics to track with alert thresholds
- Troubleshooting common issues (permissions, prefix matching, self-deletion)
- Reusable Helm chart template for adoption
- Production validation data
For more work avoidance techniques, see Work Avoidance Overview.
Conclusion¶
The self-cleaning queue pattern eliminates wasteful execution for idempotent, mutex-locked workflows.
Results:
- Zero wasteful builds
- 87% reduction in queue wait times
- 70% reduction in resource waste
- 2-second cleanup overhead
- Production validated with 8 pending workflows
Implementation:
- 50 lines of bash script
- Simple kubectl commands
- One RBAC permission addition
- Works with any Argo Workflows deployment
The pattern is reusable. The metrics are clear. The waste is gone.
If your workflows queue behind mutex locks and process identical data, you have a waste problem. Delete the queue before it executes.