Queue Cleanup
Delete pending workflows before execution when only the latest run matters.
Self-Aware Deletion
The running workflow must never delete itself. Check the current workflow name before deleting others.
The Technique
When workflows queue behind a mutex lock and process identical data, intermediate workflows are wasteful. Delete all pending workflows except the current one before executing the main operation.
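```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: idempotent-workflow
spec:
  entrypoint: main
  # Hypothetical ServiceAccount bound to the workflow-cleanup ClusterRole
  serviceAccountName: workflow-cleanup
  # Only one workflow holds the lock; the others wait in the Pending phase
  synchronization:
    mutexes:
      - name: workflow-lock
  templates:
    - name: main
      steps:
        - - name: cleanup-pending
            template: cleanup-pending
        - - name: main-work
            template: main-work
    - name: cleanup-pending
      container:
        image: bash-utils:latest
        command: ["/bin/bash", "-c"]
        args:
          - |
            set -euo pipefail
            CURRENT="{{workflow.name}}"
            # Names of Pending workflows created from this template
            PENDING=$(kubectl get workflows -n argo -o json | \
              jq -r '.items[] |
                select(.metadata.name | startswith("idempotent-workflow-")) |
                select(.status.phase == "Pending") |
                .metadata.name')
            for wf in ${PENDING}; do
              # Never delete the workflow running this step
              if [ "${wf}" != "${CURRENT}" ]; then
                kubectl delete workflow "${wf}" -n argo --ignore-not-found
              fi
            done
    # main-work (the actual workload) is omitted here
```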
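If the cleanup image lacks jq, the same selection can be done in plain bash. This is a minimal sketch, assuming kubectl's JSONPath output is used to emit one "name phase" pair per line; select_deletable is a hypothetical helper, not part of Argo or this guide's template.

```shell
#!/usr/bin/env bash
# Sketch: the selection logic of cleanup-pending without jq.
# Expects one "<name> <phase>" pair per line on stdin, for example from:
#   kubectl get workflows -n argo \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
# Prints the names of workflows that are safe to delete.
select_deletable() {
  local current="$1" prefix="$2" name phase
  # "|| [[ -n ... ]]" also handles a final line without a trailing newline
  while read -r name phase || [[ -n "${name}" ]]; do
    [[ "${name}" == "${prefix}"* ]] || continue   # same workflow type only
    # Only Pending; a workflow with no phase yet is left alone
    [[ "${phase}" == "Pending" ]] || continue
    [[ "${name}" != "${current}" ]] || continue   # never the running workflow
    printf '%s\n' "${name}"
  done
}
```

Usage inside the container: pipe the JSONPath output into select_deletable "{{workflow.name}}" idempotent-workflow- and loop over the result as before.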
When to Use
Queue cleanup applies to workflows that are:
Good Fit
- Idempotent: Same result regardless of run count
- Source-pulling: Fetches all data (not incremental)
- Mutex-locked: Only one instance runs at a time
- Frequently triggered: Multiple triggers in short windows
- Resource intensive: Wasteful to run unnecessarily
Example Scenarios
| Workflow Type | Why Queue Cleanup Helps | Expected Savings |
|---|---|---|
| Documentation builds | Pulls all markdown from git | 70-90% queue reduction |
| Static site generation | Rebuilds entire site | 60-80% queue reduction |
| Full database backups | Dumps entire database | 50-70% queue reduction |
| Container image builds | Builds from Dockerfile | 40-60% queue reduction |
| Deployment sync | Syncs to latest state | 60-80% queue reduction |
Anti-Patterns (Do NOT Use)
- ❌ Incremental workflows: Each run processes unique data
- ❌ Stateful workflows: Execution order matters
- ❌ Transactional workflows: Each trigger represents discrete work
- ❌ Parallel workflows: Multiple instances should run simultaneously
Implementation Requirements
RBAC Permissions
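The ServiceAccount that runs the workflow needs list, get, and delete permissions on workflows. Grant them with a ClusterRole (or a namespaced Role) and bind it to that ServiceAccount: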
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workflow-cleanup
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflows"]
    verbs: ["list", "get", "delete"]
```
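A ClusterRole grants nothing until it is bound. The sketch below assumes the workflow runs as a ServiceAccount named workflow-cleanup in the argo namespace (both hypothetical defaults): it binds the role in that namespace with a RoleBinding, then checks each verb the cleanup script uses with kubectl auth can-i.

```shell
#!/usr/bin/env bash
# Sketch: grant the workflow-cleanup ClusterRole to the workflow's
# ServiceAccount in one namespace, then confirm each verb the script uses.
# NAMESPACE and SA default to hypothetical values; override them as needed.
NAMESPACE="${NAMESPACE:-argo}"
SA="${SA:-workflow-cleanup}"

bind_cleanup_role() {
  # A RoleBinding scopes the ClusterRole's rules to ${NAMESPACE} only
  kubectl create rolebinding workflow-cleanup \
    --clusterrole=workflow-cleanup \
    --serviceaccount="${NAMESPACE}:${SA}" \
    -n "${NAMESPACE}"
}

verify_cleanup_rbac() {
  local verb
  for verb in list get delete; do
    # --quiet suppresses the yes/no output; the exit code carries the answer
    if ! kubectl auth can-i "${verb}" workflows.argoproj.io \
        -n "${NAMESPACE}" \
        --as="system:serviceaccount:${NAMESPACE}:${SA}" --quiet; then
      echo "missing permission: ${verb} workflows" >&2
      return 1
    fi
  done
  echo "cleanup RBAC OK"
}
```

Running verify_cleanup_rbac before enabling the cleanup step surfaces the "permission denied" failure described under Troubleshooting without waiting for a queued workflow.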
Image Requirements
The cleanup container needs:
- bash shell: For script execution
- kubectl CLI: For Kubernetes API access
- jq: For JSON parsing
Use an image that includes a shell; distroless images do not. This guide uses bash-utils:latest as the example; any image with bash, kubectl, and jq works, or build your own.
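A minimal preflight sketch, assuming the cleanup script runs under bash: it checks that each required tool is on PATH before any API call, so a wrong image fails with a clear message instead of a confusing "command not found" halfway through. require_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if the cleanup image lacks a required tool.
require_tools() {
  local missing=() tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || missing+=("${tool}")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing tools: ${missing[*]}" >&2
    return 1
  fi
}

# Usage at the top of the cleanup-pending script:
#   require_tools bash kubectl jq || exit 1
```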
Workflow Name Pattern
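Workflows must follow a consistent naming pattern for cleanup filtering. Submitting from the WorkflowTemplate (argo submit --from workflowtemplate/idempotent-workflow) generates names of the form idempotent-workflow-<random suffix>.
The cleanup script matches by prefix: startswith("idempotent-workflow-").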
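Because the filter is a plain startswith check, one template's "<name>-" prefix can also match another template's workflows (build- matches build-docs-abc12). This sketch, using a hypothetical helper find_prefix_collisions, lists such overlaps from WorkflowTemplate names, for example the output of kubectl get workflowtemplates -n argo -o name.

```shell
#!/usr/bin/env bash
# Sketch: flag templates whose "<name>-" cleanup prefix would also match
# another template's generated workflow names.
# Accepts bare names or "workflowtemplate.argoproj.io/<name>" (kubectl -o name).
find_prefix_collisions() {
  local a b
  for a in "${@##*/}"; do
    for b in "${@##*/}"; do
      if [[ "${a}" != "${b}" && "${b}-" == "${a}-"* ]]; then
        echo "${a}- also matches ${b}-*"
      fi
    done
  done
}
```

Usage: find_prefix_collisions $(kubectl get workflowtemplates -n argo -o name). Any output means the shorter prefix needs to become more specific.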
Safety Mechanisms
Self-Deletion Prevention
Always check the current workflow name before deleting:
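```bash
CURRENT="{{workflow.name}}"
for wf in ${PENDING}; do
  # Skip the workflow that is running this step
  if [ "${wf}" != "${CURRENT}" ]; then
    kubectl delete workflow "${wf}" -n argo
  fi
done
```
Argo substitutes {{workflow.name}} with the running workflow's name before the container starts, so the loop always skips the current workflow. The running workflow holds the mutex and is therefore Running, not Pending, so this name check is a second safeguard rather than the only one.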
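The self-check can also live inside the delete helper itself, so no caller can bypass it. A sketch with a hypothetical safe_delete function; DRY_RUN is a switch invented for this sketch (not an Argo setting) that prints what would be deleted without calling the API.

```shell
#!/usr/bin/env bash
# Sketch: a guarded delete helper that re-checks the self-deletion rule
# at the point of deletion and supports a dry run.
safe_delete() {
  local current="$1" namespace="$2"
  shift 2
  local wf deleted=0
  for wf in "$@"; do
    if [[ "${wf}" == "${current}" ]]; then
      echo "skipping current workflow ${wf}" >&2
      continue
    fi
    if [[ "${DRY_RUN:-false}" == "true" ]]; then
      echo "would delete ${wf}"
    else
      # --ignore-not-found tolerates a workflow that vanished since the list call
      kubectl delete workflow "${wf}" -n "${namespace}" --ignore-not-found
    fi
    deleted=$((deleted + 1))
  done
  echo "deleted=${deleted}"
}
```

Usage: DRY_RUN=true safe_delete "{{workflow.name}}" argo ${PENDING} on a first rollout, then drop DRY_RUN once the output looks right.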
Status Filtering
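Only delete workflows in the Pending phase, which is where workflows waiting on the mutex remain. The cleanup script enforces this with select(.status.phase == "Pending").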
This avoids deleting:
- Running workflows (might be about to complete)
- Succeeded workflows (historical data)
- Failed workflows (debugging information)
Timeout Protection
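Set activeDeadlineSeconds on the cleanup-pending template (a template-level field, set alongside container:) to prevent runaway cleanup.
If cleanup hangs (API issues, permission problems), Argo fails the step once the deadline passes and the workflow stops, instead of consuming resources indefinitely.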
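activeDeadlineSeconds bounds the whole step. As an additional in-script guard, this sketch caps each individual command with coreutils timeout, which exits with status 124 when it kills the command. run_with_deadline is a hypothetical wrapper, and the durations in the usage line are illustrative rather than recommendations.

```shell
#!/usr/bin/env bash
# Sketch: bound a single command's runtime and report when it expires.
# timeout runs external commands only (e.g. kubectl), not shell functions.
run_with_deadline() {
  local seconds="$1"
  shift
  local rc=0
  timeout "${seconds}" "$@" || rc=$?
  if (( rc == 124 )); then
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "${rc}"
}

# Usage inside cleanup-pending (kubectl's --request-timeout bounds each API request):
#   run_with_deadline 30 kubectl get workflows -n argo -o json --request-timeout=10s
```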
Operational Patterns
Monitoring Cleanup
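Watch cleanup execution in real time. Argo labels each pod with the generated workflow name (idempotent-workflow-<suffix>, not the template name) and runs container templates in a container named main, so select on the running workflow's name:
```bash
kubectl logs -n argo \
  -l workflows.argoproj.io/workflow=<workflow-name> \
  -c main \
  --tail=50 \
  -f
```
This follows the main container of every pod in that workflow, including the cleanup-pending step.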
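Pods carry the workflows.argoproj.io/workflow label with the generated workflow name, so a log selector needs the full name of the workflow that is currently running. A sketch, assuming at most one Running workflow per prefix (which the mutex guarantees for this template); running_workflow is a hypothetical helper that reads name and phase pairs through kubectl's JSONPath output.

```shell
#!/usr/bin/env bash
# Sketch: print the name of the Running workflow whose name starts with $1.
# Returns 1 when none is running.
running_workflow() {
  local prefix="$1" namespace="${2:-argo}" name phase
  while read -r name phase; do
    if [[ "${name}" == "${prefix}"* && "${phase}" == "Running" ]]; then
      printf '%s\n' "${name}"
      return 0
    fi
  done < <(kubectl get workflows -n "${namespace}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}')
  return 1
}
```

Usage: kubectl logs -n argo -l "workflows.argoproj.io/workflow=$(running_workflow idempotent-workflow-)" -c main --tail=50 -f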
Metrics to Track
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Workflows deleted per run | Queue buildup indicator | > 10 |
| Cleanup execution time | API performance | > 10 seconds |
| Queue depth after cleanup | Mutex release check | > 1 |
| Cleanup failure rate | RBAC or API issues | > 5% |
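The first metric can be read straight from the cleanup logs when the script prints a "Deleted N pending workflows" line, as the reusable Helm template on this page does. A sketch with a hypothetical check_deleted_count helper; the default threshold of 10 comes from the table, and the exit codes (0 OK, 1 alert, 2 no count found) are a convention chosen for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: check the "Deleted N pending workflows" log line against a threshold.
check_deleted_count() {
  local threshold="${1:-10}" line count
  local re='^Deleted ([0-9]+) pending workflows$'
  while IFS= read -r line; do
    if [[ "${line}" =~ ${re} ]]; then
      count="${BASH_REMATCH[1]}"
      if (( count > threshold )); then
        echo "ALERT: ${count} workflows deleted (threshold ${threshold})"
        return 1
      fi
      echo "OK: ${count} workflows deleted"
      return 0
    fi
  done
  echo "UNKNOWN: no deletion count found" >&2
  return 2
}
```

Usage: kubectl logs -n argo <cleanup-pod> -c main | check_deleted_count 10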
Troubleshooting
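Symptom: Cleanup step fails with permission denied
Fix: Verify the ClusterRole includes the delete verb for the workflows resource and that a RoleBinding or ClusterRoleBinding grants it to the workflow's ServiceAccount
Symptom: Cleanup deletes workflows from other workflow types
Fix: Make the workflow name prefix more specific (for example, static-build- instead of build-)
Symptom: Cleanup takes too long
Fix: Keep the kubectl query scoped to a single namespace (-n argo) instead of --all-namespaces to reduce API load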
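To cut API load further, the phase filter can move to the API server as a label selector. This sketch assumes your Argo controller sets the workflows.argoproj.io/phase label on Workflow objects; confirm with kubectl get workflows -n argo --show-labels before relying on it. list_pending is a hypothetical helper; the prefix check stays client-side.

```shell
#!/usr/bin/env bash
# Sketch: list Pending workflow names with a server-side label selector,
# then keep only those matching the template prefix.
list_pending() {
  local namespace="$1" prefix="$2" name
  kubectl get workflows -n "${namespace}" \
    -l workflows.argoproj.io/phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
  while IFS= read -r name; do
    if [[ "${name}" == "${prefix}"* ]]; then
      printf '%s\n' "${name}"
    fi
  done
}
```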
Comparison to Other Techniques
| Technique | Question | Best For |
|---|---|---|
| Queue Cleanup | "Should queued work execute?" | Mutex-locked workflows |
| Content Hashing | "Is the content different?" | File comparisons, config sync |
| Cache-Based Skip | "Is output already built?" | Build artifacts, dependencies |
| Existence Checks | "Does it already exist?" | Resource creation (PRs, branches) |
Queue cleanup is unique because it operates on the workflow queue itself, not on the data the workflow processes. Use it in combination with other work avoidance techniques.
Production Validation
Test scenario: 8 pending workflows, 1 running workflow
Cleanup execution: 2 seconds
Result: 7 workflows deleted, 0 wasteful builds
Metrics:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average queue depth | 8 | 0-1 | 87% reduction |
| Average wait time | 16 min | 0-2 min | 87% reduction |
| Wasteful builds | 70% | 0% | 100% elimination |
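The 87% figures correspond to the upper end of each After range, rounded down:

$$
\text{queue depth: } \frac{8 - 1}{8} = 0.875, \qquad \text{wait time: } \frac{16 - 2}{16} = 0.875
$$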
See The Queue That Deleted Itself for the full implementation story.
Reusable Template
Extract as a Helm template for use across workflows:
```yaml
{{- define "work-avoidance.cleanup-pending" -}}
image: bash-utils:latest
command: ["/bin/bash", "-c"]
args:
  - |
    set -euo pipefail
    # Backtick-escaped so Helm leaves the Argo variable for Argo to expand
    CURRENT="{{`{{workflow.name}}`}}"
    PENDING=$(kubectl get workflows -n {{ .namespace }} -o json | \
      jq -r '.items[] |
        select(.metadata.name | startswith("{{ .prefix }}")) |
        select(.status.phase == "Pending") |
        .metadata.name')
    DELETED=0
    for wf in ${PENDING}; do
      if [ "${wf}" != "${CURRENT}" ]; then
        kubectl delete workflow "${wf}" -n {{ .namespace }} --ignore-not-found
        DELETED=$((DELETED + 1))
      fi
    done
    echo "Deleted ${DELETED} pending workflows"
{{- end -}}
```
Usage (set nindent to the indentation of the keys under container:):
```yaml
- name: cleanup
  container:
    {{- include "work-avoidance.cleanup-pending"
          (dict "prefix" "my-workflow-" "namespace" .Values.namespace)
          | nindent 4 }}
```
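The backtick escaping is the easiest part of this template to break: an unescaped {{workflow.name}} typically makes Helm fail to render, and a mangled one leaves the self-check comparing against the wrong value. A sketch of a render check, assuming the chart renders with helm template; check_rendered_cleanup is a hypothetical helper and its arguments are passed straight to helm template.

```shell
#!/usr/bin/env bash
# Sketch: render the chart and confirm the literal Argo variable survived.
check_rendered_cleanup() {
  local rendered
  rendered="$(helm template "$@")" || return 1
  if ! grep -qF 'CURRENT="{{workflow.name}}"' <<<"${rendered}"; then
    echo "workflow.name did not survive rendering; check the backtick escaping" >&2
    return 1
  fi
  echo "rendered cleanup script keeps {{workflow.name}}"
}
```

Usage: check_rendered_cleanup my-release ./chart in CI before publishing the chart.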
Related
- Work Avoidance Overview: Core concepts and other techniques
- Argo Workflows Mutex: Mutex locking patterns
- Idempotency Patterns: Making operations safe to repeat
- The Queue That Deleted Itself: Production implementation story