Skip to content

The Queue That Deleted Itself

Eight workflows queued. Seven deleted themselves before execution. Zero wasted builds.

The Waste Pattern

The static build workflow pulled all content from shared storage, rendered output, and published the result. It ran every time someone pushed to git. It also ran on schedule. And manually.

Problem: Only one build could run at a time (mutex lock). When commits came in bursts, builds queued. Five commits in 20 minutes meant five queued workflows.

All five would eventually execute. All five would process identical content. Four were wasteful.

Time wasted per burst: 12 minutes Resources wasted: 70% of CPU/memory Developer frustration: High


The Pattern: Pre-Execution Cleanup

The solution runs before the main workflow step:

  1. Acquire mutex lock
  2. Delete all OTHER pending workflows
  3. Execute main build

Work Avoidance, Not Idempotency

This isn't about making reruns safe. It's about preventing unnecessary runs entirely. The workflow is already idempotent. We're optimizing resource usage, not correctness.

flowchart TD
    A[Workflow Starts] --> B{Acquired Mutex?}
    B -->|Yes| C[List Pending Workflows]
    B -->|No| D[Wait in Queue]
    C --> E[Identify Current Workflow]
    E --> F[Delete All OTHER Pending]
    F --> G[Execute Main Build]
    G --> H[Release Mutex]

    %% Ghostty Hardcore Theme
    style A fill:#65d9ef,color:#1b1d1e
    style C fill:#fd971e,color:#1b1d1e
    style F fill:#f92572,color:#f8f8f3
    style G fill:#a7e22e,color:#1b1d1e

Implementation

Workflow Structure

spec:
  serviceAccountName: workflow-sa
  activeDeadlineSeconds: 1200   # 20 minute timeout
  entrypoint: main
  synchronization:
    mutexes:
      - name: build-lock
  templates:
    - name: main
      steps:
        - - name: cleanup-pending-workflows
            template: cleanup-pending-workflows
        - - name: build-main
            template: build-main

The cleanup step runs first. If it fails, the main build doesn't start.

Cleanup Logic

#!/bin/bash

# Get current workflow name
CURRENT_WORKFLOW="{{workflow.name}}"

# List all pending workflows matching our pattern
PENDING_WORKFLOWS=$(kubectl get workflows -n production -o json | \
  jq -r '.items[] |
    select(.metadata.name | startswith("build-workflow-")) |
    select(.status.phase == "Pending") |
    .metadata.name' | \
  sort)

# Delete all pending workflows except current one
DELETED=0
for wf in ${PENDING_WORKFLOWS}; do
  if [ "${wf}" != "${CURRENT_WORKFLOW}" ]; then
    kubectl delete workflow ${wf} -n production
    DELETED=$((DELETED + 1))
  fi
done

echo "Deleted ${DELETED} intermediate workflows"

Key design decisions:

  • kubectl, not argo CLI: Works with distroless base images
  • Self-aware: Never deletes the current workflow
  • Scoped by prefix: Only deletes workflows matching the pattern
  • Status check: Only deletes Pending workflows (not Running or Succeeded)

RBAC Required

The ServiceAccount needs delete permission on workflows:

- apiGroups: [argoproj.io]
  resources: [workflows]
  verbs: [delete, list, get]

Production Validation

Test Scenario

Submitted 5 test workflows while 3 existing workflows were already pending. Total: 8 pending workflows.

Cleanup Execution

2025-12-15T21:23:48+0000 START cleanup-pending-workflows
Current workflow: build-workflow-abc123
Pending workflows found:
build-workflow-def456
build-workflow-ghi789
build-workflow-jkl012
build-workflow-mno345
build-workflow-pqr678
build-workflow-stu901
build-workflow-vwx234
Total pending workflows: 7
Deleting intermediate workflow: build-workflow-def456
workflow.argoproj.io "build-workflow-def456" deleted
Deleting intermediate workflow: build-workflow-ghi789
workflow.argoproj.io "build-workflow-ghi789" deleted
[... 5 more deletions ...]
Deleted 7 intermediate workflows
2025-12-15T21:23:50+0000 DONE cleanup-pending-workflows

Result: 7 workflows deleted in 2 seconds. Only the running workflow remained.

Metrics

Metric Before After Improvement
Average queue depth 8 workflows 0-1 workflows 87% reduction
Average wait time 16 minutes 0-2 minutes 87% reduction
Wasteful builds per burst 6-7 0 100% elimination
Resource waste ~70% 0% 100% elimination
Cleanup overhead N/A 2 seconds Negligible

When to Use This Pattern

This pattern works for workflows that are:

Good Fit

  • Idempotent: Produces same result regardless of run count
  • Source-pulling: Fetches all data (not incremental)
  • Mutex-locked: Only one instance runs at a time
  • Frequently triggered: Multiple triggers in short windows
  • Resource intensive: Wasteful to run unnecessarily

Example Use Cases

Workflow Type Why It Fits Expected Savings
Static content builds Pulls all source from storage 70-90% queue reduction
Site generation Rebuilds entire site from source 60-80% queue reduction
Full database backups Dumps entire database 50-70% queue reduction
Container image builds Builds from Dockerfile + context 40-60% queue reduction
Deployment sync operations Syncs to latest desired state 60-80% queue reduction

Anti-Patterns (Do NOT Use)

Incremental workflows: Where each run processes unique data ❌ Stateful workflows: Where execution order matters ❌ Transactional workflows: Where each trigger represents discrete work ❌ Parallel workflows: Where multiple instances should run simultaneously


Implementation Journey

Issue 1: The Distroless Problem

Attempt: Used argoproj/argocli:latest for the cleanup step.

Error:

failed to find name in PATH: exec: "/bin/bash": stat /bin/bash:
no such file or directory

Root cause: argocli is a distroless image. No shell, no bash, no kubectl.

Fix: Switched to bash-utils:latest which includes bash, kubectl, and jq.

Issue 2: CLI Tool Selection

Initial approach: Use argo CLI to list and delete workflows.

Problem: Requires mounting argo CLI binary, extra configuration, complex authentication.

Solution: Use kubectl. Already available. Works directly with Kubernetes API. Simpler.

Issue 3: Self-Deletion Prevention

Risk: Workflow deletes itself before the main build runs.

Mitigation: Check current workflow name against each pending workflow:

if [ "${wf}" != "${CURRENT_WORKFLOW}" ]; then
  kubectl delete workflow ${wf}
fi

The {{workflow.name}} template variable expands at runtime. The current workflow always skips itself.


The Pattern Library

This "work avoidance" pattern is documented in the operator manual: Queue Cleanup.

The technique page includes complete operational guidance:

  • Monitoring procedures and real-time log access
  • Metrics to track with alert thresholds
  • Troubleshooting common issues (permissions, prefix matching, self-deletion)
  • Reusable Helm chart template for adoption
  • Production validation data

For more work avoidance techniques, see Work Avoidance Overview.


Conclusion

The self-cleaning queue pattern eliminates wasteful execution for idempotent, mutex-locked workflows.

Results:

  • Zero wasteful builds
  • 87% reduction in queue wait times
  • 70% reduction in resource waste
  • 2-second cleanup overhead
  • Production validated with 8 pending workflows

Implementation:

  • 50 lines of bash script
  • Simple kubectl commands
  • One RBAC permission addition
  • Works with any Argo Workflows deployment

The pattern is reusable. The metrics are clear. The waste is gone.

If your workflows queue behind mutex locks and process identical data, you have a waste problem. Delete the queue before it executes.

Comments