Mutex Synchronization
A mutex (mutual exclusion lock) ensures that only one workflow holding it runs at a time. When a workflow acquires a mutex, other workflows requesting the same mutex wait in a queue. When the holder completes, the next workflow in the queue proceeds.
Why Mutexes?
Some operations must be atomic. A database migration can't run twice simultaneously. A deployment to production must complete before another starts. A cache rebuild mustn't overlap with itself.
Without mutexes, you rely on timing. If events arrive slowly enough, workflows don't overlap. But under load, multiple workflows trigger at once: during a deployment storm, after a network partition resolves, or when a backlog drains. The system that worked fine in testing fails in production.
Mutexes make the single-execution guarantee explicit. The system behaves the same whether one event arrives or a hundred arrive simultaneously.
Basic Configuration
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: build-pipeline
spec:
  synchronization:
    mutexes:
      - name: build-lock
  entrypoint: main
  templates:
    - name: main
      # ... workflow steps
The synchronization.mutexes field declares which mutexes this workflow needs. When the workflow starts, it attempts to acquire the lock. If another workflow holds it, this one waits.
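To see the waiting behavior in practice, a Workflow can reference the template above. This is a minimal sketch, assuming the `build-pipeline` template is installed in the same namespace; the `generateName` prefix is illustrative. Submitting it twice in quick succession should leave the second run waiting until the first releases `build-lock`.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: build-pipeline-   # illustrative prefix; each submission gets a unique name
spec:
  # Reuse the WorkflowTemplate spec, including its synchronization block
  workflowTemplateRef:
    name: build-pipeline
```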
How Mutex Queuing Works
sequenceDiagram
%% Ghostty Hardcore Theme
participant A as Workflow A
participant M as Mutex: build-lock
participant B as Workflow B
participant C as Workflow C
A->>M: Acquire lock
M-->>A: Lock granted
Note over A: Running...
B->>M: Acquire lock
M-->>B: Queued (position 1)
C->>M: Acquire lock
M-->>C: Queued (position 2)
A->>M: Release lock
M-->>B: Lock granted
Note over B: Running...
B->>M: Release lock
M-->>C: Lock granted
Note over C: Running...
Workflows of equal priority acquire the lock in FIFO order: the first to request the lock gets it, and the rest queue behind it. No workflow starves indefinitely, because each one eventually reaches the front of the queue.
Dynamic Mutex Names
Mutex names can include workflow parameters, enabling per-resource locks:
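A minimal sketch of a per-environment lock, assuming a workflow parameter named `env` and a hypothetical template name `deploy-pipeline`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: deploy-pipeline            # hypothetical name
spec:
  arguments:
    parameters:
      - name: env                  # assumed parameter, e.g. staging or production
  synchronization:
    mutexes:
      # Resolves to deploy-staging, deploy-production, and so on
      - name: "deploy-{{workflow.parameters.env}}"
  entrypoint: main
  templates:
    - name: main
      # ... deployment steps
```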
This creates separate locks for each environment. Deployments to staging and production can run simultaneously. Two deployments to production cannot.
Common patterns:
| Scenario | Mutex Pattern |
|---|---|
| Single build at a time | mutexes: [{name: build-lock}] |
| Per-environment locks | mutexes: [{name: "deploy-{{workflow.parameters.env}}"}] |
| Per-repository locks | mutexes: [{name: "build-{{workflow.parameters.repo}}"}] |
| Shared resource access | mutexes: [{name: database-migration}] |
Multiple Mutexes
Workflows can require multiple locks:
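A minimal sketch that lists two mutexes, assuming a hypothetical `migrate-and-deploy` template and an `env` workflow parameter:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: migrate-and-deploy         # hypothetical name
spec:
  arguments:
    parameters:
      - name: env                  # assumed parameter
  synchronization:
    mutexes:
      - name: database-migration                      # shared across all environments
      - name: "deploy-{{workflow.parameters.env}}"    # one lock per environment
  entrypoint: main
  templates:
    - name: main
      # ... workflow steps
```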
The workflow acquires all locks before starting. This prevents deadlocks where workflow A holds lock 1 and waits for lock 2, while workflow B holds lock 2 and waits for lock 1.
Deadlock Prevention
Argo acquires all mutexes atomically. If any lock is unavailable, the workflow takes none of them and waits until every lock is free. This prevents deadlocks, but a workflow that needs multiple locks may wait longer than expected.
Mutex Debugging
When workflows seem stuck, check mutex state:
# List workflows waiting on mutexes
kubectl get workflows -l workflows.argoproj.io/sync-id
# Check a specific workflow's sync status
kubectl get workflow <name> -o jsonpath='{.status.synchronization}'
Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| Workflow stuck in Pending | Waiting for mutex | Check which workflow holds the lock |
| Workflows never start | Mutex held by failed workflow | Terminate the failed workflow |
| Inconsistent mutex names | Parameter typo | Verify parameter values match |
A failed workflow still holds its mutex until terminated. If a workflow fails and isn't cleaned up, subsequent workflows wait forever. Use TTL strategies to automatically clean up failed workflows.
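One way to set up such cleanup is the `ttlStrategy` field on the workflow spec. The following is a sketch rather than a recommended configuration; the durations are assumed values to tune for your environment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: build-pipeline
spec:
  ttlStrategy:
    secondsAfterFailure: 300       # assumed value: delete failed runs after 5 minutes
    secondsAfterSuccess: 3600      # assumed value: keep successful runs for 1 hour
  synchronization:
    mutexes:
      - name: build-lock
  entrypoint: main
  templates:
    - name: main
      # ... workflow steps
```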
Mutex vs Concurrency Policy
CronWorkflows have a separate concurrencyPolicy field:
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
spec:
  concurrencyPolicy: Replace  # or Forbid, Allow
  schedule: "0 * * * *"
  workflowSpec:
    synchronization:
      mutexes:
        - name: hourly-job
The difference:
- concurrencyPolicy controls what happens when a scheduled run triggers while a previous run is active
- mutexes control access to shared resources across any workflows (not just the same CronWorkflow)
Use both when a CronWorkflow should also coordinate with event-triggered workflows:
- concurrencyPolicy: Forbid prevents overlapping scheduled runs
- mutexes prevent overlap with event-triggered runs of related workflows
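For the event-triggered side, the other workflow only needs to declare the same mutex name. A minimal sketch, assuming a hypothetical `on-demand-job` template that lives in the same namespace as the CronWorkflow:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: on-demand-job              # hypothetical template started by an event
spec:
  synchronization:
    mutexes:
      - name: hourly-job           # same mutex name as the CronWorkflow's workflowSpec
  entrypoint: main
  templates:
    - name: main
      # ... workflow steps
```

Because both specs name `hourly-job`, a scheduled run and an event-triggered run queue behind each other instead of overlapping.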
Related
- Semaphores - Limited concurrent access
- TTL Strategy - Cleanup to prevent mutex deadlocks
- Scheduled Workflows - CronWorkflow concurrency