Backpressure Handling¶

When events arrive faster than they can be processed, systems must apply backpressure by slowing or rejecting new events to prevent overload. Without backpressure, queues grow unbounded, memory exhausts, and systems crash.

Backpressure Points¶

Multiple components can become bottlenecks:

flowchart LR
    A[Event Storm] --> B[EventSource]
    B -->|Backpressure| C[EventBus]
    C -->|Backpressure| D[Sensor]
    D -->|Backpressure| E[Workflows]

    %% Ghostty Hardcore Theme
    style A fill:#f92572,color:#1b1d1e
    style B fill:#fd971e,color:#1b1d1e
    style C fill:#515354,color:#f8f8f3
    style D fill:#f92572,color:#1b1d1e
    style E fill:#a7e22e,color:#1b1d1e

Each component needs its own backpressure configuration.

EventBus Limits¶

Configure JetStream to limit queue depth:

apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
spec:
  jetstream:
    version: "2.9.11"
    settings: |
      max_msgs: 100000
      max_bytes: 1073741824
      max_msg_size: 1048576
      max_consumers: 100

Setting	Purpose	Behavior When Exceeded
`max_msgs`	Maximum messages in stream	New messages rejected
`max_bytes`	Maximum total bytes	New messages rejected
`max_msg_size`	Maximum single message size	Message rejected
`max_consumers`	Maximum concurrent consumers	New consumers rejected

Sensor Rate Limiting¶

Limit how fast a Sensor processes events:

apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: rate-limited
spec:
  # Sensor-level rate limiting
  rateLimit:
    requestsPerUnit: 10
    unit: minute

  dependencies:
    - name: high-volume-event
      eventSourceName: source
      eventName: event

  triggers:
    - template:
        name: process
        argoWorkflow:
          # ...

This limits the Sensor to 10 workflow submissions per minute, regardless of how many events arrive.

Trigger Rate Limiting¶

Apply rate limits to individual triggers:

triggers:
  - template:
      name: api-call
      http:
        url: https://api.example.com/webhook
        method: POST
    rateLimit:
      requestsPerUnit: 100
      unit: second

Different triggers can have different limits based on downstream capacity.

Workflow Concurrency¶

Use Argo Workflows concurrency controls as a final backpressure mechanism:

# Semaphore limits concurrent workflows
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-semaphores
data:
  api-calls: "5"

# Workflow references semaphore
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: api-consumer
spec:
  synchronization:
    semaphore:
      configMapKeyRef:
        name: workflow-semaphores
        key: api-calls
  templates:
    - name: main
      # ...

New workflows wait when semaphore is full. See Semaphore Patterns for details.

Handling Rejected Events¶

When backpressure rejects events, you have options:

1. Drop Silently¶

Acceptable for truly optional events (metrics, telemetry):

# No special handling - events simply don't arrive

2. Return Error to Source¶

Let the event producer retry:

# EventSource configured to not acknowledge
spec:
  pubsub:
    event:
      ackWaitTime: "30s"  # Time to process before nack

3. Store for Later¶

Write to persistent storage for batch processing:

triggers:
  - template:
      name: store-for-later
      k8s:
        operation: create
        source:
          resource:
            apiVersion: v1
            kind: ConfigMap
            metadata:
              generateName: backlog-
            data:
              event: ""

Monitoring Backpressure¶

Watch these metrics for backpressure issues:

Metric	Warning Sign
EventBus queue depth	Consistently near max_msgs
Sensor processing latency	Increasing over time
Workflow pending count	Growing queue of waiting workflows
Event age	Events sitting in queue for minutes

# Check EventBus queue depth (JetStream)
kubectl exec -n argo-events eventbus-default-0 -- \
  nats stream info default --json | jq '.state.messages'

# Check pending workflows
kubectl get workflows -n argo-workflows --field-selector status.phase=Pending | wc -l

Design for Peak Load

Size your backpressure limits for peak load, not average. A system that handles normal traffic but collapses under spikes isn't production-ready. Test with realistic load patterns.

Retry Strategies - Handle transient failures
Semaphore Patterns - Workflow concurrency
High Availability - Scale for throughput