Environment Progression Operations¶
Operational Patterns
Advanced operational patterns for progressive deployment. These patterns build on the core deployment strategies.
Feature Flags for Progressive Rollout¶
Enable features progressively across environments:
# Dev: all features on
feature_flags:
new_ui: true
beta_api: true
experimental_cache: true
# Staging: beta features on
feature_flags:
new_ui: true
beta_api: true
experimental_cache: false
# Production: stable features only
feature_flags:
new_ui: true
beta_api: false
experimental_cache: false
New feature works in dev and staging before production sees it.
Monitoring Across Environments¶
Track metrics in each environment:
# Prometheus queries
# Dev error rate
rate(http_requests_total{namespace="development",status=~"5.."}[5m])
# Staging error rate
rate(http_requests_total{namespace="staging",status=~"5.."}[5m])
# Production error rate
rate(http_requests_total{namespace="production",status=~"5.."}[5m])
Alert when staging error rate spikes. Fix before production deploy.
The Gatekeeper Pattern¶
Automated gates between environments:
jobs:
deploy-dev:
steps:
- run: deploy to dev
- run: smoke test
quality-gate-dev:
needs: deploy-dev
steps:
- name: Check error rate
run: |
ERROR_RATE=$(curl -s prometheus/query?query=rate | jq '.value')
if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
echo "Error rate too high: $ERROR_RATE"
exit 1
fi
- name: Check response time
run: |
P95=$(curl -s prometheus/query?query=p95 | jq '.value')
if (( $(echo "$P95 > 500" | bc -l) )); then
echo "P95 response time too high: $P95ms"
exit 1
fi
deploy-staging:
needs: quality-gate-dev
steps:
- run: deploy to staging
Metrics bad in dev? Staging doesn't deploy.
The Full Progression¶
flowchart TD
A[Code Merge] --> B[Deploy Dev]
B --> C[Smoke Test Dev]
C --> D{Pass?}
D -->|No| E[Alert Team]
D -->|Yes| F[Quality Gates Dev]
F --> G{Metrics OK?}
G -->|No| E
G -->|Yes| H[Deploy Staging]
H --> I[Smoke Test Staging]
I --> J{Pass?}
J -->|No| E
J -->|Yes| K[Load Test Staging]
K --> L{Pass?}
L -->|No| E
L -->|Yes| M[Quality Gates Staging]
M --> N{Metrics OK?}
N -->|No| E
N -->|Yes| O[Approval Required]
O --> P[Deploy Production]
P --> Q[Smoke Test Production]
Q --> R{Pass?}
R -->|No| S[Auto Rollback]
R -->|Yes| T[Success]
%% Ghostty Hardcore Theme
style A fill:#65d9ef,color:#1b1d1e
style E fill:#f92572,color:#1b1d1e
style S fill:#f92572,color:#1b1d1e
style T fill:#a7e22e,color:#1b1d1e
Common Pitfalls¶
Pitfall 1: Skipping Environments¶
"Just this once, deploy directly to prod." Never ends well.
Pitfall 2: Staging Too Different from Production¶
Staging with 1 pod doesn't catch production race conditions at 50 pods.
Pitfall 3: No Rollback Plan¶
If you can't rollback in 60 seconds, your deployment strategy is incomplete.
Pitfall 4: Manual Smoke Tests¶
Automated smoke tests or they don't happen consistently.
Implementation Checklist¶
Building environment progression:
- Define environments: Dev, staging, production at minimum
- Namespace isolation: NetworkPolicies enforce boundaries
- Configuration management: Kustomize overlays or Helm values
- Smoke tests: Automated validation at each stage
- Quality gates: Metrics thresholds between environments
- Load testing: Staging simulates production load
- Rollback automation: Auto-rollback on smoke test failure
- Feature flags: Progressive enablement across environments
- Monitoring: Per-environment dashboards
- Approval gates: Manual approval before production
Related Patterns¶
Environment progression fits into SDLC hardening:
- SDLC Hardening: Build security into pipelines
- Zero-Vulnerability Pipelines: Scan before each environment
- Policy-as-Code with Kyverno: Enforce standards per environment
The deployment that exploded in dev never reached staging. The migration that locked tables in staging never touched production. The config that worked in dev got caught in smoke tests. Zero production incidents from environmental differences.