
Edge Cases and Comparison

Understanding when component replacement does not apply and common pitfalls to avoid.

See Platform Component Replacement for the core pattern overview.

Know When NOT to Use Component Replacement

Component replacement is not a universal solution. If users expect the old API, if schema changes are breaking, or if data cannot be replicated, use a different migration strategy.


| Scenario | Problem | Use Instead |
| --- | --- | --- |
| User-facing API changes | Users expect old API | Traffic routing with gradual rollout |
| Breaking schema changes | Old and new incompatible | Blue-green deployment with cutover |
| Stateful data migration | Data can't be replicated | Maintenance window with downtime |
| Cost-prohibitive | Can't run both components | Phased cutover during low-traffic window |

Comparison: Traffic Routing vs Component Replacement

| Aspect | Traffic Routing | Component Replacement |
| --- | --- | --- |
| Use Case | User-facing APIs, features | Infrastructure, backing services |
| Mechanism | Route percentage of traffic | Swap component references |
| Rollback | Decrease traffic to 0% | Change selector/reference back |
| Monitoring | Compare old vs new metrics | Compare old vs new metrics |
| Complexity | Routing layer required | Compatibility layer required |
| User Impact | Gradual (some users on old, some on new) | Instant (all users on new after swap) |
| Best For | A/B testing, feature flags | Database migrations, operator upgrades |

Not mutually exclusive: You can use traffic routing for APIs and component replacement for databases in the same migration.
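The "change selector/reference back" rollback in the table can be sketched as follows. This is a minimal illustration, not the author's tooling: the Service name `postgres` and the `cluster: old` / `cluster: new` labels are hypothetical, and the script only builds the `kubectl patch` arguments without running them.

```python
import json


def selector_patch(selector: dict) -> str:
    """Build a JSON merge patch that repoints a Service at a new set of pods."""
    return json.dumps({"spec": {"selector": selector}}, sort_keys=True)


def patch_command(service: str, selector: dict) -> list:
    """Return the kubectl argv that would apply the patch (not executed here)."""
    return ["kubectl", "patch", "service", service,
            "--type", "merge", "-p", selector_patch(selector)]


# Hypothetical labels: the old and new backends differ only in `cluster`.
OLD_SELECTOR = {"app": "postgres", "cluster": "old"}
NEW_SELECTOR = {"app": "postgres", "cluster": "new"}

swap = patch_command("postgres", NEW_SELECTOR)      # cut over to the new backend
rollback = patch_command("postgres", OLD_SELECTOR)  # same call with the old labels

print(" ".join(swap))
```

Swap and rollback are the same operation with different labels, which is why the rollback is cheap. A merge patch only overwrites the keys it names, so this sketch assumes both selectors use the same set of label keys.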


Edge Cases and Gotchas

Gotcha 1: Connection Pooling

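Problem: The application maintains a connection pool to the old database. After the swap, pooled connections that are already open still point to the old instance, because changing the Service only affects new connections.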

Solution:

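# Force a connection pool refresh

# Option 1 (shell): rolling restart, pods are replaced gradually
kubectl rollout restart deployment/app

# Option 2 (application config): connection pool TTL
# Set the connection max lifetime shorter than the migration window so that
# pooled connections are recycled and reconnect through the Service.
# The key name depends on the pool library; this one is illustrative.
pool.maxLifetimeSeconds: 300  # 5 minutes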
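To show why a max lifetime fixes the stale-pool problem, here is a minimal sketch of a pool that discards connections older than a configured age. `MaxLifetimePool` and its `connect` callback are hypothetical names for illustration; real pool libraries implement this internally and also close the underlying socket.

```python
import time
from dataclasses import dataclass


@dataclass
class PooledConnection:
    handle: object      # whatever the connect callback returned
    created_at: float   # clock reading at creation time


class MaxLifetimePool:
    """Toy pool that never hands out a connection older than max_lifetime_seconds."""

    def __init__(self, connect, max_lifetime_seconds=300.0, clock=time.monotonic):
        self._connect = connect
        self._max_lifetime = max_lifetime_seconds
        self._clock = clock
        self._idle = []

    def acquire(self):
        now = self._clock()
        while self._idle:
            conn = self._idle.pop()
            if now - conn.created_at < self._max_lifetime:
                return conn
            # Expired: drop it. A real pool would close the socket here.
        # No usable idle connection, so open a fresh one via the current target.
        return PooledConnection(self._connect(), now)

    def release(self, conn):
        self._idle.append(conn)
```

With a 300 second lifetime, every connection opened before the swap is replaced within five minutes of it, without restarting any pods.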

Gotcha 2: DNS Caching

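Problem: The application caches the Service DNS entry. If the swap changes the DNS record (for example with a headless or ExternalName Service), the app still resolves to the old IP until its cache expires.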

Solution:

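# Short TTL on Service DNS records
# Configure in the CoreDNS ConfigMap (kube-system on most clusters).
# Merge these lines into the existing Corefile; do not replace it wholesale.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
      # ttl is an option of the kubernetes plugin, not a top-level directive
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        ttl 30  # 30 second TTL
      }
      cache 30  # keep the resolver cache no longer than the record TTL
      forward . /etc/resolv.conf
    }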
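A short server-side TTL only helps if the application re-resolves the name. The following is a minimal sketch of a client-side resolver that caches an address for a bounded time; `TtlResolver` is a hypothetical helper, and the injectable `resolve` and `clock` parameters exist only so the behavior can be exercised without a network.

```python
import socket
import time


class TtlResolver:
    """Cache hostname lookups for at most ttl_seconds, then resolve again."""

    def __init__(self, ttl_seconds=30.0, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolve = resolve
        self._clock = clock
        self._cache = {}  # host -> (address, expiry time)

    def lookup(self, host):
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, reuse the cached address
        address = self._resolve(host)
        self._cache[host] = (address, now + self._ttl)
        return address
```

An application that resolves once at startup and keeps the result forever behaves like this class with an infinite TTL, which is the failure mode described above.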

Gotcha 3: Hard-Coded References

Problem: Application has hard-coded database hostname (not Service name).

Solution:

# Identify hard-coded references BEFORE migration
grep -rn "postgres-old" ./config/
grep -rnF "10.0.1.5" ./config/  # IP address hard-coded; -F matches "." literally

# Update all references to the Service name
# The Service name stays constant while the backend changes
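The grep commands above only find values you already know about. As a complementary sketch, the script below walks a config directory and flags both known hostnames and anything that looks like an IPv4 literal. The function name and return shape are assumptions for illustration, and the IPv4 pattern is deliberately loose, so it can also flag version strings such as `1.2.3.4`.

```python
import re
from pathlib import Path

# Loose IPv4 pattern: four dot-separated groups of 1 to 3 digits.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hard_coded_references(root, hostnames):
    """Return (path, line number, line) for lines naming a known host or an IPv4 literal."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if IPV4.search(line) or any(name in line for name in hostnames):
                findings.append((str(path), number, line.strip()))
    return findings
```

For example, `find_hard_coded_references("./config", ["postgres-old"])` returns one entry per suspicious line, which can be reviewed before the migration starts.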

Gotcha 4: Partial Writes During Swap

Problem: A write starts on the old component and completes on the new component, leaving the data inconsistent.

Solution:

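# Use transactional writes so each write is atomic:
# it either commits fully or not at all.
# The database aborts transactions that are still in flight at failover.
postgresql:
  synchronous_commit: "on"  # quoted: YAML 1.1 parsers read a bare on as boolean true
  max_wal_senders: 10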
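On the application side, "transactional writes" means grouping related statements so that an interruption leaves nothing half-applied. The sketch below demonstrates the idea with Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `accounts` table and the `fail_midway` flag are hypothetical and exist only to simulate a connection loss during the swap.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between two rows inside a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            # Simulated connection loss between the two statements.
            raise ConnectionError("connection lost during swap")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

If the failure happens between the two updates, the first one is rolled back and the caller can retry the whole transfer against whichever backend is now serving traffic.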

Migration Timeline Example

Real-world PostgreSQL HA migration:

| Week | Activity | Downtime |
| --- | --- | --- |
| Week 1 | Deploy new cluster, set up replication | 0 min |
| Week 2 | Monitor replication lag, validate data consistency | 0 min |
| Week 3 | Test read queries against new cluster | 0 min |
| Week 4 | Swap Service to new cluster (Friday 2 AM) | 0 min |
| Week 5 | Monitor production traffic on new cluster | 0 min |
| Week 6 | Remove old cluster | 0 min |

Total downtime: 0 minutes
Rollback window: Weeks 4-5 (instant rollback via Service selector)
Production incidents: 0



The new PostgreSQL cluster ran for 3 weeks in parallel. Replication lag stayed under 100ms. The Service selector changed at 2 AM on a Friday. Applications reconnected within 30 seconds. Error rates stayed flat. After 2 weeks of monitoring, the old cluster was decommissioned. Total downtime: zero minutes.
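The replication lag figure above can be turned into an explicit go/no-go check before the swap. This is a minimal sketch under stated assumptions: the 100 ms threshold comes from the timeline, while the function name, the `min_samples` guard, and the way lag samples are collected are hypothetical.

```python
def safe_to_swap(lag_samples_ms, threshold_ms=100.0, min_samples=3):
    """Return True only if there are enough samples and every one is under the threshold."""
    if len(lag_samples_ms) < min_samples:
        return False  # too little evidence to decide
    return max(lag_samples_ms) < threshold_ms
```

Using the worst sample rather than the average keeps a single lag spike from being hidden by otherwise healthy readings.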
