Migration Guide¶
Strangler Fig migrations follow eight phases over 8 weeks. Each phase has validation criteria. Don't proceed to next phase until current phase is stable.
Migration Pattern
This guide covers the Strangler Fig pattern for incremental system migration. Review all sections for complete implementation strategy.
Migration Checklist¶
| Phase | Actions | Validation |
|---|---|---|
| 1. Shadow | New system runs in background | Mismatch logging, zero production impact |
| 2. Dual Write | Write to both databases | Data consistency checks |
| 3. 1% Traffic | Route 1% to new system | Error rates, latency P95 |
| 4. 5% Traffic | Increase to 5% | Monitor for 24 hours |
| 5. 10% Traffic | Increase to 10% | Load testing, resource usage |
| 6. 50% Traffic | Half traffic on new | Full production validation |
| 7. 100% Traffic | All traffic on new | Legacy in standby for 1 week |
| 8. Decommission | Remove legacy | Data migration complete |
Don't skip phases. Each validates the next.
Real-World Timeline¶
Weeks 1-2: Shadow Mode¶
- New system processes requests in background
- Logs mismatches with legacy
- Fix discrepancies
- Goal: 99.9% match rate
No production traffic to new system yet. Shadow mode finds bugs without risk.
Week 3: 1% Traffic¶
- Route 1% production traffic
- Monitor error rates, latency
- Fix bugs discovered under real load
Real users hit new system. First chance to validate under production conditions.
Week 4: 10% Traffic¶
- Increase to 10%
- Load test at scale
- Optimize resource usage
Enough traffic to stress test. Too little to cause catastrophic failure.
Weeks 5-6: 50% Traffic¶
- Half production load
- Performance tuning
- Cost analysis
This is the confidence phase. If 50% works for 2 weeks, 100% will work.
Week 7: 100% Traffic¶
- All traffic on new system
- Legacy in standby
- Monitor for regressions
Don't delete legacy yet. Keep it running for instant rollback.
Week 8: Decommission¶
- Remove legacy system
- Delete legacy database
- Update documentation
Migration complete. Legacy system retired.
Common Pitfalls¶
Pitfall 1: No Rollback Plan¶
Always have a kill switch. Feature flag or traffic router that can shift back instantly.
Fix: Deploy feature flag infrastructure before migration starts. Test rollback in staging.
Pitfall 2: Skipping Shadow Mode¶
Don't route production traffic to unvalidated code. Shadow first.
Fix: Run new system in shadow mode for minimum 1 week. Fix all mismatches before routing real traffic.
Pitfall 3: Ignoring Data Consistency¶
Dual writes can fail. Monitor lag between legacy and new databases.
Fix: Implement reconciliation jobs that detect and fix data drift. Alert on lag >100ms.
Pitfall 4: Moving Too Fast¶
Don't jump from 1% to 100%. Increase gradually. Monitor each phase.
Fix: Spend at least 24 hours at each phase. If metrics degrade, don't proceed.
Pre-Migration Checklist¶
Before starting:
- [ ] Feature flag system deployed and tested
- [ ] Shadow mode infrastructure in place
- [ ] Monitoring dashboards created (legacy vs new)
- [ ] Alerting rules configured
- [ ] Rollback runbook documented
- [ ] On-call rotation briefed
- [ ] Database backup strategy verified
- [ ] Load testing plan documented
- [ ] Communication plan for stakeholders
Don't start without this foundation.
Success Criteria¶
Migration is complete when:
- New system handles 100% traffic for 1 week
- Error rates ≤ legacy baseline
- Latency P95 ≤ legacy baseline
- Resource usage within budget
- Legacy system removed
- Documentation updated
- Post-mortem completed
Post-Migration Cleanup¶
After decommissioning legacy:
- Remove feature flags (dead code)
- Delete legacy database
- Update runbooks and documentation
- Archive metrics dashboards (historical reference)
- Remove dual-write code paths
- Simplify routing logic
Don't leave migration scaffolding in production forever.
Related Patterns¶
- Implementation Strategies - Feature flags and shadow mode
- Traffic Routing - Percentage and user-based routing
- Monitoring and Rollback - Observability and safety
Week 1: Shadow mode caught edge cases. Week 3: 1% traffic revealed load balancer bug. Week 5: 50% traffic ran stable. Week 8: Legacy decommissioned. Migration complete. Zero production incidents. Post-mortem showed careful phasing prevented catastrophic failure.