Chaos Experiment Catalog
Each experiment follows this structure:
- Configuration - Complete YAML
- Expected behavior - What should happen
- Success criteria - How to know it worked
- Validation queries - Prometheus/observability checks
- Rollback procedure - How to stop it
Start Small, Scale Systematically
Begin with single-pod experiments in staging. Progress to production only after validating success criteria, rollback procedures, and observability coverage.
Experiment Categories
Pod-Level Experiments
These experiments delete pods and induce crash loops to validate application recovery and orchestration behavior.
Key scenarios:
- Pod deletion (crash loop testing)
- Graceful shutdown validation
- Readiness probe behavior
- Replica replacement timing
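A single-pod deletion experiment might be sketched as follows. The controller name and namespace used elsewhere on this page (`chaos-controller-manager` in `chaos-testing`) match Chaos Mesh defaults, so the manifest assumes Chaos Mesh; this page does not confirm the tool. The target namespace `staging`, the label `app: demo-app`, and the file name `pod-kill.yaml` are hypothetical.

```shell
# Sketch only: assumes Chaos Mesh is the chaos tool in use.
# "staging", "demo-app", and the file name are hypothetical placeholders.
cat > pod-kill.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-demo
  namespace: chaos-testing
spec:
  action: pod-kill   # delete the pod; its controller should schedule a replacement
  mode: one          # affect a single matching pod to keep the blast radius small
  selector:
    namespaces:
      - staging
    labelSelectors:
      app: demo-app
EOF

# Apply with the Quick Start pattern, then watch the replacement come up:
# kubectl apply -f pod-kill.yaml
# kubectl get pods -n staging -l app=demo-app -w
```

Using `mode: one` keeps the first run in line with the single-pod starting point recommended above.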
Network Experiments
These experiments inject network latency and force dependency timeouts to validate circuit breakers and fallback patterns.
Key scenarios:
- Database latency injection
- Circuit breaker validation
- Cache fallback behavior
- Timeout handling
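As an illustration of database latency injection, the sketch below writes a manifest that delays traffic from an application to its database. It again assumes Chaos Mesh (not confirmed by this page); the labels `app: demo-app` and `app: demo-db`, the `staging` namespace, and the latency values are hypothetical and should be chosen relative to the client timeouts under test.

```shell
# Sketch only: assumes Chaos Mesh is the chaos tool in use.
# Labels, namespace, and latency values are hypothetical placeholders.
cat > db-latency.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: db-latency-demo
  namespace: chaos-testing
spec:
  action: delay
  mode: all
  selector:              # source pods: the application
    namespaces:
      - staging
    labelSelectors:
      app: demo-app
  direction: to          # only delay traffic going to the target
  target:                # destination pods: the database
    mode: all
    selector:
      namespaces:
        - staging
      labelSelectors:
        app: demo-db
  delay:
    latency: "200ms"
    jitter: "20ms"
  duration: "5m"         # the experiment ends on its own after this window
EOF
```

Setting the injected latency just above the client timeout is one way to exercise the timeout and circuit breaker paths without taking the database fully offline.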
Resource Experiments
These experiments apply memory pressure and CPU stress to validate resource limit enforcement and graceful degradation.
Key scenarios:
- Memory pressure testing
- OOM kill behavior
- Garbage collection under stress
- Load shedding patterns
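A memory pressure experiment could look like the sketch below, which again assumes Chaos Mesh (not confirmed by this page). The selector values and the `256MB` allocation are hypothetical; to observe OOM kill behavior, the allocation would need to exceed the headroom left under the container's memory limit.

```shell
# Sketch only: assumes Chaos Mesh is the chaos tool in use.
# Selector values and the stress size are hypothetical placeholders.
cat > memory-stress.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-demo
  namespace: chaos-testing
spec:
  mode: one
  selector:
    namespaces:
      - staging
    labelSelectors:
      app: demo-app
  stressors:
    memory:
      workers: 1       # number of stress workers
      size: "256MB"    # memory each worker tries to allocate
  duration: "2m"
EOF
```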
Dependency Experiments
These experiments simulate failures in upstream services and external APIs to validate circuit breakers, fallback patterns, and graceful degradation across multiple services.
Key scenarios:
- External API failures
- Circuit breaker patterns
- Fallback responses
- Automatic recovery
Quick Start
All experiments follow the same execution pattern:
# Apply chaos experiment
kubectl apply -f experiment.yaml
# Follow the chaos controller logs (watch application dashboards alongside)
kubectl logs -f -n chaos-testing deployment/chaos-controller-manager
# Verify success criteria with Prometheus queries
# (See individual experiment pages for specific queries)
# Rollback if needed
kubectl delete -f experiment.yaml
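The apply, observe, and rollback steps above can be wrapped so that rollback also runs when the operator interrupts the experiment. This is a minimal sketch, not part of the catalog: the function name `run_experiment` and the `KUBECTL` override variable are hypothetical, and the observation window is a plain `sleep` standing in for real success-criteria checks.

```shell
# Sketch: apply an experiment, hold it for an observation window, then roll back.
# KUBECTL and run_experiment are illustrative names, not part of any tool.
KUBECTL="${KUBECTL:-kubectl}"

run_experiment() {
  manifest="$1"
  window="${2:-60}"   # observation window in seconds

  # Roll back even if the run is interrupted with Ctrl-C or a TERM signal.
  trap '$KUBECTL delete -f "$manifest" --ignore-not-found; exit 130' INT TERM

  # Stop early if the manifest cannot be applied.
  $KUBECTL apply -f "$manifest" || { trap - INT TERM; return 1; }

  # Check success criteria (dashboards, Prometheus queries) during this window.
  sleep "$window"

  $KUBECTL delete -f "$manifest" --ignore-not-found
  status=$?
  trap - INT TERM
  return "$status"
}
```

For example, `run_experiment experiment.yaml 300` would hold the experiment for five minutes before deleting it. Setting `KUBECTL=echo` prints the commands instead of running them, which is a cheap way to rehearse the sequence.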
Related Documentation
- Back to Overview - Chaos engineering introduction
- Validation Patterns - SLI monitoring and testing
- Operations - Running experiments safely