Operational Workflows¶

Policy update workflows, backup procedures, performance tuning, and operational best practices.

Policy Updates Workflow¶

Rolling Out Policy Changes¶

Safe rollout process:

graph TD
    DEV[1. Test in Dev] --> QAC[2. Deploy to QAC]
    QAC --> AUDIT[3. Monitor in Audit Mode]
    AUDIT --> VALIDATE[4. Review Violations]
    VALIDATE --> FIX[5. Fix Issues]
    FIX --> STG[6. Deploy to Staging]
    STG --> ENFORCE[7. Switch to Enforce]
    ENFORCE --> PRD[8. Deploy to Production]

    %% Ghostty Hardcore Theme
    style DEV fill:#a7e22e,color:#1b1d1e
    style PRD fill:#65d9ef,color:#1b1d1e

Never Skip Audit Phase

Deploying a policy straight to Enforce mode in production can block legitimate workloads and cause outages. Always start in Audit mode, monitor for at least one week, fix the violations, and only then switch to Enforce.

Timeline:

Dev: Immediate
QAC: 1 week in Audit mode
Staging: 1 week in Enforce mode
Production: After 0 violations in Staging
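
The production gate in the timeline above can be checked from the command line. The following is a minimal sketch, assuming Kyverno publishes namespaced PolicyReport resources with a `summary.fail` count and that `kubectl` points at the Staging cluster. The function names are illustrative and not part of any tool.

```shell
# Sum one integer per line from stdin; blank lines are ignored.
sum_fail_counts() {
  awk 'NF { total += $1 } END { print total + 0 }'
}

# Print the total number of failing results across all namespaced PolicyReports.
# Assumption: kubectl's current context is the Staging cluster.
staging_failures() {
  kubectl get policyreport -A \
    -o jsonpath='{range .items[*]}{.summary.fail}{"\n"}{end}' | sum_fail_counts
}

# Succeeds (exit status 0) only when no failing results remain.
ready_for_production() {
  [ "$(staging_failures)" -eq 0 ]
}
```

A pipeline step could call `ready_for_production` and stop the promotion when it returns a non-zero status. Cluster-scoped results live in ClusterPolicyReport resources and would need a similar query.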

Backup and Recovery¶

Policy Backup¶

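# Backup all policies
kubectl get clusterpolicy -o yaml > backup-policies.yaml
# Separate the two YAML documents so the combined file stays valid
echo '---' >> backup-policies.yaml
kubectl get policyexception -A -o yaml >> backup-policies.yaml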

# Backup Kyverno configuration (-o yaml omits the "USER-SUPPLIED VALUES:" header)
helm get values kyverno -n kyverno -o yaml > backup-kyverno-values.yaml
helm get values policy-reporter -n policy-reporter -o yaml > backup-policy-reporter-values.yaml

Disaster Recovery¶

Restore from backup:

# Reinstall Kyverno and wait until its pods are ready
helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --values backup-kyverno-values.yaml \
  --wait

# Restore policies
kubectl apply -f backup-policies.yaml
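
It can help to confirm that the restore actually brought the policies back. The sketch below waits for the admission controller, applies the backup, and compares the number of ClusterPolicy objects in the file with the number in the cluster. The deployment name `kyverno-admission-controller` is an assumption based on recent Kyverno Helm charts, and both function names are illustrative.

```shell
# Count the list items of a given kind in a `kubectl get -o yaml` backup file.
count_kind() {
  kind="$1"; file="$2"
  grep -c -E "^(- |  )kind: ${kind}[[:space:]]*\$" "$file"
}

# Wait for Kyverno, restore the backup, and check the ClusterPolicy count.
restore_policies() {
  backup="$1"
  # Assumed deployment name; adjust it to match your release.
  kubectl rollout status deployment/kyverno-admission-controller \
    -n kyverno --timeout=300s || return 1
  kubectl apply -f "$backup" || return 1
  expected="$(count_kind ClusterPolicy "$backup")"
  actual="$(kubectl get clusterpolicy --no-headers | wc -l | tr -d ' ')"
  echo "ClusterPolicies: expected=${expected} restored=${actual}"
  # The cluster should hold at least as many policies as the backup.
  [ "$actual" -ge "$expected" ]
}
```

Running `restore_policies backup-policies.yaml` after the reinstall gives a quick signal that nothing was silently skipped.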

Automate Backups

Set up a CronJob to back up policies nightly. Store the backups in Git or S3 so they survive the loss of the cluster.
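
As one possible starting point, the nightly job could run a small script like the sketch below. The default directory, the script path in the crontab comment, and the function names are hypothetical. Pushing the resulting file to Git or S3 is left out.

```shell
# Build a dated backup path such as /var/backups/kyverno/policies-2024-05-01.yaml.
backup_path() {
  dir="$1"; day="$2"
  printf '%s/policies-%s.yaml\n' "$dir" "$day"
}

# Export cluster policies and exceptions into one dated file and print its path.
nightly_backup() {
  dir="${1:-/var/backups/kyverno}"   # hypothetical default location
  out="$(backup_path "$dir" "$(date +%F)")"
  mkdir -p "$dir" || return 1
  # Requesting both kinds in one call yields a single valid YAML List.
  kubectl get clusterpolicy,policyexception -A -o yaml > "$out" || return 1
  echo "$out"
}

# Example crontab entry (02:00 every night); the script path is hypothetical:
# 0 2 * * * /usr/local/bin/kyverno-backup.sh
```

A script file would end with a call to `nightly_backup "$@"`. Dated filenames keep earlier snapshots available if a bad policy change is backed up by mistake.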

Performance Tuning¶

Admission Controller Scaling¶

# kyverno-values.yaml
admissionController:
  replicas: 3  # High availability

  resources:
    limits:
      memory: 512Mi
      cpu: 500m
    requests:
      memory: 256Mi
      cpu: 200m

Background Scan Interval¶

features:
  backgroundScan:
    backgroundScanInterval: 12h  # Reduce frequency for large clusters

Resource Filters¶

Exclude namespaces and resource kinds that policies do not need to evaluate:

resourceFilters:
  resourceFiltersExcludeNamespaces:
    - kube-system
    - kube-public
    - gmp-system
    - cnrm-system

  resourceFiltersExcludeResources:
    - "[Event,*,*]"
    - "[*,kube-system,*]"
    - "[Node,*,*]"
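
Tuning values only take effect once the release is upgraded. The sketch below assumes the settings above are collected in `kyverno-values.yaml`, that the chart is available as `kyverno/kyverno`, and that the admission controller deployment is named `kyverno-admission-controller`. The `run` and `apply_tuning` helpers are illustrative. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Upgrade the release with the tuning values, then confirm the rollout.
apply_tuning() {
  values_file="$1"
  run helm upgrade kyverno kyverno/kyverno \
    --namespace kyverno --values "$values_file" --wait || return 1
  # Assumed deployment name; adjust it to match your release.
  run kubectl rollout status deployment/kyverno-admission-controller \
    -n kyverno --timeout=300s
}
```

For example, `DRY_RUN=1 apply_tuning kyverno-values.yaml` shows both commands so they can be reviewed before a real run.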

Best Practices¶

1. Version Everything¶

Policy repos: Semantic versions
policy-platform container: Tagged versions
Helm deployments: Track release history

2. Monitor Continuously¶

PolicyReporter dashboard: Real-time violation tracking
Prometheus alerts: Automated notifications on policy violations
Slack notifications: Violations surfaced in team channels

3. Test Before Enforce¶

New policies start in Audit mode: Never deploy directly to Enforce
Monitor for at least one week: Collect violation data before enforcement
Fix violations before Enforce: Remediate issues during the Audit phase

4. Document Exceptions¶

Require Jira tickets: Track all policy exceptions
Set expiration dates: Enforce time-bound exceptions
Quarterly reviews: Audit exception validity regularly
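
Expiration dates are easier to enforce when something checks them. The sketch below assumes each PolicyException carries an expiry date in `YYYY-MM-DD` form in an annotation. The key `example.com/expires` is hypothetical, and so are the function names.

```shell
# Read "namespace/name expiry-date" lines on stdin and print the names whose
# date is earlier than the given day. Lines without a date are skipped.
list_expired() {
  today="$1"
  # Appending "" forces a string comparison, which orders ISO dates correctly.
  awk -v today="$today" 'NF >= 2 && ($2 "") < (today "") { print $1 }'
}

# List PolicyExceptions that are past their expiry annotation.
expired_exceptions() {
  # Hypothetical annotation key: example.com/expires
  kubectl get policyexception -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.annotations.example\.com/expires}{"\n"}{end}' \
    | list_expired "$(date +%F)"
}
```

The output of `expired_exceptions` could feed the quarterly review or a scheduled reminder. Exceptions with no annotation are not reported here and may deserve a separate check.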

5. Automate Evidence Collection¶

Monthly policy exports: Snapshot policy state regularly
Automated compliance reports: Generate reports automatically
Audit trail preservation: Retain evidence for audits

Key Takeaways¶

✅ Local dev: Run policies in containers before commit

✅ CI integration: Automated validation in pipelines

✅ Runtime enforcement: Kyverno admission control

✅ Multi-source: Aggregate policies from multiple repos

✅ Operations: Monitor, update, and audit

Related Resources:

SDLC Hardening: Broader enforcement patterns
Kyverno Guide: Deep dive on Kyverno
Pre-commit Hooks: Complementary local checks