Optimising for robustness will inevitably lead to an overinvestment in pre-production risk management and an underinvestment in production risk management. Symptoms of underinvestment in production risk management include:
- Stagnant requirements – “non-functional” requirements are deprioritised for weeks or months at a time
- Snowflake infrastructure – environments are manually created and maintained in an unreproducible state
- Inadequate telemetry – logs and metrics are scarce, anomaly detection and alerting are manual, and user analytics yield few insights
- Fragile architecture – services are tightly coupled, service instances are stateful, failures are uncontained, and services are vulnerable under load
- Insufficient training – operators are not given the necessary coaching, education, or guidance