SRE: A Human Approach to Systems

Why Site Reliability Engineering

In the world of technology, the stakes have never been higher. The move to the cloud and microservices, made to maximize agility, has given rise to digital disruptors and unprecedented competitive threats. As distributed systems grow more complex, the number of ‘unknown unknowns’ grows with them. On top of this, customer expectations are sky-high.

The cost of downtime can be catastrophic, with customers willing to churn if their needs are not promptly met. According to Gartner, the average cost of downtime is $300,000 per hour. For some companies, the figure is considerably higher: Amazon lost approximately $90 million during its Prime Day outage in 2018, even though the outage lasted only 75 minutes.