How Disaster-Ready Are Your Backup Systems, Really?

In SRE, we believe that some failure is inevitable. Complex systems that receive updates will eventually experience incidents you can’t anticipate. What you can do is be ready to limit the damage from these incidents as much as possible.

One facet of disaster readiness is incident response: setting up procedures to resolve an incident and restore service as quickly as possible. Another is lowering the chance of failure in the first place, with tactics like eliminating single points of failure. Today, we’ll talk about a third type of readiness: having backup systems and redundancies that quickly restore function when things go very wrong.