3 Lessons DevOps Can Learn From 5 Biggest Outages of Q2 2020

‘Learn from the mistakes of others. You can't live long enough to make them all yourself’ – Eleanor Roosevelt.

Nobody is immune from outages, but it’s better to learn from others’ mistakes than from your own. The second quarter of 2020 was marked by several serious outages at prominent services including IBM Cloud, GitHub, Slack, Zoom and even T-Mobile (Source: StatusGator Report). I’m sure you noticed these outages just as our team did. I decided to share the lessons we learned from this downtime, hoping we can all grow from it.

Best Practices for QA Testing in the DevOps Age

Going live with bugs in the code is a risky roll of the dice: it can lead to unplanned outages, and software downtime means loss of revenue and reputation. Analysts at Gartner Research have estimated that downtime can cost companies as much as $140,000 to $540,000 per hour. Google, for example, saw global outages of its Gmail and Drive products in March, affecting customers throughout Australia, the U.S., Europe, and Asia. Facebook and Instagram also suffered worldwide outages in March, leaving users unable to access the popular apps for several hours. Customers expect on-demand access and service; outages weigh heavily on a brand’s reputation as well as its finances.

Unfortunately, with the migration from legacy systems to microservice environments in the cloud, outages and downtime pose a growing and serious problem. Gone are the days when teams could beta test with customers over time to flag bugs in real time. With current quality testing tools, developers often don’t know how a new software version will perform in production, or whether it will work in production at all. The Cloudbleed bug is an example of this problem. In February 2017, a Google researcher discovered that a simple coding error in a software upgrade from security vendor Cloudflare, deployed months earlier, had created a serious vulnerability. Although Cloudflare’s service still worked, the bug meant that it was leaking sensitive data.
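To see how a "simple coding error" can quietly leak data while the service keeps working, consider the class of bug behind Cloudbleed: the parser's end-of-buffer test used an equality check rather than a range check, so a pointer that stepped *past* the end of the buffer was never caught and the parser kept reading adjacent memory. The sketch below is a simplified, hypothetical illustration of that failure mode (using indices instead of raw pointers, and made-up function names), not Cloudflare's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy end-of-input check: only fires when the cursor lands
   exactly on the buffer length. */
static bool stops_with_eq(size_t i, size_t len) {
    return i == len;          /* misses the case i > len */
}

/* Safe end-of-input check: fires whenever the cursor reaches
   or passes the buffer length. */
static bool stops_with_ge(size_t i, size_t len) {
    return i >= len;          /* catches overshoot too */
}

/* Simulate a tokenizer that consumes 2 bytes per step over a
   buffer of `len` bytes, stopping when `stop` says so (capped at
   100 steps so the demo always terminates). Returns the final
   cursor position: anything far beyond `len` means the check was
   skipped and a real parser would be reading foreign memory. */
static size_t final_cursor(size_t len, bool (*stop)(size_t, size_t)) {
    size_t i = 0, steps = 0;
    while (!stop(i, len) && steps < 100) {
        i += 2;               /* multi-byte token near the end can
                                 jump the cursor over `len` */
        steps++;
    }
    return i;
}
```

With an odd-length buffer (say 5 bytes), the `>=` check stops the cursor at 6, just past the end, while the `==` check is jumped over entirely and the simulated parser runs to the step cap, which is the Cloudbleed-style over-read.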