Chaos Engineering – The Practice Behind Controlling Chaos

Chaos Engineering might sound like a buzzword, but take it from someone who used to joke that his job title was Chief Chaos Engineer (more on that later): it is much more than buzz or a passing fad. It's a practice.

The world can be a scary place, and more and more companies are turning to Chaos Engineering to proactively poke and prod their systems. In doing so, they improve their reliability and guard against unexpected failures in production and unplanned downtime.

Git Clone Command vs. GitHub Backup – Best Practices

Cloning is a popular theme in science fiction movies and literature; just think of Star Wars: Attack of the Clones. But it's not science fiction at all. In the real world, probably everyone has heard of Dolly the sheep, the first cloned mammal. Since then, mankind has managed to clone horses, pigs, and dogs, among other animals. Wait, we are interested in the IT world, right? The world where over 87.2% of programmers use the Git version control system, with 60M GitHub users, 10M Bitbucket teams, and over 30M GitLab enthusiasts. So let's take a very in-depth look at the git clone command. Do we have your attention?

What Is a Git Clone?

To work with Git, we need to have a copy of the repo on our device. In the event of a failure and a lack of backups, we can restore the entire repository from such a copy. So what is a clone? It is a complete copy of the repository, including the entire history of changes. But clone is also the name of the Git command that creates this copy.
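To make that concrete, here is a minimal sketch (directory and file names are hypothetical, and `git` is assumed to be on the PATH) that creates a tiny local repository and clones it. The point it demonstrates is that the clone carries the full commit history, not just the working files:

```python
import os
import subprocess
import tempfile

# Throwaway working area; "origin" and "copy" are illustrative names.
base = tempfile.mkdtemp()
src = os.path.join(base, "origin")
dst = os.path.join(base, "copy")

def git(*args):
    # Small helper: run a git command and fail loudly if it errors.
    subprocess.run(["git", *args], check=True, capture_output=True)

# Create a repository with a single commit.
git("init", src)
with open(os.path.join(src, "README.md"), "w") as f:
    f.write("hello\n")
git("-C", src, "add", "README.md")
git("-C", src, "-c", "user.email=demo@example.com",
    "-c", "user.name=demo", "commit", "-m", "first commit")

# Clone it. The same command works with a remote URL instead of a path.
git("clone", src, dst)

# The clone has the working files AND the full history.
log = subprocess.run(["git", "-C", dst, "log", "--oneline"],
                     capture_output=True, text=True, check=True)
print(log.stdout.strip())
```

The same `git clone` invocation works identically against a remote URL such as an SSH or HTTPS address; a local path is used here only so the sketch is self-contained.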

Experiencing the Aftermath Will Make You Tougher, Wiser, and Ready For Anything

I was in charge of managing a dedicated server running Debian 7. The server hosted multiple websites with email services, as well as multiple instances of a critical web application for a client running a business across different regions.

That day was very important, as my client was expecting one of his own customers to turn up. He wanted to demo the application and show how they managed some business processes.
During the event, I got a phone call reporting that users were unable to access the web application. I took the request as usual and started checking the filed issue. A few seconds later, I got another call about other users unable to access their mailboxes. That's when I realized something very nasty was happening and that I was in serious trouble.

I quickly figured out that I had made the worst mistake ever!

That day, I had been performing the usual maintenance tasks on the server, freeing some disk space here and there. At some point, however, I deleted critical files that belonged to different services: Postgres, MySQL, the mail server, and so on. I didn't notice anything until issue reports started coming in from the clients.

It was catastrophic on all fronts.
We lost three months of data: the backups resided in the single place where I had launched the deletion operation, and there were no other copies. Many services were surviving on what was left in RAM (I suspect), and respawning a process was lethal for the corresponding service.
In the field, the client was badly embarrassed in front of his customer, as he was cut off at the beginning of the demo.

Users started processing customer data manually with pen and paper. Recovering the server was "mission impossible". We needed to reinstall everything from scratch, but it was decided that the fastest recovery would be to migrate to another server running the latest Debian version.

So, I installed all the required services on the new server and restored the most recent backup. The business application was finally live again.
A disaster like this teaches a lot of lessons.

Lessons learned:

Resolving the MySQL Active-Active Replication Dilemma

Multi-writer replication had been a challenge in the MySQL ecosystem for years before truly dedicated solutions were introduced: first Galera (and thus Percona XtraDB Cluster (PXC)) replication around 2011, and then Group Replication, first GA in 2016.

Now, with both multi-writer technologies available, do we still need traditional asynchronous replication set up in an active-active topology? Apparently yes; there are still valid use cases. And you may need it not only when Galera/PXC or GR are unsuitable for some reason, but also alongside them. The most typical case is a second cluster in a different geographic location for disaster recovery. If you still wonder why you would need one, just recall the news from a few weeks ago about the OVH incident, which showed how a whole data center can disappear.

Running QuestDB on GKE Autopilot


Recently, I've been experimenting with QuestDB as the primary time-series database to stream and analyze IoT and financial data.

While I was able to validate the power of QuestDB in storing massive amounts of data and querying them quickly in those two projects, I was mostly running them on my laptop via Docker. In order to scale my experiments, I wanted to create a more production-ready setup, including monitoring and disaster recovery on Kubernetes. So in this guide, we’ll walk through setting up QuestDB on GKE with Prometheus and Velero.

Ensuring SQL Server High Availability in the Cloud

Theoretically, the cloud seems tailor-made for ensuring high availability (HA) and disaster recovery (DR) in mission-critical SQL Server deployments. Azure, AWS, and Google operate distributed, state-of-the-art data centers throughout the world, and they offer a variety of SLAs that can guarantee virtual machine (VM) availability levels of 99.95% and higher.

But deploying SQL Server for HA or DR has always posed a challenge that goes beyond the geographic dispersion of data centers and deep levels of hardware redundancy. Configuring SQL Server for HA or DR involves building a Windows Server Failover Cluster (WSFC) that ensures not only the availability of the machines running SQL Server itself but also, most importantly, the availability of the storage holding the data with which SQL Server interacts.

Are You Taking the Right Approach to Cloud Databases?

Trends in cloud data storage continue to accelerate at a rapid pace. Now more than ever, organizations must evaluate their current and future data storage needs to find solutions that align with business goals. While cloud databases are relatively new to the scene, they show tremendous promise for securing and managing data.

In selecting the topic for this Trend Report, we found the amount of promise and advancement in the space to be unparalleled. This report highlights DZone's original research on cloud databases and contributions from the community, and it introduces new offerings within DZone Trend Reports.

Creating Backups on SQL Server for Disaster Recovery

Backups are one key to a successful disaster recovery plan. Every database engine has its own backup commands and procedures, and Microsoft SQL Server is no exception. SQL Server has capabilities for full and differential backups as well as a backup process for transaction logs. These procedures can be used in combination to ensure limited downtime should your database suffer from an outage or critical, unrecoverable crash.

Full Backups vs. Differential Backups

Before creating a backup, it's important to know the different types. There are three: full, differential, and incremental. SQL Server supports full and differential backups, but some administrators incorrectly call differential backups "incremental." There is a distinct difference between the two: a differential backup captures everything that has changed since the last full backup, while a true incremental backup captures only what has changed since the most recent backup of any kind. That difference affects how backups must be restored.
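As a sketch of how these pieces fit together in T-SQL (the database name and backup paths below are hypothetical), a full backup provides the baseline, differentials capture changes since that baseline, and log backups fill in the gaps between them:

```sql
-- Full backup: the baseline that every differential refers back to.
BACKUP DATABASE SalesDb
  TO DISK = N'D:\backups\SalesDb_full.bak'
  WITH INIT, CHECKSUM;

-- Differential backup: everything changed since the last FULL backup
-- (each differential grows until the next full backup resets the base).
BACKUP DATABASE SalesDb
  TO DISK = N'D:\backups\SalesDb_diff.bak'
  WITH DIFFERENTIAL, INIT, CHECKSUM;

-- Transaction log backup: enables point-in-time recovery
-- (requires the FULL recovery model).
BACKUP LOG SalesDb
  TO DISK = N'D:\backups\SalesDb_log.trn'
  WITH INIT, CHECKSUM;
```

To restore, you would apply the full backup first, then the most recent differential, then any subsequent log backups in order.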

PyMongo Tutorial: Testing MongoDB Failover in Your Python App

Python is a powerful and flexible programming language used by millions of developers around the world to build their applications. It comes as no surprise that Python developers commonly leverage MongoDB, the most popular NoSQL database, for their deployments due to its flexible nature and lack of schema requirements.

So, what’s the best way to use MongoDB with Python? PyMongo is a Python distribution containing tools for working with MongoDB, and the recommended Python MongoDB driver. It is a fairly mature driver that supports most of the common operations with the database, and you can check out this tutorial for an introduction to the PyMongo driver.
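The core of testing failover in application code is retrying operations that fail while a replica set elects a new primary. Below is a minimal, self-contained sketch of that retry pattern; `AutoReconnect` is defined locally as a stand-in so the example runs without a server, but in a real app you would catch `pymongo.errors.AutoReconnect` around a call such as `collection.insert_one()`. The function and variable names here are illustrative, not PyMongo API:

```python
import time

class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect, raised during failover."""

def retry_on_failover(op, retries=5, delay=0.1):
    # Retry an operation that may fail transiently while the replica
    # set elects a new primary; re-raise once retries are exhausted.
    for attempt in range(retries):
        try:
            return op()
        except AutoReconnect:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Simulated insert: fails twice (as if a failover were in progress),
# then succeeds once the "new primary" is available.
calls = {"n": 0}
def insert_doc():
    calls["n"] += 1
    if calls["n"] < 3:
        raise AutoReconnect("primary stepped down")
    return {"inserted_id": 1}

result = retry_on_failover(insert_doc, delay=0.01)
```

Newer PyMongo versions also offer retryable writes at the driver level, but an explicit retry loop like this makes failover behavior easy to exercise deliberately in a test.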