Raft in Tarantool: How It Works and How to Use It

Last year, we introduced synchronous replication in Tarantool, following the Raft algorithm. The task consisted of two major phases: so-called quorum writing (i.e., synchronous replication) and automated leader election.

Synchronous replication was first introduced in release 2.5.1, while release 2.6.1 brought the support of Raft-based automated leader election.
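The quorum-writing idea can be illustrated with a small sketch (a toy model, not Tarantool's actual implementation): the leader treats a log entry as committed only once a majority of the cluster has acknowledged it.

```python
# Toy sketch of quorum writing: an entry commits once a strict majority
# of the cluster (leader included) has acknowledged it.

def is_committed(acks: int, cluster_size: int) -> bool:
    # Quorum is a strict majority of the full cluster.
    return acks >= cluster_size // 2 + 1

class QuorumLog:
    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.entries = []          # list of (payload, set of acking replica ids)

    def append(self, payload) -> int:
        self.entries.append((payload, set()))
        return len(self.entries) - 1   # log index of the new entry

    def ack(self, index: int, replica_id: int) -> bool:
        # Record one replica's acknowledgment; report commit status.
        _, acks = self.entries[index]
        acks.add(replica_id)
        return is_committed(len(acks), self.cluster_size)

log = QuorumLog(cluster_size=5)
idx = log.append("INSERT ...")
print(log.ack(idx, replica_id=1))  # False: 1 of 5 acked
print(log.ack(idx, replica_id=2))  # False: 2 of 5
print(log.ack(idx, replica_id=3))  # True: 3 of 5 is a majority
```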

Utilizing BigQuery as A Data Warehouse in A Distributed Application

Introduction

Data plays an integral part in any organization. Modern organizations are data-driven: almost all business and technological decisions are based on the available data. Let's assume that we have an application distributed across multiple servers in different regions of a cloud service provider, and we need to store that application's data in a centralized location. The ideal solution would be to use some type of database. However, traditional databases are ill-suited to handling extremely large datasets, and they lack the features that would help with data analysis. In that kind of situation, we need a proper data warehousing solution like Google BigQuery.

What is Google BigQuery?

BigQuery is an enterprise-grade, fully managed data warehousing solution that is a part of the Google Cloud Platform. It is designed to store and query massive data sets while enabling users to manage data via the BigQuery data manipulation language (DML) based on the standard SQL dialect.
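As a small illustration (the dataset and table names below are made up), a standard-SQL query for BigQuery is just a string; the commented lines show how it would be executed with the official google-cloud-bigquery Python client.

```python
# Build a standard-SQL aggregation query for BigQuery. The table name
# `myproject.app_logs.events` is hypothetical.

def events_per_region_sql(table: str) -> str:
    # BigQuery accepts standard SQL passed as a plain string.
    return (
        "SELECT region, COUNT(*) AS n "
        f"FROM `{table}` "
        "GROUP BY region ORDER BY n DESC"
    )

sql = events_per_region_sql("myproject.app_logs.events")
print(sql)

# To execute (requires google-cloud-bigquery and configured credentials):
#   from google.cloud import bigquery
#   rows = list(bigquery.Client().query(sql).result())
```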

Data Replication for DBMS Using the Commit Log

Introduction

In this article, we will see how developers can break down information silos for their teams and business by replicating data across multiple systems. First, we will review why developers replicate data and what to consider in the cloud. Second, we will prepare for war with the replicators. Then we will examine the architecture of Postgres and MySQL and how their commit logs enable us to make exact copies of the data. Finally, we will connect Debezium to Postgres for a complete data replication solution.

Introduction to Data Replication

Data replication is the process of moving data between different database systems for various business use cases. In a typical SaaS (Software as a Service) application, data is stored in an operational database such as MySQL, PostgreSQL, or Oracle. Other database systems, such as data warehouses and search systems, are built for specialized use cases. Moving data between these systems is known as data replication.
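The core idea behind commit-log-based replication can be sketched in a few lines of Python (a toy model, not any specific DBMS's log format): every committed change is appended to an ordered log, and a downstream system replays the log from its last offset to build an exact copy, which is essentially what tools like Debezium do with Postgres's write-ahead log.

```python
# Toy commit log: an ordered stream of change events that replicas replay.

class CommitLog:
    def __init__(self):
        self.entries = []          # ordered list of (op, key, value)

    def append(self, op, key, value=None):
        self.entries.append((op, key, value))

class Replica:
    def __init__(self, log):
        self.log = log
        self.offset = 0            # position of the next entry to apply
        self.data = {}

    def catch_up(self):
        # Apply, in order, every entry written since our last offset.
        for op, key, value in self.log.entries[self.offset:]:
            if op == "upsert":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
        self.offset = len(self.log.entries)

log = CommitLog()
primary = {}                                        # the "operational" store
log.append("upsert", "user:1", {"name": "Ada"});    primary["user:1"] = {"name": "Ada"}
log.append("delete", "user:1");                     primary.pop("user:1")
log.append("upsert", "user:2", {"name": "Lin"});    primary["user:2"] = {"name": "Lin"}

replica = Replica(log)
replica.catch_up()
print(replica.data == primary)     # True: replaying the log yields an exact copy
```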

Resolving the MySQL Active-Active Replication Dilemma

Multi-writer replication had been a challenge in the MySQL ecosystem for years before truly dedicated solutions were introduced: first Galera (and thus Percona XtraDB Cluster (PXC)) replication, around 2011, and then Group Replication, first GA in 2016.

Now, with both multi-writer technologies available, do we still need traditional asynchronous replication set up in an active-active topology? Apparently yes, there are still valid use cases. And you may need it not only when Galera/PXC or GR are for some reason not suitable, but also alongside them. Of course, the most typical case is to have a second cluster in a different geographic location for disaster recovery. If you still wonder why you would need it, just recall the news from a few weeks ago about the OVH incident, which showed how a whole data center can disappear.

How To Achieve Mongo Replication on Docker

In the previous post, I showed how we used MongoDB replication to solve several problems we were facing.

Replication was part of a bigger migration that brought stability, fault tolerance, and performance to our systems. In this post, we will dive into the practical preparation of that migration.

Mirror Maker v2.0

Before we start, let's define some abbreviations.

  • MirrorMaker v1.0 -> mmv1
  • MirrorMaker v2.0 -> mmv2

Find the Project

You can find all the material for this document in the project repository. Here is what the repository contains.

Top 5 HCI Myths Busted

Although hyperconverged infrastructure (HCI) is utilized by thousands of IT professionals, several persistent myths surround it, causing confusion and misconceptions even among those who have an HCI solution deployed. Here are five of the most prevalent myths, debunked.

Myth #1 — HCI Is Too Expensive

The acquisition price of an HCI solution varies by vendor and often by the brand of hypervisor used in the solution. While purchasing the individual components needed to create a virtualization infrastructure can often be less expensive than purchasing an HCI solution, the acquisition price is only part of the cost. The true and total cost of infrastructure goes far beyond the initial purchase.

Making It Easier to Manage a Production PostgreSQL Database

Manage a Production PostgreSQL Database

The past several years have seen increasing adoption of PostgreSQL. PostgreSQL is an amazing relational database. Feature-wise, it is up there with the best, if not the best. There are many things I love about it: PL/pgSQL, smart defaults, replication that actually works out of the box, and an active and vibrant open-source community. However, beyond the features, there are other important aspects of a database that need to be considered.

If you are planning to build a large 24/7 operation, the ability to easily operate the database once it is in production becomes a very important factor. In this aspect, PostgreSQL does not hold up very well. In this blog post, we will detail some of these operational challenges with PostgreSQL. There is nothing fundamentally unfixable here, just a question of prioritization. Hopefully, we can generate enough interest in the community to prioritize these features.

Why Data Replication Should Not Be Done Using ESB-Based Integration Tools

This is one of the common questions we get when prospects come looking for data replication tools. It is more a question of integration design patterns than of product implementation.

Let's start with what an ESB is: an Enterprise Service Bus. It is an integration design pattern in which messages are passed so that one or more message listeners can consume them, following a store-and-forward model. These messages, like emails, have a header (from and to), a payload (the message body), and perhaps attachments. Depending on the ESB, there might be limits on payload and attachment sizes.
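The store-and-forward pattern described above can be sketched as a toy bus (illustrative only, not any particular ESB product): each published message is stored and then forwarded to every subscribed listener, with a cap on payload size.

```python
# Toy ESB: messages carry a header (from/to), a payload, and attachments;
# the bus stores each message and forwards it to all subscribed listeners.

class Bus:
    def __init__(self, max_payload=1024):
        self.max_payload = max_payload   # ESBs often cap payload size
        self.listeners = []
        self.stored = []                 # store ...

    def subscribe(self, listener):
        self.listeners.append(listener)

    def publish(self, header, payload, attachments=()):
        if len(payload) > self.max_payload:
            raise ValueError("payload exceeds bus limit")
        msg = {"header": header, "payload": payload,
               "attachments": list(attachments)}
        self.stored.append(msg)          # store ...
        for listener in self.listeners:  # ... and forward
            listener(msg)

received = []
bus = Bus(max_payload=64)
bus.subscribe(received.append)
bus.publish({"from": "billing", "to": "audit"}, "invoice #42 created")
print(received[0]["payload"])            # invoice #42 created
```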