An Implementation of Change Data Capture

Introduction

Most enterprise applications use relational databases as their persistent data store. Operations on these applications create, update, and delete data in tables within those databases. Downstream applications inspect change logs to discover what has changed in the source systems. This incremental data lets them process only the specific changes to business data, avoiding a full pass over everything stored in the operational databases.

A new generation of applications offers subscription-based services that publish changes to the relevant business data. An initial data load must be performed before the captured changes from source systems can be sent. However, on some legacy application databases, replication tools cannot be used without substantial changes to the legacy applications themselves. Downstream systems are then forced to load the entire content of the operational database and derive meaningful incremental information as part of overnight batch jobs. This approach lengthens overall batch execution time and puts additional strain on infrastructure because of the extra load processed on a periodic (daily/weekly/monthly) basis.
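To make the idea concrete, here is a minimal Python sketch of an incremental change event and how a downstream system might apply it. The event shape, field names, and the example table are illustrative assumptions rather than the format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative change-event shape; field names are hypothetical, not tied to any tool.
@dataclass
class ChangeEvent:
    op: str                  # "insert", "update", or "delete"
    table: str               # source table the change came from
    key: int                 # primary key of the affected row
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)

def apply_change(replica: dict, event: ChangeEvent) -> None:
    """Apply a single captured change to a downstream copy of the table,
    instead of reloading and reprocessing the entire source table."""
    if event.op == "delete":
        replica.pop(event.key, None)
    else:  # insert or update
        replica[event.key] = event.after

# The downstream system only touches the rows that actually changed.
customers = {1: {"name": "Alice", "tier": "gold"}}
apply_change(customers, ChangeEvent("update", "customers", 1,
                                    {"name": "Alice", "tier": "gold"},
                                    {"name": "Alice", "tier": "platinum"}))
apply_change(customers, ChangeEvent("insert", "customers", 2,
                                    None, {"name": "Bob", "tier": "silver"}))
print(customers)
```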

MySQL to DynamoDB: Build a Streaming Data Pipeline on AWS Using Kafka

This is the second part of a blog series that provides a step-by-step walkthrough of data pipelines with Kafka and Kafka Connect. I will be using AWS for demonstration purposes, but the concepts apply to any equivalent option (e.g., running everything locally in Docker).

This part shows Change Data Capture in action: it lets you track row-level changes in database tables in response to create, update, and delete operations. In MySQL, for example, these change data events are exposed via the MySQL binary log (binlog).
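Since the series builds on Kafka Connect, one common way to turn binlog events into Kafka records is Debezium's MySQL source connector. The sketch below registers such a connector through the Kafka Connect REST API; the worker URL, database credentials, and topic prefix are placeholders, and exact property names vary between Debezium releases, so treat this as an outline rather than the article's exact configuration.

```python
import requests

# Hypothetical Kafka Connect worker URL and MySQL connection details;
# substitute your own values.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "mysql-cdc-source",
    "config": {
        # Debezium's MySQL source connector reads row-level changes from the binlog.
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.com",
        "database.port": "3306",
        "database.user": "cdc_user",
        "database.password": "cdc_password",
        "database.server.id": "184054",       # unique ID for this binlog client
        "database.include.list": "salesdb",   # databases to capture
        "topic.prefix": "salesdb-server",     # prefix for the Kafka topics
        # Debezium keeps the source schema history in its own Kafka topic.
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-history.salesdb",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print("Connector registered:", resp.json()["name"])
```

Once the connector is running, each insert, update, and delete on the captured tables is published as an event to a Kafka topic, ready for a sink connector (for example, one writing to DynamoDB) to consume.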

Change Data Capture to Accelerate Real-Time Analytics

It is nothing new to say that startups leverage Big Data and AI to develop more innovative business models. As a result, Big Data and AI have become ubiquitous topics in executive and technical forums. But they are often discussed at such a high level that people miss the details of the building blocks these companies rely on.

In this blog post, I’ll cover one of modern companies’ most valuable building blocks: the ability to process data in real time, which enables data-driven decision-making in industries like retail, media & entertainment, and finance.

How to Set Up and Run PostgreSQL Change Data Capture

The architecture of a modern web application consists of several software components: dashboards, analytics, databases, data lakes, caches, search, and more.

The database is usually the core of any application. Real-time data updates keep disparate data systems in continuous sync and allow them to respond quickly to new information. So how do you keep your application ecosystem in sync? How do these other components learn about changes in the database? Change Data Capture (CDC) refers to any solution that identifies new or changed data.
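As a minimal illustration of CDC on PostgreSQL, the sketch below uses the database's built-in logical decoding with the bundled test_decoding output plugin, read from Python via psycopg2. The connection string and slot name are placeholders, it assumes wal_level = logical and a role with replication privileges, and it is only one of several possible approaches (the full article may rely on a different tool).

```python
import psycopg2

# Placeholder connection string; the server must run with wal_level = logical.
conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

# Create a logical replication slot using the built-in test_decoding plugin,
# unless one with this name already exists.
cur.execute("""
    SELECT pg_create_logical_replication_slot('cdc_slot', 'test_decoding')
    WHERE NOT EXISTS (
        SELECT 1 FROM pg_replication_slots WHERE slot_name = 'cdc_slot'
    )
""")

# Fetch and consume the changes accumulated since the last call.
cur.execute(
    "SELECT lsn, xid, data FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL)"
)
for lsn, xid, data in cur.fetchall():
    # 'data' is a human-readable description of one change, e.g.:
    # table public.customers: UPDATE: id[integer]:1 tier[text]:'platinum'
    print(lsn, xid, data)

cur.close()
conn.close()
```

Each call to pg_logical_slot_get_changes consumes the pending changes from the slot, so a downstream component can poll it periodically and propagate only the rows that actually changed.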