An Implementation of Change Data Capture

Introduction

Most enterprise applications use a relational database as their persistent data store. Operations in these applications create, update, and delete rows in the database's tables. Downstream applications consult change logs to learn what has changed in the source systems. This incremental data lets them process only the specific changes to business data, rather than reprocessing the entire contents of the operational database.

Newer applications provide subscription-based services that publish changes to the relevant business data. An initial data load must be performed before the captured changes from the source systems are sent. On some legacy application databases, however, replication tools cannot be used without substantial changes to the legacy applications. Downstream systems are then forced to load the entire contents of the operational database and derive the meaningful incremental information themselves as part of overnight batch jobs. This approach lengthens overall batch execution time and puts additional strain on infrastructure, because the full load is processed on a periodic (daily, weekly, or monthly) basis.
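To make the cost of the full-load approach concrete, here is a minimal sketch of how a downstream batch job might derive incremental changes by comparing two complete snapshots of a table. It is an illustration only, not the implementation described in this article: the function name diff_snapshots, the snapshot layout (a dictionary keyed by primary key), and the sample rows are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each snapshot maps a primary key to the row's column values.
Snapshot = Dict[int, Tuple]
Change = Tuple[str, int, Tuple]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Derive inserts, updates, and deletes by comparing two full snapshots."""
    changes: List[Change] = []

    # Every row of the current snapshot is visited, even if nothing changed.
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, row))
        elif previous[key] != row:
            changes.append(("UPDATE", key, row))

    # Rows that existed in the previous snapshot but are now gone were deleted.
    for key, row in previous.items():
        if key not in current:
            changes.append(("DELETE", key, row))

    return changes


# Sample data (assumed): yesterday's and today's full extracts of one table.
yesterday = {1: ("Alice", 100), 2: ("Bob", 200), 3: ("Carol", 300)}
today = {1: ("Alice", 100), 2: ("Bob", 250), 4: ("Dave", 400)}

for change in diff_snapshots(yesterday, today):
    print(change)
```

The work done here grows with the size of the table rather than with the number of changes, since both snapshots must be read in full even when only a few rows differ. That is the overhead the paragraph above attributes to the overnight batch jobs.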