Leveraging Change Data Capture for Fraud Detection using Arcion Cloud

During the height of the business intelligence (BI) craze earlier in my career, I worked with an internal reporting team to expose data for extract, transform, and load (ETL) processes that leveraged data structures inspired by Ralph Kimball. It was a new and exciting time in my life, as I learned how to optimize data for reporting and analysis. Honestly, the schema looked upside down to me, based on my experience with transaction-driven designs.

In the end, there were many moving parts, and some steps even depended on the existence of a flat file to make sure everything worked properly. The reports ran quickly, but one key factor always bothered me: I was always looking at yesterday’s data.

How to Set Up and Run PostgreSQL Change Data Capture

The architecture of modern web applications consists of several software components such as dashboards, analytics, databases, data lakes, caches, search, etc.

The database is usually the core part of any application. Real-time data updates keep disparate data systems in continuous sync and let them respond quickly to new information. So how do you keep your application ecosystem in sync? How do these other components get information about changes in the database? Change Data Capture, or CDC, refers to any solution that identifies new or changed data.
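
As a rough illustration of what "identifying new or changed data" can mean, here is a minimal sketch that compares two snapshots of a table and emits change events. It is an assumption-laden toy: the `diff_snapshots` function and the in-memory snapshot format are hypothetical, and snapshot diffing is only the simplest form of CDC, not the log-based approach most PostgreSQL tooling uses.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Snapshot = Dict[int, Row]  # hypothetical format: primary key -> row


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Tuple[str, int, Row]]:
    """Return (operation, key, row) events describing how `new` differs from `old`."""
    events: List[Tuple[str, int, Row]] = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))   # row did not exist before
        elif old[key] != row:
            events.append(("update", key, row))   # row exists but its content changed
    for key, row in old.items():
        if key not in new:
            events.append(("delete", key, row))   # row disappeared
    return events


before = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
after = {1: {"name": "Ada"}, 2: {"name": "Bobby"}, 3: {"name": "Cy"}}
changes = diff_snapshots(before, after)
for change in changes:
    print(change)
```

Downstream components such as caches or search indexes would consume these events instead of polling the whole table.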

Clean Up Your Outbox Tables With Programmatic TTL

For those familiar with the Outbox Pattern, CockroachDB provides some unique capabilities for handling this type of architectural pattern. One common method is to use Changefeeds in CockroachDB to send an acknowledgment message back to the originating service that the database transaction was committed. Changefeeds are great in this scenario in that they can be emitted on any record mutation on the table (except Import), connect to a message bus like Kafka, and emit the payload with fairly low latency (~100ms). However, one consequence of this pattern is that historical records build up in the Outbox table. Fortunately, we have a rather nifty solution that can clean up these Outbox tables.

So the goal of this post is to show how you can programmatically remove records from an Outbox table once they have been flushed to the Changefeed's sink (e.g., Kafka). The idea here is to create a clean-up job that removes Outbox records whose MVCC timestamp is safely behind the high watermark of the Changefeed, meaning the Changefeed has already emitted them.
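
The selection rule behind such a clean-up job can be sketched in a few lines. This is only an illustration of the comparison, assuming that a record is safe to delete once its commit timestamp is older than the Changefeed's high watermark by some safety margin. The `OutboxRecord` type, the `records_safe_to_delete` function, and the 60-second margin are hypothetical, and reading the actual timestamps from CockroachDB is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutboxRecord:
    id: int
    mvcc_ts: float  # commit timestamp of the row in seconds (hypothetical field)


def records_safe_to_delete(
    records: List[OutboxRecord],
    high_watermark: float,
    safety_margin: float = 60.0,  # assumed buffer, not a value from the post
) -> List[int]:
    """Return ids of records committed before (high_watermark - safety_margin)."""
    cutoff = high_watermark - safety_margin
    return [r.id for r in records if r.mvcc_ts < cutoff]


outbox = [
    OutboxRecord(id=1, mvcc_ts=1000.0),
    OutboxRecord(id=2, mvcc_ts=1150.0),
    OutboxRecord(id=3, mvcc_ts=1250.0),
]
deletable = records_safe_to_delete(outbox, high_watermark=1200.0)
print(deletable)
```

The margin trades a slightly larger Outbox table for extra confidence that nothing is deleted before it reaches the sink.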

Change Data Capturing With WSO2 Streaming Integrator

Streaming integration is becoming one of the core components of the enterprise integration stack. Unlike traditional batch integration, streaming integration performs ETL operations on data in real time and provides the results. This empowers businesses by allowing them to act on fresh data and make decisions as soon as the data is produced.

But where does this data come from? Most of the time, streaming data is produced by various data sources such as applications, sensors, etc. In some cases, well-known data sources such as an RDBMS can also produce streaming data. This is where CDC, a.k.a. change data capture, comes into the picture.

Change Data Capture (CDC) With Embedded Debezium and SpringBoot

While working with data or replicating data sources, you have probably heard the term Change Data Capture (CDC). As the name suggests, CDC is a design pattern that continuously identifies and captures incremental changes to data. This pattern is used for real-time data replication from live databases to analytical data sources or read replicas. It can also be used to trigger events based on data changes, as in the Outbox pattern.

Most modern databases support CDC through transaction logs. A transaction log is a sequential record of all changes made to the database while the actual data is contained in a separate file.
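
To make the idea of a sequential change log concrete, here is a minimal sketch in which a replica is kept up to date by replaying log entries from a remembered offset. The `LogEntry` shape and the `apply_log` function are hypothetical; real transaction logs (and tools such as Debezium that read them) are far richer than this toy.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical log entry: (operation, primary key, row or None for deletes)
LogEntry = Tuple[str, int, Optional[Dict[str, Any]]]


def apply_log(
    replica: Dict[int, Dict[str, Any]],
    log: List[LogEntry],
    start_offset: int = 0,
) -> int:
    """Replay log entries from start_offset onto replica; return the next offset."""
    for op, key, row in log[start_offset:]:
        if op == "delete":
            replica.pop(key, None)
        else:  # "insert" and "update" both store the latest row
            replica[key] = row
    return len(log)


log: List[LogEntry] = [
    ("insert", 1, {"balance": 100}),
    ("update", 1, {"balance": 80}),
    ("insert", 2, {"balance": 5}),
    ("delete", 2, None),
]
replica: Dict[int, Dict[str, Any]] = {}
offset = apply_log(replica, log)

# Later, only the entries written since the last offset are replayed.
log.append(("insert", 3, {"balance": 7}))
offset = apply_log(replica, log, start_offset=offset)
print(replica, offset)
```

Because the consumer only remembers an offset, it can resume after a restart without rescanning the source tables.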

Database Synchronisation Is an Integration Pattern!

I recently took a look at the DZone Integration Zone, and it was almost entirely about APIs: how to design them, test them, use the latest framework, and so on. But it would be a bad idea to forget that there are many other things to integrate!

What Is Change Data Capture?

Change data capture, or CDC, is a technology that keeps two data sources synchronized. The synchronization can be bi-directional, and everything is designed to make it very simple. Just choose the tables you need to synchronize, and if needed, it will also create new tables in the target database. Of course, you can also point it at other existing tables just as easily; it’s up to you!