Moving Data From Cassandra (OLTP) to Data Warehousing

Overview

To incrementally load transactional data into a data warehousing system, the data should be streamed to the analytics engine in real time or near real time. In my case, the OLTP system is Cassandra and the OLAP system is Snowflake. The OLAP system requires data from Cassandra on a periodic basis. The requirements for this scenario are:

  1. The interval between data copies needs to be reduced drastically.
  2. Data has to be consistent. Cassandra and Snowflake should be in sync.
  3. In a few cases, all mutations have to be captured.
  4. The production cluster currently holds petabytes of data, and at least 100 gigabytes of new data is generated every hour.

With such a granularity requirement, one should not copy the data by querying the OLTP system directly, as this would be invasive to Cassandra's read path and write path and would impinge on TPS. Thus, a different solution is required for copying the Cassandra data to Snowflake.
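
To put the fourth requirement in perspective, the sketch below converts the stated hourly volume into an average sustained rate. It assumes decimal units (1 GB = 1000 MB) and an evenly spread load, neither of which is stated above, and the function name is purely illustrative.

```python
def sustained_rate_mb_per_s(gb_per_hour: float) -> float:
    """Convert an hourly data volume in GB to an average rate in MB/s.

    Assumption: decimal units (1 GB = 1000 MB) and a perfectly even load.
    """
    seconds_per_hour = 3600
    return gb_per_hour * 1000 / seconds_per_hour


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(100)
    print(f"{rate:.1f} MB/s")  # 27.8 MB/s
```

Under these assumptions, whatever mechanism copies the data has to sustain roughly this average rate, and more during peaks, without adding load to Cassandra.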