SAP S/4HANA, Microsoft SQL Integration and Hard Deletion Handling

This article demonstrates heterogeneous systems integration and the building of a BI system, focusing on delta load issues and how to overcome them. How can we compare the source and target tables when there is no reliable way to identify changes in the source table using the SSIS ETL tool?
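
One common workaround, sketched below, is to stage a full extract of the source keys and compare it against the target with a set-based query: any key present in the target but absent from the staging extract was hard-deleted at the source. This is a minimal sketch; the names stg_orders, dw_orders, order_id, and is_deleted are hypothetical.

    -- T-SQL sketch of hard-delete detection via a staging comparison.
    -- stg_orders holds the latest full key extract from the source;
    -- dw_orders is the BI target table. All names are hypothetical.

    -- Keys that exist in the target but no longer exist at the source:
    SELECT d.order_id
    FROM dw_orders AS d
    LEFT JOIN stg_orders AS s
           ON s.order_id = d.order_id
    WHERE s.order_id IS NULL;

    -- Soft-delete those rows in the target instead of dropping them:
    UPDATE d
    SET d.is_deleted = 1
    FROM dw_orders AS d
    LEFT JOIN stg_orders AS s
           ON s.order_id = d.order_id
    WHERE s.order_id IS NULL;

Flagging rows with a soft-delete column, rather than physically deleting them, keeps the target auditable while still letting reports filter those rows out.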

Systems Used

  • SAP S/4HANA is an Enterprise Resource Planning (ERP) software package meant to cover all the day-to-day processes of an enterprise, e.g., order-to-cash, procure-to-pay, request-to-service, and finance and controlling, along with core capabilities. SAP HANA is a column-oriented, in-memory relational database that combines OLAP and OLTP operations in a single system.
  • SAP Landscape Transformation (SLT) Replication is a trigger-based data replication method in the HANA system. It is a perfect solution for real-time or schedule-based replication of data from SAP and non-SAP sources.
  • Azure SQL Database is a fully managed platform-as-a-service (PaaS) database engine that handles most database management functions, including backups, patching, upgrading, and monitoring, with minimal user involvement.
  • SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and transformation solutions. SSIS is used to build ETL pipelines and solve complex business problems by copying or downloading files, loading data warehouses, and cleansing and mining data.
  • Power BI is an interactive data visualization software developed by Microsoft with a primary focus on business intelligence.

Business Requirement

Let us first talk about the business requirements. We receive Point-of-Sale (POS) data from more than 20 different online retailers, such as Target, Walmart, Amazon, Macy's, Kohl's, and JC Penney. Apart from this, the primary business transactions happen in SAP S/4HANA, and business users require BI reports for analysis purposes.

HTAP: One Size Fits All?

An important idea in the database world is that specialized databases will outperform general-purpose databases. Michael Stonebraker, an A. M. Turing Award Laureate and one of the most influential people in the database world, also discussed this in his paper, One Size Fits All: An Idea Whose Time Has Come and Gone.

This is a rational judgment because it's tough enough to build a database that supports either Online Transactional Processing (OLTP) or Online Analytical Processing (OLAP) workloads, let alone one that supports both at the same time. But the dilemma is that today, many users face increasing demands from mixed OLTP and OLAP workloads. How do we crack this, then?

Comparing Apache Hive and Spark

Introduction

Hive and Spark are two very popular and successful products for processing large-scale data sets. In other words, they do big data analytics. This article focuses on describing the history and various features of both products. A comparison of their capabilities will illustrate the various complex data processing problems these two products can address.

Moving Data From Cassandra (OLTP) to Data Warehousing

Overview

Data should be streamed to analytics engines in real time or near real time in order to incrementally load transactional data into a data warehousing system. In my case, the OLTP system is Cassandra and the OLAP system is Snowflake. The OLAP system requires data from Cassandra on a periodic basis. The requirements for this scenario are:

  1. The interval between data copies needs to be reduced drastically.
  2. Data has to be consistent; Cassandra and Snowflake should be in sync.
  3. In a few cases, all mutations have to be captured.
  4. Currently, the production cluster's data size is in petabytes, and at least 100 gigabytes of new data are generated every hour.

With such granularity requirements, one should not bulk-copy the data from the OLTP system to the OLAP system, as doing so would be invasive to Cassandra's read and write paths and would impinge on its TPS. Thus, we are required to provide a different solution for copying the Cassandra data to Snowflake.
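
A less invasive pattern, sketched below, is to capture mutations from Cassandra's commit log (for example, via its change data capture feature), land them in a Snowflake staging table, and periodically merge them into the target. The table names cdc_events_stg and events_tgt and the op column are assumptions for illustration, not the production design.

    -- Snowflake SQL sketch: apply staged CDC mutations to the target table.
    -- cdc_events_stg holds captured mutations, with op in
    -- ('INSERT', 'UPDATE', 'DELETE'). All names are hypothetical.
    MERGE INTO events_tgt AS t
    USING cdc_events_stg AS s
       ON t.event_id = s.event_id
    WHEN MATCHED AND s.op = 'DELETE' THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET t.payload = s.payload, t.updated_at = s.updated_at
    WHEN NOT MATCHED AND s.op <> 'DELETE' THEN
        INSERT (event_id, payload, updated_at)
        VALUES (s.event_id, s.payload, s.updated_at);

Because the merge touches only the staged mutations, Cassandra's read and write paths are left alone, and the load of applying changes stays on the warehouse side.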

ClickHouse Monitoring: Key Metrics to Monitor

If you keep up to date with the latest developments in the world of databases, you are probably familiar with ClickHouse, an open-source columnar database management system designed for OLAP. Developed by Yandex, ClickHouse was open-sourced in 2016, which makes it one of the most recent database management systems to become widely available as an open-source tool.

Because ClickHouse supports real-time, high-speed reporting, it's a powerful tool, especially for modern DevOps teams who need instantaneous, flexible ways of analyzing data.
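
As a starting point, ClickHouse exposes its internal counters through system tables, so many key metrics can be pulled with plain SQL. The sketch below queries system.metrics and system.asynchronous_metrics; the exact metric names shown can vary between ClickHouse versions.

    -- Point-in-time metrics: running queries, open connections, memory:
    SELECT metric, value
    FROM system.metrics
    WHERE metric IN ('Query', 'TCPConnection', 'HTTPConnection', 'MemoryTracking');

    -- Periodically refreshed metrics: uptime and replica lag:
    SELECT metric, value
    FROM system.asynchronous_metrics
    WHERE metric IN ('Uptime', 'ReplicasMaxAbsoluteDelay');

Pointing a dashboard or an alerting job at queries like these is often the quickest way to get visibility before wiring up a full monitoring stack.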