Next-Gen Data Pipes With Spark, Kafka and k8s

Introduction

Data integration has always played an essential role in the information architecture of any enterprise. Specifically, an enterprise's analytical processes depend heavily on this integration pattern to make data from transactional systems available in an analytics-friendly format. In the traditional architecture paradigm, systems were not highly interconnected, the latency between transactions and analytical insights was acceptable, and integrations were mainly batch-oriented.

In the batch pattern, the operational systems typically generate large files (data dumps), which are then processed (validated, cleansed, standardized, and transformed) to produce output files that feed the analytical systems. Reading such large files was memory-intensive, so data architects relied on a series of staging databases to store the output of each processing step. As distributed computing evolved with Hadoop, MapReduce addressed the high memory requirement by distributing the processing across horizontally scalable commodity hardware. Computing techniques have evolved further since then, and it is now possible to run MapReduce-style processing in memory (as Apache Spark does), which today has become the de-facto standard for processing large data files.
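To make the in-memory batch pattern concrete, below is a minimal PySpark sketch of such a job. The file paths, column names, and transformation rules are hypothetical placeholders, not part of the original text; the point is that the validate, cleanse, standardize, and transform steps run in memory across the cluster instead of landing in a chain of staging databases.

    # Minimal sketch of the batch pattern with PySpark.
    # Paths and column names below are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("batch-data-dump-processing")
        .getOrCreate()
    )

    # Extract: read the operational system's data dump (a large CSV file)
    orders = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/data/landing/orders_dump.csv")  # hypothetical landing path
    )

    # Validate and cleanse: drop records missing keys, de-duplicate, fix types
    cleansed = (
        orders
        .dropna(subset=["order_id", "customer_id"])
        .dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
    )

    # Standardize and transform: derive analytics-friendly columns
    transformed = (
        cleansed
        .withColumn("order_date", F.to_date("order_ts"))
        .withColumn("amount", F.col("amount").cast("double"))
    )

    # Load: write a columnar output for the analytical system; the
    # intermediate steps above stay in memory rather than in staging tables.
    transformed.write.mode("overwrite").parquet("/data/curated/orders/")

    spark.stop()

The same job could equally be written in Scala or submitted to a cluster manager such as Kubernetes; the structure of the pipeline does not change.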