What Is a Data Pipeline?

You may have seen the iconic episode of "I Love Lucy" in which Lucy and Ethel get jobs wrapping chocolates in a candy factory. The high-speed conveyor belt starts up and the two are immediately out of their depth. By the end of the scene, they are stuffing their hats, pockets, and mouths full of chocolates, while an ever-lengthening procession of unwrapped confections continues to escape their station. It's hilarious. It's also the perfect analogy for understanding the significance of the modern data pipeline.

The efficient flow of data from one location to another - from a SaaS application to a data warehouse, for example - is one of the most critical operations in today's data-driven enterprise. After all, useful analysis cannot begin until the data becomes available. Data flow can be precarious, because so many things can go wrong in transit between systems: data can become corrupted, it can hit bottlenecks (causing latency), or sources may conflict and generate duplicate records. As requirements grow more complex and the number of data sources multiplies, these problems increase in scale and impact.
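To make that flow (and its failure modes) concrete, here is a minimal sketch of a single extract-deduplicate-load step. It is purely illustrative: the hard-coded record list stands in for a SaaS export, sqlite3 stands in for the warehouse, and the record fields are assumptions, not a prescribed schema.

    import sqlite3
    from typing import Iterable

    # Hypothetical source records; in practice these would come from a SaaS API.
    # The repeated id simulates a duplicate produced by a retried fetch.
    SOURCE_RECORDS = [
        {"id": 1, "email": "ana@example.com", "plan": "pro"},
        {"id": 2, "email": "bo@example.com", "plan": "free"},
        {"id": 2, "email": "bo@example.com", "plan": "free"},
    ]

    def extract(records: Iterable[dict]) -> list[dict]:
        """Pull records from the source; here just a pass-through."""
        return list(records)

    def deduplicate(records: list[dict]) -> list[dict]:
        """Keep one record per id, dropping duplicates."""
        by_id = {r["id"]: r for r in records}
        return list(by_id.values())

    def load(records: list[dict], conn: sqlite3.Connection) -> int:
        """Write records to the warehouse table and return its row count."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, email, plan) VALUES (:id, :email, :plan)",
            records,
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

    if __name__ == "__main__":
        raw = extract(SOURCE_RECORDS)
        clean = deduplicate(raw)
        with sqlite3.connect(":memory:") as conn:
            loaded = load(clean, conn)
        # Basic reconciliation: loaded rows should match deduplicated rows,
        # a first line of defense against silent data loss or duplication.
        assert loaded == len(clean), f"expected {len(clean)} rows, loaded {loaded}"
        print(f"extracted={len(raw)} deduplicated={len(clean)} loaded={loaded}")

Even this toy version shows the kinds of safeguards a real pipeline needs at every hop: duplicate handling and a reconciliation check, which only get more important as sources and volume grow.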