ELT Is Dead, and EtLT Will Be the End of Modern Data Processing Architecture

When talking about data processing, people often abbreviate it as “ETL.” Looking more closely, however, data processing has gone through several iterations: from ETL to ELT, to ETL variants such as Reverse ETL and Zero-ETL, and on to the currently popular EtLT architecture. While the Hadoop era relied mainly on ELT (Extract, Load, Transform), the rise of real-time data warehouses and data lakes has rendered ELT obsolete, and EtLT has emerged as the standard architecture for loading data into data lakes and real-time data warehouses.

Let’s explore the reasons behind the emergence of these architectures, their strengths and weaknesses, why EtLT is gradually replacing ETL and ELT as the global mainstream data processing architecture, and practical open-source ways to adopt it.

ETL Era (1990–2015)

In the early days of data warehousing, Bill Inmon, the father of the data warehouse, defined it as a subject-oriented, integrated data store in which data is categorized and cleaned as it is loaded. During this period, most data sources were structured databases (e.g., MySQL, Oracle, SQL Server) and business systems (e.g., ERP, CRM), while data warehouses predominantly ran on OLTP databases (e.g., DB2, Oracle) for querying and historical storage. Such databases struggled with complex transformation logic, so a wave of dedicated ETL tools emerged, such as Informatica, Talend, and Kettle, which greatly simplified integrating complex data sources and offloaded transformation work from the warehouse.
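For readers newer to the pattern, here is a minimal sketch of classic ETL in Python: all cleaning and categorization happens in the ETL layer, before the warehouse ever sees the data. The connections, table names, and cleaning rules below are hypothetical, not taken from any particular tool.

```python
# Classic ETL: transform happens *before* load, so the warehouse
# only ever stores cleaned, conformed data. SQLite stands in for
# both the OLTP source and the warehouse; the "orders"/"dw_orders"
# tables and cleaning rules are illustrative assumptions.
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple]:
    # Pull raw rows from a structured source system.
    return source.execute("SELECT id, amount, country FROM orders").fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    # Clean and categorize in the ETL layer, offloading the warehouse:
    # drop malformed rows and normalize country codes.
    return [
        (id_, round(amount, 2), country.strip().upper())
        for id_, amount, country in rows
        if amount is not None and country
    ]


def load(warehouse: sqlite3.Connection, rows: list[tuple]) -> None:
    # Only transformed data reaches the warehouse tables.
    warehouse.executemany(
        "INSERT INTO dw_orders (id, amount, country) VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


if __name__ == "__main__":
    src, dw = sqlite3.connect("source.db"), sqlite3.connect("warehouse.db")
    load(dw, transform(extract(src)))
```

The key design point, and the one later architectures revisit, is that the transformation sits in the middle of the pipeline: the warehouse never stores raw data.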
