An Introduction to Data Virtualization and Its Use Cases

Data virtualization addresses several recurring data-access problems, and the market for these solutions is booming, with strong year-over-year growth. But let's start with a definition.

What Is It?

Data virtualization is the process of inserting a data-access layer between data sources and data consumers to make access easier. In practice, the tool is a kind of SQL query engine that can query very heterogeneous data sources, ranging from traditional SQL databases to text or PDF files, or a streaming source such as Kafka. In short, you have data, you can query it, and you can join across it. You can therefore offer a unified, complete view of the data, even when it is scattered across several systems. On top of that, a cache and a query optimizer minimize the performance impact on the source systems, and a data catalog helps you find your way through all the data in your IT infrastructure. From this we can deduce two main use cases.
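To make the idea concrete, here is a minimal sketch, assuming nothing beyond the Python standard library: it exposes two heterogeneous "sources" (a SQL table and a CSV file) through a single SQL surface so a consumer can join across them. A real data virtualization platform would query the sources in place and add caching, optimization, and a catalog; this toy version simply registers the CSV rows as a queryable table to illustrate the unified view.

```python
import csv
import io
import sqlite3

# Source 1: a traditional SQL database (here, an in-memory SQLite table).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])

# Source 2: a flat text file (CSV) containing orders.
csv_source = io.StringIO("customer_id,amount\n1,100\n2,250\n1,40\n")

# The "virtual" layer: make the CSV rows queryable alongside the SQL table,
# so both sources can be joined with plain SQL.
con.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
for row in csv.DictReader(csv_source):
    con.execute("INSERT INTO orders VALUES (?, ?)",
                (int(row["customer_id"]), int(row["amount"])))

# A consumer now sees one unified view and can join across the two sources.
rows = con.execute(
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)  # [('Alice', 140), ('Bob', 250)]
```

The key point is that the consumer writes one SQL query and never needs to know that half the data lived in a file rather than a database.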