How Is Data Processed in a Vector Database?

In the previous two posts in this blog series, we have already covered the system architecture of Milvus, the world's most advanced vector database, and its Python SDK and API.

This post mainly aims to help you understand how data is processed in Milvus by going deep into the Milvus system and examining the interaction between the data processing components.

Optimized File Formats – Reduce Overall System Latency

Since Optimized columnar file formats helped Big data ecosystem to have SQL query features, Organizations are now able to retrain their existing data warehouse or Database developers quickly in Big data technology and migrate their analytics applications to on-premise Hadoop clusters or cheap object storage in the cloud.

When Columnar file formats were first proposed in the early 2010s, the intention was to enable faster query execution engines on top of the Hadoop file system. The columnar format was explicitly designed to give much-improved query performance than conventional row-based file formats. Columnar file formats give much better performance than row-based file formats (used in conventional Databases and data warehouses) when a partial set of columns from a table are queried.

Apache NiFi Overview

What Is Apache NiFI?

Apache NiFi is a robust open-source Data Ingestion and Distribution framework and more. It can propagate any data content from any source to any destination.

NiFi is based on a different programming paradigm called Flow-Based Programming (FBP). I’m not going to explain the definition of Flow-Based Programming. Instead, I will tell how NiFi works, and then you can connect it with the definition of Flow-Based Programming.