Building a Vector Database for Scalable Similarity Search


According to statistics, about 80%-90% of the world's data is unstructured. Fueled by the rapid growth of the Internet, an explosion of unstructured data is expected in the coming years. Consequently, companies are in urgent need of a powerful database that can help them better handle and understand such kinds of data. However, developing a database is always easier said than done. This article aims to share the thinking process and design principles of building Milvus, an open-source, cloud-native vector database for scalable similarity search. This article also explains the Milvus architecture in detail.

Unstructured Data Requires a Complete Basic Software Stack

As the Internet grew and evolved, unstructured data became more and more common, including emails, papers, IoT sensor data, Facebook photos, protein structures, and much more. For computers to understand and process unstructured data, these are converted into vectors using embedding techniques.