How Milvus Deletes Streaming Data in a Distributed Cluster

With its unified batch-and-stream processing and cloud-native architecture, Milvus 2.0 made the DELETE function considerably harder to build than it was in its predecessor. Thanks to the storage-computation disaggregation design and the flexible publish/subscribe mechanism, we are proud to announce that we made it happen. In Milvus 2.0, you can delete an entity from a given collection by its primary key, after which the deleted entity no longer appears in the results of a search or a query.

Please note that the DELETE operation in Milvus is a logical deletion; physical data cleanup happens later, during data compaction. Logical deletion not only greatly improves search performance, which is otherwise constrained by I/O speed, but also facilitates data recovery: logically deleted data can still be retrieved with the Time Travel function.
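To make the distinction between logical deletion and physical cleanup concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Milvus's actual storage layout or API: the class ToyCollection, its methods, and the integer logical clock are all hypothetical names chosen for illustration. A delete only records a tombstone with a timestamp, a query filters rows against that timestamp, and compaction is the step that really removes the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCollection:
    """Toy model of logical deletion (hypothetical, not Milvus internals)."""
    rows: Dict[int, int] = field(default_factory=dict)        # primary key -> insert timestamp
    tombstones: Dict[int, int] = field(default_factory=dict)  # primary key -> delete timestamp
    clock: int = 0                                            # simple logical clock

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def insert(self, pk: int) -> int:
        ts = self._tick()
        self.rows[pk] = ts
        # Simplification: a re-insert discards any older tombstone for this key.
        self.tombstones.pop(pk, None)
        return ts

    def delete(self, pk: int) -> int:
        # Logical delete: record a tombstone, leave the row where it is.
        ts = self._tick()
        if pk in self.rows:
            self.tombstones[pk] = ts
        return ts

    def query(self, as_of: Optional[int] = None) -> List[int]:
        # Return the primary keys visible at timestamp `as_of` (default: now).
        ts = self.clock if as_of is None else as_of
        visible = []
        for pk, inserted in self.rows.items():
            deleted = self.tombstones.get(pk)
            if inserted <= ts and (deleted is None or deleted > ts):
                visible.append(pk)
        return sorted(visible)

    def compact(self) -> int:
        # Physical cleanup: drop every tombstoned row, then the tombstones.
        removed = [pk for pk in self.tombstones if pk in self.rows]
        for pk in removed:
            del self.rows[pk]
        self.tombstones.clear()
        return len(removed)


if __name__ == "__main__":
    c = ToyCollection()
    c.insert(1)
    c.insert(2)
    snapshot = c.insert(3)
    c.delete(2)
    print(c.query())                 # [1, 3]
    print(c.query(as_of=snapshot))   # [1, 2, 3]
    print(c.compact())               # 1
    print(c.query(as_of=snapshot))   # [1, 3]
```

In this sketch, querying at an earlier timestamp plays the role of Time Travel: the deleted key is still readable until compaction runs, and gone afterwards.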

Why Is Database Sizing So Hard?

Sizing is something that seems deceptively simple: take the size of your dataset and the required throughput and divide by the capacity of a node. Easy, isn’t it?
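That back-of-the-envelope arithmetic can be sketched in a few lines of Python. The function name, the parameters, and the example figures are assumptions made up for illustration, as is the choice to take the larger of the storage-driven and throughput-driven node counts; treat it as the naive baseline, not as a sizing method.

```python
import math


def naive_node_count(dataset_gb: float, node_storage_gb: float,
                     required_ops: float, node_ops: float) -> int:
    """Naive sizing: divide each requirement by what one node can handle."""
    if node_storage_gb <= 0 or node_ops <= 0:
        raise ValueError("per-node capacities must be positive")
    by_storage = math.ceil(dataset_gb / node_storage_gb)
    by_throughput = math.ceil(required_ops / node_ops)
    # The cluster has to satisfy both constraints, so take the larger count.
    return max(by_storage, by_throughput, 1)


if __name__ == "__main__":
    # Hypothetical numbers: 10 TB of data, 2 TB per node,
    # 500k ops/s required, 60k ops/s per node.
    print(naive_node_count(10_000, 2_000, 500_000, 60_000))  # 9
```

With these made-up figures, storage alone would call for 5 nodes, while throughput calls for 9, so the naive answer is 9.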

If you’ve ever tried your hand at capacity planning, you know how hard it can be. Even a rough estimate can be quite challenging to make. Why is this so hard?

New Feature in the Interference Cluster 2021.1 Release

Introduction

Version 2021.1 of the interference cluster has been released. (The previous article, in which I talk about the basic features of this software, can be found here.) Much of the attention went to improving overall performance and stability. The release also adds what I consider an interesting feature, which I want to talk about in this short article.

Previously, the interference cluster was conceived strictly as a server-side service that provided persistence and event-interaction services to server-side Java applications. Because interference, as a database, did not provide JDBC connections, there was no way to access the data from outside. Interaction was possible only between applications running on the cluster nodes, each of which contains persistent storage.