Increase Your Vector Database Read Throughput with In-Memory Replicas

With its official release, Milvus 2.1 comes with many new features such as in-memory replicas, support for string data type, embedded Milvus, tunable consistency, user authentication, and encryption in transit to provide convenience and a better user experience. Though the concept of the in-memory replica is nothing new to the world of distributed databases, it is a critical feature that can help you boost system performance, increase database read throughput, and enhance the utilization of hardware resources in an effortless way.

Therefore, this post sets out to explain what in-memory replica is and why it is important, and then introduces how to enable this new feature in Milvus, a vector database for AI.

Concepts Related to In-Memory Replica

Before getting to know what in-memory replica is and why it is important, we need to first understand a few relevant concepts, including replica group, shard replica, streaming replica, historical replica, and shard leader. The image below is an illustration of these concepts.

Replica concepts

Replica Group

A replica group consists of multiple query nodes that are responsible for handling historical data and replicas. More specifically, the query nodes in the Milvus vector database retrieve incremental log data and turn them into growing segments by subscribing to the log broker, loading historical data from the object storage, and running hybrid searches between vector and scalar data.

Shard Replica

A shard replica consists of a streaming replica and a historical replica, both belonging to the same shard (i.e., Data manipulation language channel, abbreviated as DML channel in Milvus). Multiple shard replicas make up a replica group. And the exact number of shard replicas in a replica group is determined by the number of shards in a specified collection.

Streaming Replica

A streaming replica contains all the growing segments from the same DML channel. A growing segment keeps receiving the newly inserted data till it is sealed. Technically speaking, a streaming replica should be served by only one query node in one replica.

Historical Replica

A historical replica contains all the sealed segments from the same DML channel. A sealed segment no longer receives any new data and will be flushed to the object storage, leaving new data to be inserted into a freshly created growing segment. The sealed segments of one historical replica can be distributed on several query nodes within the same replica group.

Shard Leader

A shard leader is the query node serving the streaming replica in a shard replica.