Boosting Similarity Search With Stream Processing

The goal of similarity search and vector databases is to find results similar to a search query over unstructured data, such as text, images, and videos. The unstructured data is first vectorized, that is, converted into numeric vectors (embeddings), and then stored in that vector form. Publicly available tools exist to create vectors from unstructured data, and vector databases exist to store those vectors and perform similarity searches on them. This matters more and more as Large Language Models (LLMs) grow in popularity and are increasingly combined with vector databases.
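To make the vectorize-then-search idea concrete, here is a minimal, exhaustive (brute-force) similarity search. It is a sketch that assumes the vectors already exist: the three-dimensional toy vectors and document ids are made up for illustration, whereas real embeddings come from an embedding model and usually have hundreds or thousands of dimensions. Cosine similarity is used as the measure; it is one common choice, not necessarily the one a given vector database uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones would come from an embedding model.
store = {
    "cat-photo": [0.9, 0.1, 0.0],
    "dog-photo": [0.8, 0.3, 0.1],
    "invoice-pdf": [0.0, 0.2, 0.9],
}

# The query vector points in roughly the same direction as the two photos.
print(top_k([1.0, 0.2, 0.0], store, k=2))  # cat-photo ranks first, then dog-photo
```

Scanning every vector costs time proportional to the size of the collection for each query; vector databases typically avoid this with approximate nearest-neighbour indexes, which is a large part of why they suit high-performance vector search.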

Here, we present a hybrid approach that takes the strengths of vector databases and boosts them with traditional search and filtering techniques based on real-time stream processing. Vector databases are good for building high-performance vector search applications, while Hazelcast can be used for real-time stream processing and fast storage of structured data (filters, tags, and contextual data). Some vector databases offer filtering on structured data, which can be used as-is or replaced with Hazelcast. In either case, Hazelcast can be used to enrich your query results with data from additional sources.
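The hybrid approach can be sketched as a three-step query: filter candidates on structured attributes, rank the survivors by vector similarity, and enrich the results with contextual data. In this minimal sketch, two plain Python dicts stand in for the two stores; the names `vectors`, `metadata`, and `hybrid_search` and all sample data are hypothetical, and no Hazelcast or vector-database client API is used. In the setup described above, `metadata` would be held in Hazelcast and kept current by stream processing, while `vectors` would live in a vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


# Hypothetical stand-in for the vector database: id -> embedding.
vectors = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.8, 0.3, 0.1],
    "p3": [0.1, 0.9, 0.2],
    "p4": [0.85, 0.2, 0.05],
}

# Hypothetical stand-in for structured data held in Hazelcast: filters, tags, context.
metadata = {
    "p1": {"category": "shoes", "in_stock": True, "price": 80},
    "p2": {"category": "shoes", "in_stock": False, "price": 65},
    "p3": {"category": "bags", "in_stock": True, "price": 120},
    "p4": {"category": "shoes", "in_stock": True, "price": 95},
}


def hybrid_search(query, predicate, k=2):
    # 1. Filter: keep only ids whose structured attributes satisfy the predicate.
    candidates = [doc_id for doc_id, attrs in metadata.items() if predicate(attrs)]
    # 2. Rank: score the surviving candidates by vector similarity to the query.
    scored = [(doc_id, cosine_similarity(query, vectors[doc_id])) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # 3. Enrich: attach the contextual attributes to each of the top-k hits.
    return [{"id": doc_id, "score": round(score, 4), **metadata[doc_id]} for doc_id, score in scored[:k]]


results = hybrid_search(
    [1.0, 0.2, 0.0],
    lambda attrs: attrs["category"] == "shoes" and attrs["in_stock"],
)
# p4 ranks first, then p1; p2 (out of stock) and p3 (bags) are filtered out.
print(results)
```

This sketch filters before ranking (pre-filtering), so items that fail the structured criteria never compete for the top-k slots. Filtering after the vector search is also possible, but it can return fewer than k results when many of the nearest neighbours are filtered out.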
