research and analysis | The Blog Pros

May 5, 2022

Accelerating Similarity Search on Really Big Data with Vector Indexing (Part II)

Many popular artificial intelligence (AI) applications, from computer vision to new drug discovery, are powered by vector databases. Indexing, a process of organizing data that drastically accelerates big data search, enables us to efficiently query million-, billion-, or even trillion-scale vector datasets.

This article is a follow-up to the previous blog, "Accelerating Similarity Search on Really Big Data with Vector Indexing," which covers the role indexing plays in making vector similarity search efficient and introduces several indexes, including FLAT, IVF_FLAT, IVF_SQ8, and IVF_SQ8H. That blog also provides performance test results for the four indexes. We recommend reading it first.

This article gives an overview of the four main types of indexes and then introduces four more indexes: IVF_PQ, HNSW, ANNOY, and E2LSH.
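To make the difference between exhaustive and indexed search concrete, here is a minimal pure-Python sketch contrasting a FLAT-style scan with an IVF-style lookup. It is an illustration under stated assumptions, not the implementation or API of any particular vector database: the function names (`flat_search`, `build_ivf`, `ivf_search`) and the parameters `nlist` (number of clusters) and `nprobe` (number of clusters scanned per query) are chosen here for illustration, and the clustering is a plain k-means with a fixed number of rounds.

```python
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """FLAT-style search: compare the query against every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: (l2(vectors[i], query), i))
    return ranked[:k]


def _assign(vectors, centroids):
    """Place each vector id in the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda j: l2(centroids[j], v))
        buckets[nearest].append(i)
    return buckets


def build_ivf(vectors, nlist, iters=10, seed=0):
    """Build an IVF-style index: nlist centroids plus one bucket of ids each."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        buckets = _assign(vectors, centroids)
        for j, ids in enumerate(buckets):
            if ids:  # leave a centroid unchanged if its bucket is empty
                centroids[j] = [
                    sum(vectors[i][d] for i in ids) / len(ids) for d in range(dim)
                ]
    # Final assignment so the buckets match the final centroids.
    return centroids, _assign(vectors, centroids)


def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: (l2(centroids[j], query), j))
    candidates = [i for j in order[:nprobe] for i in buckets[j]]
    candidates.sort(key=lambda i: (l2(vectors[i], query), i))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    q = [rng.random() for _ in range(8)]
    cents, bkts = build_ivf(data, nlist=10)
    print("exact  :", flat_search(data, q, 5))
    print("nprobe1:", ivf_search(data, cents, bkts, q, 5, nprobe=1))
```

With `nprobe` equal to `nlist`, every bucket is scanned and the result matches the exhaustive scan. Smaller values of `nprobe` compare the query against fewer vectors, trading some recall for speed, which is the basic bargain behind approximate indexes.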

June 12, 2020

Threat Modelling Tools Analysis 101

Abstract

An interconnected world with an increasing number of systems, products, and services relying on the availability, confidentiality, and integrity of sensitive information is vulnerable to attacks and incidents. Unfortunately, the threat landscape keeps expanding, and new threats, threat agents, and attack vectors emerge all the time. Defending against these threats requires that organizations be aware of such threats and threat agents. Threat modeling can be used as part of security risk analysis to systematically iterate over possible threat scenarios.

The motivation for this research came from the constantly growing need for better tools to tackle the broad and expanding threat landscape. One such tool, which helps to categorize and systematically evaluate the security of a system, product, or service, is threat modeling.

January 10, 2020

How SMC Allows You to Perform Advanced Data Collaboration Without Exposing Your Data

Data collaboration is the process of combining datasets to generate new value from data-driven insights. The datasets being combined can come from different organizations, or they can come from data silos internal to an organization.

A number of use cases are possible through data collaboration: fraud detection, advances in healthcare research, real-world data, cross-selling, churn analysis, etc. However, there are significant blockers to realizing the potential benefits of data collaboration. Some of these blockers are so severe that they can stymie potentially valuable collaborations. The blockers originate from a host of areas: fear of loss of IP (intellectual property), privacy regulations, data residency restrictions, and reputational risk, to name a few.