Accelerating Similarity Search on Really Big Data with Vector Indexing (Part II)

Many popular artificial intelligence (AI) applications are powered by vector databases, from computer vision to new drug discovery. Indexing, a process of organizing data that drastically accelerates big data search, enables us to efficiently query million, billion, or even trillion-scale vector datasets.

This article supplements the previous post, "Accelerating Similarity Search on Really Big Data with Vector Indexing," which covers the role indexing plays in making vector similarity search efficient, introduces the FLAT, IVF_FLAT, IVF_SQ8, and IVF_SQ8H indexes, and presents performance test results for all four. We recommend reading that post first.

This article provides an overview of the four main types of indexes and then introduces four more: IVF_PQ, HNSW, ANNOY, and E2LSH.
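As a rough illustration of what building and querying such indexes looks like in practice, here is a minimal sketch using the FAISS library. FAISS is our own choice for the example (the article does not prescribe a specific library), and the dataset, dimensionality, and parameters are placeholders.

```python
import numpy as np
import faiss  # assumed library for illustration; not mandated by the article

d = 128                                               # vector dimensionality (placeholder)
xb = np.random.random((10_000, d)).astype("float32")  # toy dataset

# IVF_PQ: inverted file index with product quantization for compression
nlist, m, nbits = 256, 8, 8
quantizer = faiss.IndexFlatL2(d)
ivf_pq = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
ivf_pq.train(xb)                   # learn coarse centroids and PQ codebooks
ivf_pq.add(xb)

# HNSW: graph-based index, no separate training step required
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = max connections per node
hnsw.add(xb)

# Query the 5 nearest neighbors of the first vector
distances, ids = hnsw.search(xb[:1], 5)
print(ids)
```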

The Ultimate Guide to Data Collection in Data Science

In today’s world, data plays a key role in the success of any business. Data produced by your target audience and your competitors, information from the field you work in, and data your company gathers on its own can help you find more customers, evaluate your business decisions, refine your business model, or expand into other markets. Data also helps you define the problems your business can solve and deliver better service by pinpointing exactly what your clients need.

According to research by the McKinsey Global Institute, data-driven companies are 23 times more likely to acquire customers, six times as likely to retain customers, and 19 times as likely to be profitable.

What Is Breadth-First Search?

Why Should I Care?

A lot of algorithms come already implemented as part of your chosen language, which means they are interesting to learn about, but you'll rarely write them yourself.

Graph traversal algorithms are different. We use graphs all the time, from linking related products in an e-commerce application to mapping relationships between people in a social network.
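To make the idea concrete, here is a minimal breadth-first traversal over an adjacency-list graph in Python. The graph contents are illustrative only, echoing the related-products example above.

```python
from collections import deque

def bfs(graph, start):
    """Visit nodes level by level, returning them in discovery order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Toy "related products" graph from an e-commerce catalog
products = {
    "laptop": ["mouse", "keyboard"],
    "mouse": ["mousepad"],
    "keyboard": [],
    "mousepad": [],
}
print(bfs(products, "laptop"))  # ['laptop', 'mouse', 'keyboard', 'mousepad']
```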

The Top 5 Big Data Applications in the Healthcare Industry

In this modern era of leveraging technology, improving the healthcare sector is crucial, especially during the COVID-19 pandemic. Technological advancements can make or break the future of healthcare and help control the second wave of the coronavirus. One way to make healthcare more efficient, accurate, and affordable is to utilize big data.

Big data has completely revolutionized the way data is analyzed, managed, and leveraged across numerous industries, and healthcare is one of the most noticeable sectors where data analytics is driving prominent changes. The global big data in healthcare market is estimated to reach $34.27 billion by 2022, growing at a CAGR of 22.07%, and is expected to surpass $68.03 billion by 2024.

Intro to Yelp Web Scraping Using Python

Originally published June 17, 2020

Like many programmers whose degrees are not even relevant to computer programming, I have been struggling to teach myself to code since 2019 in the hope of succeeding in the job market. As a self-taught developer, I’m practical and goal-oriented about the things I’ve learned. This is why I particularly like web scraping: not only does it have a wide variety of use cases, such as product monitoring, social media monitoring, and content aggregation, but it’s also easy to pick up.
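For readers who want a feel for what a first scraping script looks like, here is a minimal sketch using requests and BeautifulSoup. The URL, headers, and the choice to simply list link text are illustrative assumptions, not the article's actual Yelp parsing logic; Yelp's markup changes often, and scraping it may be restricted by its terms of service and robots.txt.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical target page for illustration only.
URL = "https://www.yelp.com/search?find_desc=coffee&find_loc=Seattle"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; learning-scraper)"}

response = requests.get(URL, headers=HEADERS, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Print the text of every link on the page as a simple starting point.
for link in soup.find_all("a"):
    text = link.get_text(strip=True)
    if text:
        print(text)
```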

What’s Stopping the Democratization of AI?

With companies across industries waking up to the reality that adopting AI is no longer optional, the question has shifted to how its adoption and implementation can be simplified. In other words, how do we break down the immensely tall barriers around the complicated world of AI and leverage the undeniable advantages it offers for managing the scale and complexity of all the data already being gathered through the Internet of Things (IoT)?

There’s no doubt that AI is indeed the need of the hour, as every industry is fighting a losing battle with scale: the sheer magnitude of data streaming in from millions (at times billions) of sensors, tools, and pieces of equipment.