Jonathan Ellis | The Blog Pros

October 9, 2023

Five Hard Problems in Vector Search, and How We Solved Them in Cassandra

Vector search is a critical component of generative AI tooling because retrieval-augmented generation (RAG) techniques like FLARE help LLMs incorporate up-to-date, customized information while avoiding hallucinations. At the same time, vector search is a feature, not a product: you need to query vectors as they relate to the rest of your data, not in isolation, and you shouldn’t need to build a pipeline to sync the rest of your data with a vector store to do that.

2023 has seen an explosion in vector search products and projects, making selecting among them a serious effort. As you research the options, you’ll need to consider the following hard problems and the different approaches to solving them. Here, I’ll walk you through these challenges and describe how DataStax tackled them for our implementation of vector search for DataStax Astra DB and Apache Cassandra.

September 22, 2023

How AI Helped Us Add Vector Search to Cassandra in Six Weeks

With the huge demand for vector search functionality that’s required to enable generative AI applications, DataStax set an extremely ambitious goal to add this capability to Apache Cassandra and Astra DB, our managed service built on Cassandra.

Back in April, when I asked our chief product officer who was going to build it, he said, “Why don’t you do it?”

July 26, 2022

Why Pulsar Beats Kafka for a Scalable, Distributed Data Architecture

The leading open-source event streaming platforms are Apache Kafka and Apache Pulsar. For enterprise architects and application developers, choosing the right event streaming approach is critical, as these technologies will help their apps scale up around data to support operations in production.

Everyone wants results faster. We want applications that know what we want, even before we know ourselves. We want systems that constantly check for fraud or security issues to protect our data. We want applications that are smart enough to react and change plans when faced with the unexpected. And we want those services to be continuously available.

June 29, 2021

Apache Cassandra 4.0: Taming Tail Latencies with Java 16 ZGC

Like so many others in the Apache Cassandra community, I’m extremely excited to see that the 4.0 release is finally here. There are many, many improvements to Cassandra 4.0. One enhancement that is more important than it might look is the addition of support for Java versions 9 and up. This was not trivial, because Java 9 made changes to some internal APIs that the most performance-oriented Java projects like Cassandra relied on (you can read more about this here).

This is a big deal because with Cassandra 4.0, you not only get the direct improvements to performance added by the Apache Cassandra committers, you also unlock the ability to take advantage of seven years of improvements in the JVM (Java Virtual Machine) itself.