Making Dropwizard Metrics Accessible via CQL in Apache Cassandra

Metrics are a vital part of complex distributed storage systems, such as Apache Cassandra. It's important for an operator and a user to have access to metrics at the OS, JVM, and application levels to have full control over the data that is being processed and to prevent emergencies before they occur.

To make metrics accessible, Cassandra heavily relies on the open-source Dropwizard Metrics library, which acts as a skeleton for both metrics representation and storage. Metrics representations are provided as Histogram, Timer, Meter, Gauge, etc. classes for metric types, while storage uses MetricRegistry. The Dropwizard library makes it easy to expose the database internals through various APIs, like JMX or REST, in addition to the sidecar pattern. Apache Casandra has a vibrant ecosystem in this regard, for example, you can write your java-agent to export all data from the registry to the collectd Unix daemon. In conjunction, Cassandra's virtual tables, which are a relatively recent development by the project's standards (available since 4.0), have only a fraction of all the metrics so far, so don't give a full view of internal processes and need to be improved to rectify this.

Bring Streaming to Apache Cassandra with Apache Pulsar

Twitch, YouTube, Instagram, Facebook — virtually every major brand nowadays uses live streaming to connect and engage their audience. For enterprises and developers building cloud-native applications, this growing trend creates a need for streaming technologies that can reliably handle the rush of massive amounts of data, while also being flexible and easy to manage for developers.

One such technology is Apache Pulsar® — an open-source, distributed messaging and streaming platform that’s easy to deploy, simple to scale, and packed with developer-friendly APIs. So the next question is: how can you stream from Pulsar to Apache Cassandra®, the powerful NoSQL database designed to support data-heavy applications in the cloud?

Join our beginner-friendly Pulsar workshop on YouTube and learn how to connect Pulsar with Cassandra for streaming! In this post, we’ll set the scene with an introduction to Pulsar and guide you through four hands-on exercises where you’ll use these free, cloud-native technologies: Katacoda, Kesque, GitPod, and DataStax Astra DB. Each exercise will also be linked to the step-by-step instructions on the DataStax Developers GitHub wiki.

The End of the Beginning for Apache Cassandra

Editor’s note: This story originally ran on July 27, 2021, the day that Apache Cassandra 4.0 was released.

Today is a big day for those of us in the Apache Cassandra community. After a long uphill climb, Apache Cassandra 4.0 has finally shipped. I say finally because it has at times seemed like an elusive goal. I’ve been involved in the Cassandra project for almost 10 years now and I have seen a lot of ups and downs. So I feel this day marks an important milestone that isn’t just a version number. This is an important milestone in the lifecycle of a database project that has come into its own as an important database used around the world. The 4.0 release is not only incredibly stable in the history of Cassandra, but it’s also quite possibly the most stable release of any database. Now it’s ready to launch into the next 10 years of cloud-native data; it has the computer science and hard-won history to make a huge impact. Today’s milestone is the end of the beginning.

Improving Apache Cassandra’s Front Door and Backpressure

We have improved Apache Cassandra's ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially going out of memory. In order to better explain the change we have made, let us understand at a high level, on how an incoming request is processed by Cassandra before the fix, followed by what we changed, and the new relevant configuration knobs available.

How Inbound Requests Were Handled Before

Let us take the scenario of a client application sending requests to C* cluster. For the purpose of this blog, let us focus on one of the C* coordinator nodes.

Build Fault Tolerant Applications With Cassandra API for Azure Cosmos DB

Azure Cosmos DB is a resource governed system that allows you to execute a certain number of operations per second based on the provisioned throughput you have configured. If clients exceed that limit and consume more request units than what was provisioned, it leads to rate limiting of subsequent requests and exceptions being thrown — they are also referred to as 429 errors.

With the help of a practical example, I’ll demonstrate how to incorporate fault-tolerance in your Go applications by handling and retrying operations affected by these rate limiting errors. To help you follow along, the sample application code for this blog is available on GitHub — it uses the gocql driver for Apache Cassandra.

Setup Postgres, and GraphQL API With Hasura on Azure

monitors

I created a data model to store railroad systems, services, scheduled, time points, and related information, detailing the schema "Beyond CRUD n' Cruft Data-Modeling" with a few tweaks. The original I'd created for Apache Cassandra, and have since switched to Postgres giving the option of primary and foreign keys, relations, and the related connections for the model.

In this post I'll use that schema to build out infrastructure as a code solution with Terraform, utilizing Postgres and Hasura (OSS).

Designing Microservices With Cassandra

As a thriving software development technique, microservices — and its underlying architecture — remain foundational to cloud-native applications. Apache Cassandra is a natural complement given that it's a database designed for the cloud. This Refcard examines the benefits of microservices architecture, demonstrates recommended data modeling techniques, and explains key microservice design principles for Cassandra using a sample hotel application.

Apache Cassandra

Distributed non-relational database Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors and is used at some of the most well-known, global organizations. This Refcard covers data modeling, Cassandra architecture, replication strategies, querying and indexing, libraries across eight languages, and more.

Tuning Consistency With Apache Cassandra

One of the challenges faced by distributed systems is how to keep the replicas consistent with each other. Maintaining consistency requires balancing availability and partitioning. Fortunately, Apache Cassandra lets us tune this balancing according to our needs. In this blog, we are going to see how we can tune consistency levels during reads and writes to achieve faster reads and writes.

Before digging more about consistency, let me first discuss CAP Theorem. CAP Theorem describes the tradeoffs in distributed systems; it states that any networked shared-data system can have at most two of three desirable properties:

Kubernetes Logs Analysis With Elassandra, Fluent-Bit and Kibana

Elassandra simplifies your data stack by combining the power of Elasticsearch and Apache Cassandra into a single unique solution.

No Single Point Of Failure

Elasticsearch is by design a sharded master-slave architecture. The Elasticsearch master node manages the mapping changes and only the primary shards take write operations. The replica shards are read-only and can be promoted to primary shards by the master node in case of failure or maintenance. By relying on Apache Cassandra, Elassandra is master-less and has no single point of write. All nodes can process the search requests, request a mapping update, and depending on the Cassandra replication factor, take write operations.

How to Run Apache Cassandra on Kubernetes

With Kubernetes’ popularity skyrocketing and the adoption of Apache Cassandra growing as a NoSQL database well-suited to matching the high availability and scalability needs of cloud-based applications, it should be no surprise that more developers are looking to run Cassandra databases on Kubernetes. However, many devs are finding that doing so is relatively simple to get going with, but considerably more challenging to execute at a high level.

On the positive side, Kubernetes helpfully offers StatefulSets — workload API objects that can be used to manage stateful applications. StatefulSets provide the requisite components to establish stable and unique network identifiers, stable persistent storage, smooth and ordering deployment and scaling (as well as deletion and termination), and automated rolling updates.