jepsen | The Blog Pros

Testing the CP Subsystem with Jepsen

At Hazelcast we take reliability very seriously. With the new CP Subsystem module, Hazelcast has become the first and only IMDG that offers a linearizable distributed implementation of the Java concurrency primitives backed by the Raft consensus algorithm. In addition to well-grounded designs and proven algorithms, reliability also requires a substantial amount of testing. We have been working hard to ensure the validity of our consistency claims.

In this blog post, I'll try to demystify the linearizability semantics of the CP Subsystem and explore our Jepsen test suite. This blog post is the fourth installment of my CP Subsystem blog post series.

February 25, 2019

Lessons Learned From 2+ Years of Nightly Jepsen Tests

Since the pre-1.0 betas of CockroachDB, we've been using Jepsen to test the correctness of the database in the presence of failures. We have re-run these tests every night as a part of our nightly test suite. Last fall, these tests found their first post-release bug. This blog post is a more digestible walkthrough of that discovery (many of the links here point to specific comments in that issue's thread to highlight the most important moments).

Two Years of Jepsen Testing

Running Jepsen tests in an automated fashion has been somewhat challenging. The tests run in a complex environment with network dependencies, spawn multiple cloud VMs, and have many potential points of failure, so they turn out to be rather flaky. If you search our issue tracker for Jepsen failures you'll see a lot of issues, but before the bug we're discussing here, they were all benign — failures of our automation setup, not an inconsistency or bug in CockroachDB itself. By now, though, we've worked out the kinks and have the tests running reliably.

GBase 8a Implementation Guide: Resource Assessment
No categories
1. Disk Storage Space Evaluation The storage space requirements for a GBase cluster are calculated based on the data volume of the business system, the choice of compression algorithm, and the number of cluster replicas. The data volume of a business s... […]
A Look Into Netflix System Architecture
No categories
Ever wondered how Netflix keeps you glued to your screen with uninterrupted streaming bliss? Netflix Architecture is responsible for the smooth streaming experience that attracts viewers worldwide behind the scenes. Netflix's system architecture emphas... […]
High Availability and Disaster Recovery (HADR) in SQL Server on AWS
No categories
High Availability and Disaster Recovery (HADR) play a vital role in maintaining the integrity of data, reducing downtime, and safeguarding against data loss in enterprise database systems. AWS offers a range of HADR options for SQL Server, which levera... […]
Terraform Tips for Efficient Infrastructure Management
No categories
Terraform is a popular tool for defining and provisioning infrastructure as code (IaC), improving consistency, repeatability, and version control. But you need to know how to use it properly to extract maximum value from it as an infrastructure managem... […]
Integration Testing With Keycloak, Spring Security, Spring Boot, and Spock Framework
No categories
In today's security landscape, OAuth2 has become a standard for securing APIs, providing a more robust and flexible approach than basic authentication. My journey into this domain began with a critical solution architecture decision: migrating from bas... […]

Proudly powered by WordPress