Using PGBouncer With CockroachDB Serverless

Because CockroachDB scales with vCPU, there is a hard limit to how many active connections we can support per vCPU before a serious problem arises. PGBouncer stretches that limit a bit, making it a cost-effective option. In serverless architectures, there is no client-side connection pooling, so middleware like PGBouncer can alleviate the problem of connection storms. Please see my previous articles on the topic for more details.
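To make the pooler's role concrete, here is a minimal Java sketch, assuming PGBouncer is listening locally on its conventional port 6432 in front of the cluster; the user, password, and database name are placeholders. The application simply points its connection string at the pooler, which maps a burst of short-lived client connections onto a small pool of server connections:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionStormSketch {
    // Assumed values: PGBouncer on localhost:6432 fronting a database named "defaultdb".
    private static final String POOLER_URL = "jdbc:postgresql://127.0.0.1:6432/defaultdb";

    public static void main(String[] args) throws Exception {
        // A burst of short-lived client connections. Each one terminates at PGBouncer,
        // which multiplexes them onto a small, fixed pool of server connections, so the
        // per-vCPU connection limit on the database side is never exceeded.
        for (int i = 0; i < 200; i++) {
            try (Connection conn = DriverManager.getConnection(POOLER_URL, "myuser", "mypassword")) {
                conn.createStatement().execute("SELECT 1");
            }
        }
    }
}
```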


Previous Articles

  1. Using PGBouncer with CockroachDB
  2. Using PGBouncer with Cockroach Cloud Free Tier
  3. Exploring PGBouncer auth_type(s) with CockroachDB

Motivation

We've covered how to deploy PGBouncer with a self-hosted CockroachDB cluster. Today, I'm going to demonstrate how to run PGBouncer along with the Cockroach Cloud free-forever tier database. The overall concepts are identical, but we will highlight some of the major differences in deploying PGBouncer with a cloud product.

How To Set Up a Scalable and Highly-Available GraphQL API in Minutes

A modern GraphQL API layer for cloud-native applications needs to possess two characteristics: horizontal scalability and high availability. 

Horizontal scaling adds more machines to your API infrastructure, whereas vertical scaling adds more CPUs, RAM, and other resources to an existing machine that runs the API layer. While vertical scaling works to a certain extent, a horizontally scalable API layer can grow beyond the capacity of a single machine.

Why Java Is So Young After 25 Years: An Architect’s Point of View

Java has completed 25 years of programming life and remains close to developers' minds; almost 69% of the worldwide developer community still codes in Java, even now. Oracle recently released Java 15 with tons of features such as Sealed Classes, Hidden Classes, the Edwards-Curve Digital Signature Algorithm (EdDSA), and Text Blocks, to name a few. This makes Java 15 a 25-years-young, rather than a 25-year-old, programming language.
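As a quick taste of two of those features, here is a minimal sketch combining a sealed interface with a text block; the Shape, Circle, and Square names are invented for illustration, and sealed types and records are still preview features in Java 15 (compile with --enable-preview):

```java
// Sealed types restrict which classes may implement an interface;
// text blocks make multi-line strings readable.
public sealed interface Shape permits Circle, Square {}

record Circle(double radius) implements Shape {}

record Square(double side) implements Shape {}

class Java15Demo {
    public static void main(String[] args) {
        Shape shape = new Circle(2.5);

        // Text block: a standard feature as of Java 15.
        String json = """
                {
                  "shape": "circle",
                  "radius": 2.5
                }
                """;
        System.out.println(shape + " -> " + json);
    }
}
```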

History and Evolution of Java

In the early 1990s, there were already dozens of very stable programming languages such as FORTRAN, COBOL, Pascal, C++, and Visual Basic, yet platforms like Windows, Mac, Unix, Linux, and mobile demanded a unified approach to program development and architecture design. James Gosling and his friends discussed these gaps under an oak tree near James' office, where they felt a new programming language was needed to address them. They were very particular about the foundational aspects of the new language, called Oak, which was then named Green (after the Green team) and later Java (after their favorite coffee from Indonesia, Java coffee).

Reducing Data Latency With Geographically Distributed Databases

Introduction

Do you ever have those moments where you know you’re thinking faster than the app you’re using? You click something and have time to think, “what’s taking so long?” It’s frustrating, to say the least, but it’s an all-too-common problem in modern applications. A driving factor of this delay is latency, caused by offloading processing from the app to an external server. More often than not, that external server is a monolithic database residing in a single cloud region. This article will dig into some of the existing architectures that cause this issue and provide solutions for resolving it.

Latency Defined

Before we get ahead of ourselves, let’s define “latency.” In a general sense, latency measures the duration between an action and a response. In user-facing applications, that can be narrowed down to the delay between when a user makes a request and when the application responds to it. As a user, I don’t really care what is causing the delay that results in a poor user experience; I just want it to go away. In a typical cloud application architecture, latency comes from the internet itself: the time it takes for requests to travel back and forth between the user’s device and the cloud, referred to as internet latency. There is also processing time to consider: the time it takes to actually execute the request, referred to as operational latency. This article will focus on internet latency with a hint of operational latency. If you’re interested in other types of latency, TechTarget has a good deep dive into the specifics of the term.
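As a tiny illustration of latency as the user perceives it, here is a sketch that times a request from the client side; the endpoint URL is a placeholder. The single elapsed number bundles together internet latency (the network round trip) and operational latency (server-side processing):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LatencySketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/orders")) // placeholder endpoint
                .GET()
                .build();

        long start = System.nanoTime();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // From the user's point of view this is "the latency":
        // network round trip plus server-side processing time.
        System.out.println("Status " + response.statusCode() + " in " + elapsedMs + " ms");
    }
}
```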

Geo-Distributed Data Lakes Explained

Geo-Distributed Data Lake is quite the mouthful. It’s a pretty interesting topic, and I think you will agree after finishing this breakdown. There is a lot to say about how awesome it is to combine the flexibility of a data lake with the power of a distributed architecture, but I’ll get more into the benefits of both as a joint solution later. To start, I want to look at geo-distributed data lakes in two parts before we marry them together; for my non-developer brain, that made the most sense! No time to waste, so let’s kick things off with the one and only… data lakes.

It’s a Data LAKE, Not Warehouse!

It shouldn’t be a shock to the system to point out that we are living in a data-driven world going into 2021. Because of this, 'data lake' is a fitting term for the amount of data companies are collecting. In my opinion, we could probably start calling them data oceans: expansive and seemingly never-ending. So what is a data lake, exactly?

The Theory and Motive Behind Active/Active Multi-Region Architectures

The date was December 24, 2012: Christmas Eve. The world’s largest video streaming service, Netflix, experienced one of its worst incidents in company history: an outage of video playback on TV devices for customers in Canada, the United States, and the LATAM region. Fortunately, the enduring efforts of responders at Netflix, along with those at AWS, whose Amazon Elastic Load Balancer service had suffered the disruption that caused the incident, restored service just in time for Christmas. If one were to think about the events that unfolded at Netflix and AWS that day, it would be comparable to all those save-Christmas movies we love to watch at that time of year.

The idea of incident management stems from the ubiquitous fact that incidents will happen. This is not an unknown fact, and it was best immortalized by Amazon VP and CTO Werner Vogels when he said, “Everything fails all the time.” It is understood, therefore, that things will break; the question that persists is whether we can do anything to mitigate the impact of these inevitable incidents. The answer is, of course, yes.

JobRunr + Kubernetes + Terraform

In this new tutorial, we will build further upon our first tutorial — Easily process long-running jobs with JobRunr — and deploy the JobRunr application to a Kubernetes cluster on the Google Cloud Platform (GCP) using Terraform. We then scale it up to 10 instances for a whopping 869% speed increase compared to a single instance!

This tutorial is a beginner's guide to cloud infrastructure management. Feel free to skip to the parts that interest you.

Kubernetes, also known as k8s, is the hot new DevOps tool for deploying highly available applications. Today, there are many providers that support Kubernetes, including the well-known Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS).
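To recall the kind of workload that gets scaled across those 10 instances, here is a minimal JobRunr sketch; the in-memory storage provider and the job body are placeholders, since the real deployment uses a shared database so that every Kubernetes pod can pull work from the same queue:

```java
import org.jobrunr.configuration.JobRunr;
import org.jobrunr.scheduling.BackgroundJob;
import org.jobrunr.storage.InMemoryStorageProvider;

public class JobRunrSketch {
    public static void main(String[] args) {
        // Placeholder storage: in a clustered deployment this would be a shared
        // database so that every pod's background job server can dequeue work.
        JobRunr.configure()
               .useStorageProvider(new InMemoryStorageProvider())
               .useBackgroundJobServer()
               .initialize();

        // Enqueue a batch of fire-and-forget jobs; adding pods simply adds more
        // workers draining the same queue, which is where the speedup comes from.
        for (int i = 0; i < 1_000; i++) {
            final int jobNr = i;
            BackgroundJob.enqueue(() -> System.out.println("Processing job " + jobNr));
        }
    }
}
```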

Getting Started With DbSchema on a Distributed SQL Database

If you’re a database developer, you know the time-saving value of being able to visually design, document, and query SQL and NoSQL databases from a single UI. DbSchema is a well-rounded, visual database tool that supports over 40 databases from a single interface. And because YugabyteDB is PostgreSQL compatible, getting DbSchema to work with a distributed SQL database is relatively simple.

In this post, we’ll show you how to get DbSchema connected to a YugabyteDB cluster so you can start reverse-engineering schemas, editing ER diagrams, browsing data, visually building queries, and even syncing schemas.
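Because YugabyteDB speaks the PostgreSQL wire protocol, DbSchema connects to it with the standard PostgreSQL JDBC driver. As a rough sketch of the connection parameters involved, the snippet below assumes a local cluster with YSQL on its default port 5433 and the default yugabyte database and user; the same values go into DbSchema's connection dialog:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class YugabyteConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Assumed defaults for a local YugabyteDB cluster: YSQL on port 5433,
        // database and user both named "yugabyte". Adjust for your own cluster.
        String url = "jdbc:postgresql://127.0.0.1:5433/yugabyte";
        try (Connection conn = DriverManager.getConnection(url, "yugabyte", "yugabyte");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            if (rs.next()) {
                System.out.println("Connected: " + rs.getString(1));
            }
        }
    }
}
```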

What Is Recon? How We Augmented XML and JSON For Streaming Data

Ever since applications started moving data records, we’ve needed ways to annotate those records with formatting instructions. Many of these record notation formats are familiar to developers. For example, according to the IETF, JSON “defines a small set of formatting rules for the portable representation of structured data.” In practical terms, JSON makes it possible to describe value pairs, arrays, or a series of values as a human-readable document.

Similarly, the prolific XML markup language makes it possible to encode data into a format that is both human and machine-readable. Without formatting instructions provided by JSON and XML, machines would lack the context necessary to express and analyze documents. However, what happens when data cannot be expressed as a document?

Build a Simple Chat App Using Java and Stateful Web Agents

Even the simplest chat user interfaces belie a world of architectural complexity. Features like authentication, user presence, chat rooms, user counts, message encryption, and countless others represent a significant undertaking. However, with the right tools, building an enterprise-scale chat application is not only possible, it can be done relatively quickly.

This post is a tutorial for building a basic chat application using the open source Swim platform. The app we’ll be referencing was built by Scott Clarke, a UI developer at Swim, and the source code is available on GitHub here. Because this chat application is intended to demonstrate Swim development patterns, as opposed to being a usable product, we have not included features like authentication or comprehensive user state tracking. While we do include user presence in the chat app, we took the simplest approach possible and just display a user’s local IP address. This app may be simple, but the same patterns we demonstrate here can be used to build a massively scalable version, and it can easily be integrated with authentication services or other third-party software.
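To give a feel for the Web Agent pattern the tutorial builds on, here is a minimal sketch of a chat-room agent based on Swim's Java API as I understand it; the lane names and message shape are invented for illustration, and the real app on GitHub is structured differently:

```java
import swim.api.SwimLane;
import swim.api.agent.AbstractAgent;
import swim.api.lane.CommandLane;
import swim.api.lane.MapLane;

// A stateful Web Agent representing one chat room. The agent holds its state
// in lanes and streams updates to every client linked to those lanes.
public class ChatRoomAgent extends AbstractAgent {

  // Keyed message history: timestamp -> message text.
  @SwimLane("messages")
  MapLane<Long, String> messages = this.<Long, String>mapLane();

  // Clients send a command to post; the agent appends it to the messages lane,
  // and linked clients receive the new entry in real time.
  @SwimLane("post")
  CommandLane<String> post = this.<String>commandLane()
      .onCommand(text -> messages.put(System.currentTimeMillis(), text));
}
```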

How to Build Scalable, Stateful Java Services in Under 15 Minutes

Five years ago, when I started tracking media buzz around stateful architectures, I’d see a few articles every month about running stateful containers. That's about when Caitie McCaffrey first shared this awesome presentation about building scalable stateful architectures. Since then, the dominant software paradigm has become functional application design. The actor model and other object-oriented paradigms are still in use, but database-centric RESTful architectures are the standard means of building web applications today.

However, the tides are beginning to shift. Due to innovations like the blockchain, growing demand for real-time applications, the digitization of OT assets, and the proliferation of cheap compute resources at the network edge, there’s renewed interest in decentralized application architectures. As such, there’s also been increased focus on stateful applications. For example, at least five Apache Foundation projects (Beam, Flink, Spark, Samza, and TomEE) are touting statefulness as a benefit today. Modern applications communicate across multiple application silos and must span real-world machines, devices, and distributed data centers around the world. Stateful application architectures provide a way to abstract away the logistical effort of state management, thereby reducing the development and management effort necessary to operate massive-scale distributed applications.

Build a Scalable, Stateful To-Do List in 15 Minutes or Less

For the rest of this post, I want to disprove the notion that building scalable, stateful applications is a task too complex for everyday Java developers. In order to illustrate how easily a stateful application can be set up, we’ll walk through a tutorial for building a simple to-do list using the Swim platform. You can find all the source code for the to-do list tutorial here on GitHub.

Dubbo 3.0 Preview: Support for Reactive Programming

Background

Dubbo is graduating from the Apache Incubator! And we are planning some major release milestones. Release 3.0 is on the way. Since there will be many new features in 3.0, we want to make sure they live up to the community's expectations. We are now offering the 3.0.0-SNAPSHOT version, which includes many preview features of the 3.0 release. In this article, I’ll introduce one of the major enhancements: support for reactive programming.

RSocket

Reactive programming enables developers to write more efficient applications, especially in a distributed architecture. The community has been asking for this feature for a long time, and now we are delivering!
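As a rough sketch of what reactive support looks like from the application's side, a service contract can return Reactor types such as Mono and Flux instead of blocking values. The interface and implementation below are invented for illustration and leave out the Dubbo-specific configuration:

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// A hypothetical service contract: with reactive support, the return types are
// Reactor's Mono (one async result) and Flux (a stream of results) rather than
// blocking values.
interface GreetingService {
    Mono<String> greet(String name);

    Flux<String> greetStream(String name);
}

// A plain implementation; no Dubbo-specific code is needed in the business logic.
class GreetingServiceImpl implements GreetingService {
    @Override
    public Mono<String> greet(String name) {
        return Mono.just("Hello, " + name);
    }

    @Override
    public Flux<String> greetStream(String name) {
        return Flux.just("Hello", name, "welcome to reactive Dubbo");
    }
}
```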

Overview of Common Data-Keeping Techniques Used in a Distributed Environment

This article gives a very high-level overview of the common data-handling techniques used in distributed environments, along with some of their key points and advantages.

Normalization

Remember those old days of RDBMS, when we organized associated sets of columns into the same table and linked tables through foreign keys as referential entities, mostly to reduce the redundancy of data across different tables? For example, instead of putting an 'employee_name' column in both the employee's personal_detail table and the address_detail table, we kept it in personal_detail only, while 'emp_id' served as a foreign key in the address_detail table.
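As a rough sketch of that example in code (the field names mirror the tables above and are purely illustrative; Java 16+ records are used for brevity), the normalized shape stores the employee's name in a single place and references it by ID:

```java
import java.util.List;

// Normalized model: the employee's name lives only in PersonalDetail,
// and AddressDetail refers back to it via empId, playing the role of a foreign key.
record PersonalDetail(long empId, String employeeName, String dateOfBirth) {}

record AddressDetail(long empId, String street, String city) {}

class NormalizationSketch {
    public static void main(String[] args) {
        PersonalDetail personal = new PersonalDetail(42L, "Ada Lovelace", "1815-12-10");
        List<AddressDetail> addresses = List.of(
                new AddressDetail(42L, "12 Analytical Way", "London"),
                new AddressDetail(42L, "1 Engine Street", "Cambridge"));

        // Resolving the name means joining on empId instead of duplicating the value.
        addresses.stream()
                .filter(a -> a.empId() == personal.empId())
                .forEach(a -> System.out.println(personal.employeeName() + " -> " + a.city()));
    }
}
```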