Reducing Data Latency With Geographically Distributed Databases

Introduction

Do you ever have those moments where you know you’re thinking faster than the app you’re using? You click something and have time to think “what’s taking so long?” It’s frustrating, to say the least, but it’s an all-too-common problem in modern applications. A driving factor of this delay is latency, caused by offloading processing from the app to an external server. More often than not, that external server is a monolithic database residing in a single cloud region. This article will dig into some of the existing architectures that cause this issue and offer ways to resolve it.

Latency Defined

Before we get ahead of ourselves, let’s define “latency.” In a general sense, latency measures the duration between an action and a response. In user-facing applications, that can be narrowed down to the delay between when a user makes a request and when the application responds to that request. As a user, I don’t really care what is causing the delay that results in a poor user experience; I just want it to go away. In a typical cloud application architecture, part of the latency comes from the time it takes requests and responses to travel over the internet between the user’s device and the cloud, referred to as internet latency. There is also processing time to consider: the time it takes to actually execute the request, which is referred to as operational latency. This article will focus on internet latency with a hint of operational latency. If you’re interested in other types of latency, TechTarget has a good deep dive into the specifics of the term.
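To make the split concrete, here is a minimal Python sketch that simulates one round trip and separates the total delay into internet latency and operational latency. The names timed_request, handler, and network_delay_s are made up for illustration, and the network is faked with a fixed sleep; in a real application the delay varies per request and would be measured, not assumed.

```python
import time


def timed_request(handler, network_delay_s):
    """Simulate one round trip and split the delay into its two parts.

    handler         -- hypothetical callable standing in for the server-side work
    network_delay_s -- assumed one-way network delay, in seconds
    """
    start = time.perf_counter()

    time.sleep(network_delay_s)            # request travels to the server
    op_start = time.perf_counter()
    result = handler()                     # server executes the request
    operational = time.perf_counter() - op_start
    time.sleep(network_delay_s)            # response travels back

    total = time.perf_counter() - start
    timings = {
        "total": total,
        "operational": operational,
        "internet": total - operational,   # everything that was not processing
    }
    return result, timings


if __name__ == "__main__":
    # A trivial handler: the work itself is fast, so most of the wait is travel time.
    _, timings = timed_request(lambda: sum(range(1000)), network_delay_s=0.05)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

With a fast handler like this one, nearly all of the total is internet latency, which is the situation the rest of this article is concerned with.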

What is Persistent ETL and Why Does it Matter?

If you’ve made it to this blog, you’ve probably heard the term “persistent” thrown around with ETL and are curious about what the two really mean together. Extract, Transform, Load (ETL) is the generic concept of taking data from one or more systems and placing it in another system, often in a different format. Persistence is just a fancy word for storing data. Simply put, persistent ETL is adding a storage mechanism to an ETL process. That pretty much covers the what, but the why is much more interesting…

ETL processes have been around forever. They are a necessity for organizations that want to view data across multiple systems. This is all well and good, but what happens if that ETL process gets out of sync? What happens when the ETL process crashes? What about when one of the end systems updates? These are all very real possibilities when working with data storage and retrieval systems. Adding persistence to these processes can help ease or remove many of these concerns. 
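As a rough illustration of what adding persistence can look like, the Python sketch below stages extracted records in a SQLite table before transforming and loading them. The table names (staging, target), the loaded flag, and the toy transform are all assumptions made for this example, not a description of any particular product. The point is only that once the raw data is stored, a crashed or out-of-sync run can pick up from the staged rows without going back to the source systems.

```python
import sqlite3


def extract(source_rows, db):
    """Extract: persist raw records in a staging table before touching them."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "id INTEGER PRIMARY KEY, raw TEXT, loaded INTEGER DEFAULT 0)"
    )
    # INSERT OR IGNORE keeps a re-run from duplicating rows already staged.
    db.executemany(
        "INSERT OR IGNORE INTO staging (id, raw) VALUES (?, ?)", source_rows
    )
    db.commit()


def transform_and_load(db):
    """Transform and load only the staged rows not yet marked as loaded."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, name TEXT)"
    )
    pending = db.execute(
        "SELECT id, raw FROM staging WHERE loaded = 0 ORDER BY id"
    ).fetchall()
    for row_id, raw in pending:
        name = raw.strip().title()  # toy transform: tidy up a name
        db.execute(
            "INSERT OR REPLACE INTO target (id, name) VALUES (?, ?)",
            (row_id, name),
        )
        db.execute("UPDATE staging SET loaded = 1 WHERE id = ?", (row_id,))
        db.commit()  # commit per row so a crash loses at most one row of work
    return len(pending)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # a file path would survive a real crash
    extract([(1, "  ada lovelace "), (2, "grace hopper")], db)
    print(transform_and_load(db))  # rows processed on the first run
    print(transform_and_load(db))  # a second run finds nothing left to do
```

An in-memory database is used here only to keep the example self-contained; real persistence would need a file or a database server that outlives the process.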

Why You Should Consider Database-as-a-Service

Let’s say you’re kicking off a project. Maybe it’s an app, a data store, or an IoT project. No matter what you’re building, you will almost always need a database, the foundation of most applications. While this initial decision is easy, it gets infinitely more complicated from there.

What kind of database do I need? Where do I put it? How many do I need? Why am I doing this to myself? Ahhhh, this was supposed to be a simple project!