Aperture in Action: How We Solved PostgreSQL Performance Challenges

Even thirty years after its inception, PostgreSQL continues to gain traction, thriving in an environment of rapidly evolving open-source projects. While some technologies appear and vanish swiftly, others, like PostgreSQL, prove their longevity and withstand the test of time. It has become the preferred choice of many organizations, powering everything from general-purpose data storage to an asteroid-tracking database, and some companies run PostgreSQL clusters holding petabytes of data.

Operating PostgreSQL at scale in a production environment can be challenging. Companies have experienced downtime and performance problems, resulting in financial losses and diminished trust, especially when outages extend beyond a few hours. A case in point is the GitLab database outage in January 2017: although many factors contributed to that incident, the published timeline shows how significant a role overload played, with hours spent just bringing the overload under control.

Integrating FluxNinja Aperture With Nginx for Effective Load Management

Today, everything is available online, and people turn to the internet for even the smallest things. New products and websites pop up every day, catering to specific needs, from groceries to online studying. As a result, an ever-growing number of users come online to use these services, driving surges in traffic on websites and web applications.

When launching a product or website, we often have traffic estimates, but sometimes those estimates are exceeded, leading to overload scenarios. For instance, after the launch of ChatGPT (built on GPT-3.5), there was a massive influx of traffic and interest from people all around the world. The sudden surge of visitors exceeded the service's capacity and buffers, leading to downtime. In such situations, it is essential to have load management in place to avoid potential business loss.

Load Management With Istio Using FluxNinja Aperture

Service meshes are becoming increasingly popular in cloud-native applications as they provide a way to manage network traffic between microservices. Istio, one of the most popular service meshes, uses Envoy as its data plane. However, to maintain the stability and reliability of modern web-scale applications, organizations need more advanced load management capabilities. This is where Aperture comes in. It offers several features, including:

  • Prioritized load shedding: Drops traffic that is deemed less important to ensure that the most critical traffic is served.
  • Distributed rate-limiting: Prevents abuse and protects the service from excessive requests.
  • Intelligent autoscaling: Adjusts resource allocation based on demand and performance.
  • Monitoring and telemetry: Continuously monitors service performance and request attributes using an in-built telemetry system.
  • Declarative policies: Provides a policy language that enables teams to define how to react to different situations.

These capabilities help manage network traffic in a microservices architecture, prioritize critical requests, and ensure reliable operations at scale. Furthermore, the integration with Istio for flow control is seamless and requires no application code changes.
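To make the prioritized load-shedding idea more concrete, here is a minimal, illustrative Go sketch of a priority-aware admission check. It is not Aperture's implementation or SDK; the `Shedder` type, its fixed limit, and the two priority levels are assumptions made purely for illustration.

```go
// Illustrative sketch only: a toy priority-aware load shedder, not Aperture's
// actual implementation or API. Names and thresholds are hypothetical.
package main

import (
	"fmt"
	"sync/atomic"
)

type Priority int

const (
	Low Priority = iota
	High
)

// Shedder admits requests while the in-flight count is under a limit; once the
// limit is reached, only high-priority requests are allowed through.
type Shedder struct {
	inFlight int64
	limit    int64
}

// Admit reports whether a request of the given priority may proceed.
// The caller must call Done when an admitted request finishes.
func (s *Shedder) Admit(p Priority) bool {
	n := atomic.AddInt64(&s.inFlight, 1)
	if n > s.limit && p == Low {
		// Over the limit: shed the less important traffic first.
		atomic.AddInt64(&s.inFlight, -1)
		return false
	}
	return true
}

// Done marks an admitted request as finished.
func (s *Shedder) Done() {
	atomic.AddInt64(&s.inFlight, -1)
}

func main() {
	s := &Shedder{limit: 2}
	for i, p := range []Priority{Low, High, Low, High} {
		if s.Admit(p) {
			fmt.Printf("request %d (priority %d) admitted\n", i, p)
		} else {
			fmt.Printf("request %d (priority %d) shed\n", i, p)
		}
	}
}
```

In a real deployment the limit would not be a fixed constant but would be set by a control loop reacting to observed load, and requests would be classified by declarative policies rather than a hard-coded enum.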

Implementing Adaptive Concurrency Limits

Highly available and reliable services are a hallmark of any thriving business in today’s digital economy. As a service owner, you need to ensure that your services stay within their SLAs. But when bugs make it into production or user traffic surges unexpectedly, services can slow down under a large volume of requests and fail. If not addressed in time, such failures tend to cascade across your infrastructure, sometimes resulting in a complete outage.

At FluxNinja, we believe that adaptive concurrency limits are the most effective way to ensure services are protected and continue to perform within SLAs.
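As a rough illustration of how an adaptive concurrency limit can work, the sketch below uses a simple additive-increase/multiplicative-decrease (AIMD) rule keyed on observed request latency. This is a simplified stand-in, not Aperture's actual algorithm or API; the latency target, step sizes, and type names are hypothetical.

```go
// Illustrative AIMD concurrency limiter: not Aperture's algorithm or API.
// The target latency and step values below are arbitrary, for demonstration only.
package main

import (
	"fmt"
	"time"
)

type AIMDLimiter struct {
	limit    float64       // current concurrency limit
	minLimit float64       // never drop below this
	maxLimit float64       // never grow beyond this
	target   time.Duration // latency above this is treated as overload
}

// Observe adjusts the limit after each completed request: grow the limit
// slowly while latency is healthy, cut it sharply when latency degrades.
func (l *AIMDLimiter) Observe(latency time.Duration) {
	if latency <= l.target {
		l.limit += 1 // additive increase
		if l.limit > l.maxLimit {
			l.limit = l.maxLimit
		}
	} else {
		l.limit *= 0.9 // multiplicative decrease
		if l.limit < l.minLimit {
			l.limit = l.minLimit
		}
	}
}

// Limit returns the number of requests currently allowed in flight.
func (l *AIMDLimiter) Limit() int { return int(l.limit) }

func main() {
	lim := &AIMDLimiter{limit: 10, minLimit: 1, maxLimit: 100, target: 50 * time.Millisecond}
	for _, lat := range []time.Duration{20 * time.Millisecond, 30 * time.Millisecond, 120 * time.Millisecond} {
		lim.Observe(lat)
		fmt.Printf("observed %v -> limit %d\n", lat, lim.Limit())
	}
}
```

A production-grade limiter bases its decisions on richer signals (for example, latency relative to a learned baseline and queue buildup) and coordinates limits across a fleet of agents; the sketch only conveys the feedback-loop idea behind adaptive concurrency limits.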

GitOps Using Flux and Flagger

GitOps as a practice has been in use since 2017, when Alexis Richardson coined the term, and it has transformed DevOps and automation. At its core, it extends DevOps by treating infrastructure as code (IaC): your deployment configuration is stored in a version control system (typically Git), providing a single source of truth for both dev and ops.

As adoption grew, GitOps became the standard for continuous deployment in the cloud-native space. Many agile teams adopt it because they are already familiar with Git-based workflows for the release management of cloud-native workloads.