Throttling Made Easy: Back Pressure in Akka Streams

Big data has been the buzzword lately, but fast data is also gaining traction. If you work with data streaming, you know it can be tedious if not done right and may result in data loss or OutOfMemory errors. If you are building a service or product today, users will pay a premium for content delivered with latencies of just milliseconds.

Akka Streams

Akka Streams is a streaming module of the Akka toolkit, designed to process huge data streams concurrently and in a non-blocking way. It leverages the power of the Akka toolkit without requiring you to define actor behaviors and message handling explicitly, and it provides an abstraction that hides what is going on under the hood so you can focus on your business logic.
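Akka Streams signals demand from consumer to producer so a fast source can never overwhelm a slow sink. Setting the Akka API aside, the essence of that back pressure can be sketched in plain Java with a bounded buffer: when the buffer is full, the producer is blocked rather than the data being dropped. The class name, capacity, and element counts below are illustrative assumptions, not Akka APIs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureSketch {

    /** Runs a fast producer against a slow consumer over a bounded
     *  buffer and returns how many elements the consumer received. */
    static int run() throws InterruptedException {
        // Capacity 3: once full, put() blocks the producer. That blocking
        // is the back-pressure signal -- slow down, don't drop.
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(3);
        int total = 10;
        int[] consumed = {0};

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    buffer.take();
                    Thread.sleep(2); // simulate a slow downstream stage
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return consumed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consumed=" + run());
    }
}
```

In Akka Streams proper, the equivalent rate control is built in: demand is negotiated between stages automatically, and operators such as `throttle` cap the element rate declaratively.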

Throttle Spark-Kafka Streaming Volume

This article will help any new developer who wants to control the volume of a Spark-Kafka streaming job.

A Spark streaming job internally uses a micro-batch processing technique to stream and process data. Each batch starts in the "queued" status, moves to the "processing" status, and is finally marked with the "completed" status.
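For the DStream-based Kafka integration, the ingestion rate is capped through Spark configuration rather than code. A minimal sketch of such a submission follows; the configuration keys are Spark's own, while the application class and jar names are hypothetical placeholders.

```shell
# Cap the Kafka direct stream at 1,000 records/sec per partition, and let
# Spark's back-pressure controller adapt the actual rate below that cap.
# (com.example.StreamingJob and streaming-job.jar are hypothetical names.)
spark-submit \
  --class com.example.StreamingJob \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=1000 \
  streaming-job.jar
```

With Structured Streaming, the equivalent knob is the `maxOffsetsPerTrigger` option on the Kafka source, which bounds how many offsets each micro-batch consumes.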

Message Throttling Implementation With Buffering

Introduction

Software engineers spend a great deal of their time improving the speed and throughput of their systems. Scalability is also a big concern nowadays, usually tackled by building scale-out capabilities. There are times, however, when we have to slow down the system's rate. It may be due to a limited resource that is very hard (or very expensive) to scale, or a dependency on a third-party service that imposes a fixed rate for billing purposes (e.g., speed tiers). How can you add such a throttling capability to a scalable system that may span hundreds of servers? Furthermore, how do you implement such a bottleneck with proper overflow handling, so it can gracefully handle spikes without messages getting lost?

Problem Definition and Constraints

For the purposes of this article, we assume that there is a need to limit a message delivery rate because a downstream provider imposes such a limit. This provider can support higher rates but at an increased cost. Since the upstream clients only occasionally exceed this rate, there is no business justification for upgrading the speed tier. Also, let's assume that the provider will drop any messages arriving at a rate greater than the speed tier rate.
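Under these constraints, the natural shape is a buffer in front of a fixed-rate drain: bursts are absorbed in memory and released no faster than the speed tier allows, so the provider never sees (and never drops) over-rate traffic. A minimal, deterministic sketch follows; the class name, the tick-based drain, and the rate value are illustrative assumptions, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Buffers incoming bursts and forwards at most ratePerTick messages
 *  per time slice, so a downstream provider that drops over-rate
 *  traffic never receives a spike. */
public class ThrottledForwarder {
    private final Queue<String> buffer = new ArrayDeque<>();
    private final int ratePerTick;

    public ThrottledForwarder(int ratePerTick) {
        this.ratePerTick = ratePerTick;
    }

    /** Absorb the burst: enqueue instead of sending immediately. */
    public void submit(String msg) {
        buffer.add(msg);
    }

    /** Called once per time slice (e.g. every second by a scheduler):
     *  drains and returns up to ratePerTick buffered messages. */
    public List<String> tick() {
        List<String> out = new ArrayList<>();
        while (out.size() < ratePerTick && !buffer.isEmpty()) {
            out.add(buffer.remove());
        }
        return out;
    }

    /** Messages still waiting -- a growing backlog signals a sustained
     *  overload rather than a transient spike. */
    public int backlog() {
        return buffer.size();
    }
}
```

In production the buffer itself must be bounded (or spilled to durable storage), otherwise a sustained overload trades dropped messages for an OutOfMemory error.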

Detailed Explanation of Guava RateLimiter’s Throttling Mechanism

Throttling is one of three common methods for protecting a high-concurrency system; the other two are caching and downgrading. Throttling is used in many scenarios to limit concurrency and the number of requests. For example, during a flash sale, throttling protects your own system and downstream systems from being overwhelmed by a flood of traffic.

The purpose of throttling is to protect the system by restricting concurrent access or by capping the number of requests within a specified time window. Once the threshold is exceeded, either denial of service or traffic shaping is triggered.
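Guava's RateLimiter exposes exactly these two reactions: tryAcquire() returns false when no permit is available (denial), while acquire() blocks the caller until a permit frees up (traffic shaping). As a rough illustration of the underlying mechanism, here is a much-simplified token-bucket in plain Java. The class name and the injected clock are assumptions made for deterministic behavior; Guava's real implementation (with its smooth-bursty and warm-up variants) is considerably more elaborate.

```java
import java.util.function.LongSupplier;

/** Simplified token bucket: one permit becomes available every
 *  1/permitsPerSecond seconds. tryAcquire() denies over-threshold
 *  callers; acquire() reports how long a caller must wait (shaping).
 *  The clock is injected so the behavior is testable without sleeping. */
public class SimpleRateLimiter {
    private final double permitsPerSecond;
    private final LongSupplier nanoClock;
    private long nextFreeTicketNanos;

    public SimpleRateLimiter(double permitsPerSecond, LongSupplier nanoClock) {
        this.permitsPerSecond = permitsPerSecond;
        this.nanoClock = nanoClock;
        this.nextFreeTicketNanos = nanoClock.getAsLong();
    }

    /** Denial of service: false if a permit is not immediately available. */
    public synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (now < nextFreeTicketNanos) return false;
        reserve(now);
        return true;
    }

    /** Traffic shaping: returns the nanoseconds the caller should wait
     *  before proceeding (Guava's acquire() sleeps for you instead). */
    public synchronized long acquire() {
        long now = nanoClock.getAsLong();
        long wait = Math.max(0, nextFreeTicketNanos - now);
        reserve(Math.max(now, nextFreeTicketNanos));
        return wait;
    }

    private void reserve(long startNanos) {
        // The next permit matures one inter-permit interval after this one.
        nextFreeTicketNanos = startNanos + (long) (1e9 / permitsPerSecond);
    }
}
```

With the real Guava class, the equivalent calls are `RateLimiter.create(2.0)`, `limiter.tryAcquire()`, and `limiter.acquire()`.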