Message Throttling Implementation With Buffering

Introduction

Software engineers spend a great deal of their time improving the speed and throughput of their systems. Scalability is also a big concern nowadays, usually tackled by building scale-out capabilities. There are times, however, when we have to limit the rate at which a system operates. The cause may be a limited resource that is very hard (or very expensive) to scale, or a dependency on a third-party service that imposes a fixed rate for billing purposes (i.e., speed tiers). How can you add such a throttling capability to a scalable system that may span hundreds of servers? Furthermore, how do you implement such a bottleneck with proper overflow handling, so that it absorbs spikes gracefully without losing messages?

Problem Definition and Constraints

For the purposes of this article, we assume that the message delivery rate must be limited because a downstream provider imposes such a limit. This provider can support higher rates, but at an increased cost. Since the upstream clients only occasionally exceed this rate, there is no business justification for upgrading the speed tier. Also, let's assume that the provider will drop any messages that arrive faster than the rate of the purchased speed tier.
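
To make these constraints concrete, here is a minimal simulation sketch. It is not the implementation discussed in this article. The limit of 5 messages per second, the arrival trace, and the function names are all hypothetical, chosen only for illustration. It compares forwarding every message immediately, where the provider drops the excess, with holding the excess in a buffer and draining it at the permitted rate.

```python
from collections import deque


def deliver_unbuffered(arrivals, limit):
    """Forward every message immediately; the provider drops the excess.

    `arrivals` holds the number of messages arriving in each second
    (hypothetical data); `limit` is the assumed speed-tier rate.
    """
    delivered = dropped = 0
    for count in arrivals:
        accepted = min(count, limit)
        delivered += accepted
        dropped += count - accepted
    return delivered, dropped


def deliver_buffered(arrivals, limit):
    """Queue every message and release at most `limit` of them per second."""
    buffer = deque()
    delivered = 0
    for second, count in enumerate(arrivals):
        # Enqueue this second's messages (identified here by a simple tuple).
        buffer.extend((second, i) for i in range(count))
        # Drain in FIFO order, never exceeding the provider's limit.
        for _ in range(min(limit, len(buffer))):
            buffer.popleft()
            delivered += 1
    # Anything left in the buffer is still pending, not lost.
    return delivered, len(buffer)


if __name__ == "__main__":
    trace = [3, 12, 2, 0, 4]  # one spike in the second interval
    print(deliver_unbuffered(trace, limit=5))
    print(deliver_buffered(trace, limit=5))
```

With this trace, the unbuffered version delivers 14 messages and loses 7, while the buffered version delivers all 21 with nothing left pending. The spike is spread over the quieter seconds that follow it, which is the behaviour the rest of the article aims for in a system spanning many servers.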