How to use Rate Limiting algorithms for data processing pipelines

Valerio Barbera

Hi, I’m Valerio, founder and CTO at Inspector.

You may have already heard of rate limiting in the context of API consumption. In this article, I’ll show you a less common use of this component: coordinating data ingestion pipelines.

Building Inspector, I’m learning a lot about data-intensive applications, and pipelines are one of the most critical components of their internal architecture.

The architecture of a data-intensive application can be simplified with the schema below:

Simplified architecture of a data-intensive application

Large, unpredictable volumes of incoming data require a well-designed data processing pipeline that can ingest that data without disrupting the entire system at every spike.

While the Ingestion Node and the Ingestion Pipeline can easily scale horizontally with an “auto scaling” configuration on your cloud platform (Google Cloud, AWS, Azure, and Digital Ocean all provide this feature), or on a modern serverless infrastructure, the datastore is not so easy to scale.

Databases are often the real bottleneck of data-intensive systems because they need to support a very large number of write requests per second, and write requests are hard to scale.

I talked about database scalability in a recent article: https://inspector.dev/how-i-handled-the-scalability-of-the-sql-database-at-inspector/

Yes, there are many technologies that claim they can scale “infinitely”. Think of Elastic, ScyllaDB, SingleStore, Rockset, MongoDB, and many more. Perhaps they can technically do it without problems, but whether the costs are compatible with your business constraints is far from obvious.

Here comes the Rate Limiter.

What is rate limiting and how to use it in data processing pipelines?

In Inspector, the Rate Limiter protects the datastore from inadvertent or malicious overuse by limiting the rate at which an application can store monitoring data.

Without rate limiting, each application could make requests as often as it likes, leading to “spikes” of requests that starve other consumers. With rate limiting enabled, the pipeline performs only a fixed number of write requests per second against the datastore. A rate limiting algorithm automates the process.

Extended architecture of a data intensive application

But a monitoring system can’t lose data: that would mean generating fake metrics. At the same time, it has to store all the data without breaking the entire system, and at a reasonable cost.

For this reason, requests that exceed the limit are not lost: they are re-scheduled onto the message queue, waiting for a time window with free capacity.
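A minimal sketch of this re-scheduling pattern in Python (the names `FixedBudget` and `process_batch` are mine for illustration, not Inspector’s actual code):

```python
from collections import deque


class FixedBudget:
    """Toy limiter: allows at most `limit` writes until reset() is called."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False

    def reset(self):
        self.count = 0


def process_batch(queue, limiter, store):
    """Drain the queue once; writes that exceed the limit are re-queued."""
    deferred = deque()
    while queue:
        item = queue.popleft()
        if limiter.allow():
            store.append(item)    # write accepted by the datastore
        else:
            deferred.append(item)  # re-scheduled, not lost
    queue.extend(deferred)
```

The key property is the `else` branch: a rejected item goes back on the queue, so the next run (after the limiter resets for a new window) picks it up again.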

Fixed Window

The fixed window algorithm divides the timeline into fixed-size windows and assigns a counter to each window.

Each request, based on its arrival time, is mapped to a window. If the window’s counter has already reached the limit, requests falling into that window are rejected.

The windows are typically defined by the floor of the current timestamp, so if we set the window size to 1 minute, the windows are (12:00:00 – 12:01:00), (12:01:00 – 12:02:00), etc.
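Mapping a timestamp to its window is just a floor via integer division; a quick sketch (the function name is mine, not from the article):

```python
def window_key(timestamp, window_size=60):
    """Floor a Unix timestamp (seconds) to the start of its fixed window."""
    return int(timestamp // window_size) * window_size
```

Two timestamps share a window exactly when they produce the same key.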

Suppose the limit is 2 requests per minute:

Requests at 00:00:24 and 00:00:36 increase the window’s counter up to 2. The next request, arriving at 00:00:49, is re-scheduled because the counter has reached the limit. The request arriving at 00:01:12 can be served because it belongs to a new window.
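A minimal fixed window limiter along those lines (a sketch, not Inspector’s implementation; the clock is passed in to keep it testable):

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_size`-second window."""

    def __init__(self, limit, window_size=60):
        self.limit = limit
        self.window_size = window_size
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Floor the timestamp to the start of its window.
        window = int(now // self.window_size) * self.window_size
        if window != self.window_start:
            self.window_start = window  # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # the caller should re-schedule the request
```

Replaying the example above with a limit of 2 per minute: the requests at second 24 and 36 are accepted, the one at second 49 is rejected, and the one at second 72 opens a new window and is accepted.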

There are two main downsides to this algorithm:

Many consumers waiting for a reset window

If a window becomes too busy, the entire capacity can be consumed in a single second, overloading the system (e.g. during a peak hour like a Black Friday sale).

A burst of traffic near the boundary of a window can result in twice the rate of requests being processed.

Suppose the counter is empty and a spike of 10 requests arrives at 00:00:59: they will all be accepted. Another spike of 10 requests arrives at 00:01:00; since this is a new window, the counter is reset to 0, so these requests are accepted too. The server is now handling 20 requests in a few seconds, not really 10 requests/minute.

Sliding Window

The sliding window counter is similar to the fixed window, but it smooths out bursts of traffic near the boundary by adding a weighted count of the previous window to the count of the current window.

Let me show you a real example.

Suppose the previous window (0:00 – 1:00) counted 84 requests and the current window (1:00 – 2:00) has counted 36 so far. A new request arrives at 1:15, i.e. 15 minutes into the current window. Whether we accept or reject it is based on an approximation of the current rate, calculated with the weighted sum below:

limit = 100 requests/hour

rate = 84 * ((60-15)/60) + 36
     = 84 * 0.75 + 36
     = 99

Since rate < 100 (the limit), the request is accepted.
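A sketch of a sliding window counter that reproduces this calculation (illustrative Python, not Inspector’s code; the state is seeded directly from the numbers in the worked example):

```python
class SlidingWindowLimiter:
    """Approximate a rolling window by weighting the previous window's count."""

    def __init__(self, limit, window_size=3600):
        self.limit = limit
        self.window_size = window_size
        self.window_start = None
        self.prev_count = 0
        self.count = 0

    def allow(self, now):
        window = int(now // self.window_size) * self.window_size
        if self.window_start is None:
            self.window_start = window
        elif window != self.window_start:
            # Shift windows; if more than one window passed, prev drops to 0.
            one_step = (window - self.window_start == self.window_size)
            self.prev_count = self.count if one_step else 0
            self.count = 0
            self.window_start = window
        # Fraction of the previous window still inside the rolling window.
        elapsed = now - window
        weight = (self.window_size - elapsed) / self.window_size
        rate = self.prev_count * weight + self.count
        if rate < self.limit:
            self.count += 1
            return True
        return False
```

With prev_count = 84, count = 36, and a request 900 seconds (15 minutes) into the current hour, the weight is 0.75 and the estimated rate is 84 × 0.75 + 36 = 99, so the request is accepted.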

Conclusion

As discussed in this article, we didn’t use rate limiting to control the incoming traffic of a public API; we used it internally to protect the datastore against bursts of data.

We started with the fixed window and have since moved to the sliding window algorithm, improving the speed at which developers see their data available in the dashboard.

Laravel application monitoring

If you found this post interesting and want to drastically change your developers’ life for the better, you can give Inspector a try.

Inspector is an easy-to-use Code Execution Monitoring tool that helps developers identify bugs and bottlenecks in their applications automatically. Before customers do.

screenshot inspector code monitoring timeline

It is completely code-driven. You won’t have to install anything at the server level or make complex configurations in your cloud infrastructure.

It works with a lightweight software library that you can install in your application like any other dependency. You can try the Laravel package, it’s free.

Create an account, or visit our website for more information: https://inspector.dev/laravel
