Blocking traffic jam vs. async flight

Can the choice of technology make an impact on application performance and efficiency? Yes, and there is a key factor that determines how much. We will answer this question by comparing typical web application technologies.

Case 1: blocking IO execution model

Imagine you have a typical web app in Java, Python, or single-process PHP, without threading/async capabilities. This is still a pretty common approach and is the default in frameworks like Django, Flask, and many, many more.

The code handling each request is executed in a single process. When a request starts a slow IO operation (for example, an external API call or a DB query), that process cannot serve the next request until the operation finishes. Such apps are commonly deployed behind application servers like Gunicorn/uWSGI/Tomcat, which spawn several processes called workers (e.g. 3 workers for a server with 1 CPU core) or several threads.
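For reference, a Gunicorn setup like the one described above can be captured in a small Python config file. This is only a minimal sketch; the file name and values are illustrative, not a recommendation:

```python
# gunicorn.conf.py -- minimal sketch of a classic sync-worker deployment.
# Each worker is a separate OS process that handles exactly one request at a time.
workers = 3            # e.g. 3 workers on a 1-CPU-core server, as mentioned above
worker_class = "sync"  # the default blocking worker: no threads, no async
timeout = 30           # restart a worker stuck in a single request for over 30 s
```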

Now imagine a request runs a query that selects data from the DB. If the database is not properly tuned, such a query might take, let's say, ~990 ms, and that time grows with every new row in the table. The rest of the code (framework calls, data shuffling, regexes, and so on) might take e.g. 10 ms and stays stable.
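To make that breakdown concrete, here is a minimal Flask-style sketch (the endpoint and the simulated query are made up for illustration) where almost the whole second is spent waiting on the database:

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)

def run_slow_query():
    # Stand-in for an unoptimized SELECT over a growing table:
    # the real work happens inside the database server, our process just waits.
    time.sleep(0.99)          # ~990 ms of blocking IO wait
    return [{"id": 1, "name": "example"}]

@app.route("/items")
def list_items():
    rows = run_slow_query()   # the process is stuck here; no other request is served
    # Everything below (framework glue, serialization, ...) costs ~10 ms and is stable.
    return jsonify(rows)
```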

And this scenario happens pretty often. I would say super often: the local development instance works lightning fast, but once more and more users come to the service and more data accumulates after a month in production, this is exactly what happens.

If we visualize it on a timeline, it looks like this:

[Image: timeline of a single blocking request, with CPU work and database wait highlighted]

The lime segments are when your process is really using the CPU, disk, and so on. The red segment is when the database is using its CPU and your process is just waiting for the response. But while waiting, it still prevents other requests from being executed until the current one is done.

The worst thing I have seen is a developer who starts optimizing the Python code at this point: they Google how to write loops faster and which built-in function to apply, spend hours, and save 2 ms of request handling time:

[Image: the same timeline after the Python code is optimized, about 2 ms shorter]

Do you see a difference? Neither do I. Always establish the scale of the issue first and make sure you are fixing the real bottleneck.
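One cheap way to establish that scale is to simply time the pieces before touching anything. A rough sketch (the query stand-in and numbers mirror the example above):

```python
import time

def run_slow_query():
    time.sleep(0.99)                      # stand-in for the ~990 ms DB call
    return [{"id": i} for i in range(100)]

def timed(label, func):
    # Tiny helper: run func and report how long it took in milliseconds.
    start = time.perf_counter()
    result = func()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return result

rows = timed("db query", run_slow_query)                        # ~990 ms
cleaned = timed("python code", lambda: [r for r in rows if r])  # ~0-10 ms
# Shaving 2 ms off the second line is not a change any user will ever notice.
```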

So if only one process is serving the web API, it handles each request in about one second. This means the maximum throughput is 1 RPS (request per second). If two users make requests in parallel, the first one will wait 1 second and the second one will wait 2 seconds.

If your HTTP server is configured to spawn 3 processes or 3 threads, you bump throughput up to 3 RPS:

[Image: timeline with 3 workers handling requests in parallel, about 3 RPS]
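The back-of-the-envelope math behind that picture, using the illustrative numbers from above:

```python
# Capacity estimate for blocking workers: each request occupies
# one worker for its full duration, wait time included.
seconds_per_request = 0.99 + 0.01   # DB wait + Python work
workers = 3

max_rps = workers / seconds_per_request
print(max_rps)  # -> 3.0 requests per second, no matter how idle the CPU is
```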

This looks sad, considering that users of most services arrive in bursts, roughly following an Erlang distribution: most of the time they are asleep, but during lunch they all visit your site with memes, at the start of the day they all check their email, and in the evening they all watch streams on your streaming platform.

Case 2: async IO execution model

Imagine you are using an async stack like NodeJS + Sequelize ORM / Prisma ORM / etc., or Python's asyncio with aiohttp / FastAPI / Gino ORM. In this case each IO request, e.g. a query to the DB, still takes a long time (990 ms), but during the red waiting period your single process can start handling the next request, which again might call the DB:

[Image: timeline of a single async process interleaving requests while waiting for the DB]
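A minimal sketch of what that looks like in code, using FastAPI with an awaited stand-in for the DB call (the endpoints are made up; with a real async driver or ORM the await would wrap an actual query). The second, instant endpoint foreshadows the point about fast requests below:

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()

async def run_slow_query():
    # Stand-in for an awaited DB call through an async driver/ORM:
    # the coroutine is suspended here, and the event loop is free to pick up
    # other requests while the database does its ~990 ms of work.
    await asyncio.sleep(0.99)
    return [{"id": 1, "name": "example"}]

@app.get("/items")
async def list_items():
    return await run_slow_query()   # slow, but does not block the whole process

@app.get("/health")
async def health():
    return {"status": "ok"}         # fast stateless request, served right away
```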

This is much more efficient on the application side, because a single process/thread is no longer blocked, no matter how many parallel requests arrive, and with no extra RAM/CPU usage. For example, apart from the slow stateful requests (the ones involving unoptimized DB calls), you might have fast stateless requests, and those will still be served quickly.

But the actual problem is still there. Those 990 ms are real, hard CPU work for the database server, and now the database server itself is the bottleneck. Some queries can be handled in parallel without hurting performance; others will be queued (i.e. executed one after another) or eat the whole server's RAM, so Linux's OOM killer starts killing random processes on your server, with bad consequences for the runtime. Also, don't forget about the maximum number of connections your application can open to the database: in real life it is commonly only a handful, and once the limit is reached further requests queue up behind it. So yeah, in the end, this approach will not boost RPS anyway if your DB is slow.
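That connection limit is usually expressed as a pool. A sketch of how it might look with asyncpg against PostgreSQL (the DSN and sizes are placeholders; adapt to whatever driver or ORM you actually use):

```python
import asyncio
import asyncpg  # assumption: an asyncpg + PostgreSQL setup

async def main():
    # The pool caps how many queries can really run in parallel on the DB side.
    # With max_size=5, a 6th concurrent request waits for a free connection
    # instead of piling more load onto an already struggling database.
    pool = await asyncpg.create_pool(
        dsn="postgresql://user:password@localhost/dbname",  # placeholder DSN
        min_size=1,
        max_size=5,
    )
    async with pool.acquire() as conn:
        rows = await conn.fetch("SELECT 1")
        print(rows)
    await pool.close()

asyncio.run(main())
```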

However, even in this case you can see that async technologies can dramatically improve response times, and they bring minimal RAM overhead compared to threads, let alone multiple processes.

We recommend checking out this great talk from a Python core developer: High performance networking in Python (Yury Selivanov).