2020-09-08T14:49:01Z

Sync vs. Async Python: What is the Difference?

Posted by Miguel Grinberg under Python.

Have you heard people say that async Python code is faster than "normal" (or sync) Python code? How can that be? In this article I'm going to try to explain what async is and how it differs from normal Python code.

What Do "Sync" and "Async" Mean?

Web applications often have to deal with many requests, all arriving from different clients and within a short period of time. To avoid processing delays it is considered a must that they should be able to handle several requests in parallel, something commonly known as concurrency. I will continue to use web applications as an example throughout this article, but keep in mind that there are other types of applications that also benefit from having multiple tasks done concurrently, so this discussion isn't specific to the web.

The terms "sync" and "async" refer to two ways of writing applications that use concurrency. The so-called "sync" servers use the underlying operating system support for threads and processes to implement this concurrency. Here is a diagram of how a sync deployment might look:

Sync Server

In this situation we have five clients, all sending requests to the application. The public access point for this application is a web server that acts as a load balancer by distributing the requests among a pool of server workers, which might be implemented as processes, threads or a combination of both. The workers execute requests as they are assigned to them by the load balancer. The application logic, which you may write using a web application framework such as Flask or Django, lives in these workers.

This type of solution is great for servers that have multiple CPUs, because you can configure the number of workers to be a multiple of the number of CPUs, and with this you can achieve an even utilization of your cores, something that a single Python process cannot do due to the limitations imposed by the Global Interpreter Lock (GIL).

In terms of disadvantages, the diagram above clearly shows what the main limitation of this approach is. We have five clients, but only four workers. If these five clients send their requests all at the same time, then the load balancer will be able to dispatch all but one to workers, and the request that lost the race will have to remain in a queue while it waits for a worker to become available. So four of the five clients will receive their responses in a timely manner, but one of them will have to wait longer for its response. The key to making the server perform well is choosing the appropriate number of workers to prevent or minimize blocked requests given the expected load.
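The queueing effect described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: a thread pool from the standard library stands in for the pool of server workers, and `handle_request`, `NUM_WORKERS`, `NUM_CLIENTS` and the 0.2 second delay are made-up names and values, not part of any real web server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4      # size of the worker pool (assumed, matches the diagram)
NUM_CLIENTS = 5      # requests that arrive at the same time
REQUEST_TIME = 0.2   # pretend every request takes this long, in seconds


def handle_request(client_id, start):
    """Hypothetical request handler that blocks its worker while it works."""
    time.sleep(REQUEST_TIME)  # stands in for a database call or similar
    return client_id, time.monotonic() - start


def run_sync_server():
    """Send all requests at once and return (client_id, elapsed) pairs."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        futures = [pool.submit(handle_request, i, start)
                   for i in range(NUM_CLIENTS)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for client_id, elapsed in run_sync_server():
        print(f"client {client_id} got its response after {elapsed:.2f}s")
```

Four of the clients should see a response after roughly one request time, while the fifth has to wait about twice as long, because it cannot start until one of the four workers becomes free.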

An asynchronous server setup is harder to draw, but here is my best take:

Async Server

This type of server runs in a single process that is controlled by a loop. The loop is a very efficient task manager and scheduler that creates tasks to execute the requests that are sent by clients. Unlike server workers, which are long lived, an async task is created by the loop to handle a specific request, and when that request is completed the task is destroyed. At any given time an async server may have hundreds or even thousands of active tasks, all doing their own work while being managed by the loop.

You may be wondering how the concurrency between async tasks is achieved. This is the interesting part, because an async application relies exclusively on cooperative multitasking for this. What does this mean? When a task needs to wait for an external event, for example a response from a database server, instead of just waiting like a sync worker would do, it tells the loop what it needs to wait for and then returns control to it. The loop is then able to find another task that is ready to run while this task is blocked by the database. Eventually the database will send a response, and at that point the loop will consider that first task ready to run again, and will resume it as soon as possible.
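As a minimal illustration of this hand-off, here is a sketch that uses asyncio from the standard library. The `handle_request` function and its delays are hypothetical, and `asyncio.sleep()` is only a stand-in for waiting on a database response.

```python
import asyncio

events = []  # records the order in which things happen


async def handle_request(name, db_delay):
    events.append(f"{name}: start")
    # Hypothetical database call. The await hands control back to the loop,
    # which is free to run other tasks until the delay has elapsed.
    await asyncio.sleep(db_delay)
    events.append(f"{name}: got database response")
    return name


async def main():
    # Both tasks run in the same thread, taking turns at each await.
    return await asyncio.gather(
        handle_request("A", 0.2),
        handle_request("B", 0.1),
    )


if __name__ == "__main__":
    asyncio.run(main())
    print("\n".join(events))
```

Task B starts while task A is still waiting, and B receives its response first because its wait is shorter, all without a second thread.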

This ability for an async task to suspend and resume execution may be difficult to understand in the abstract. To help you apply this to things that you may already know, consider that in Python, one way to implement this is with the await or yield keywords, but these aren't the only ways as you will see later.
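To make suspend and resume a bit more concrete, the sketch below builds a toy loop on top of plain generators and the yield keyword. It is not how asyncio is implemented. The `task` and `run_loop` names are invented for this example, and the toy loop treats every task as always ready to run, whereas a real loop only resumes a task once the event it is waiting for has arrived.

```python
from collections import deque


def task(name, steps, log):
    """A task that suspends itself with yield after each step."""
    for step in range(1, steps + 1):
        log.append(f"{name} step {step}")
        yield  # suspend here and return control to the loop


def run_loop(tasks):
    """Toy round-robin loop: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)      # resume the task until its next yield
        except StopIteration:
            continue           # the task finished, so drop it
        ready.append(current)  # not finished yet, schedule it again


if __name__ == "__main__":
    log = []
    run_loop([task("A", 2, log), task("B", 3, log)])
    print(log)
```

The steps of the two tasks end up interleaved in the log, even though only one function is ever running at any given moment.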

An async application runs entirely in a single process and a single thread, which is nothing short of amazing. Of course this type of concurrency takes some discipline, since you can't have a task that holds on to the CPU for too long or else the remaining tasks starve. For async to work, all tasks need to voluntarily suspend and return control to the loop in a timely manner. To benefit from the async style, an application needs to have tasks that are often blocked by I/O and don't have too much CPU work. Web applications are normally a very good fit, in particular if they need to handle large numbers of client requests.
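The starvation problem is easy to demonstrate. In the sketch below, which uses hypothetical function names and made-up timings, a heartbeat task tries to record a timestamp every 0.05 seconds while a second task runs next to it. When the second task calls the blocking `time.sleep()`, the heartbeat is frozen for the whole duration. When it awaits `asyncio.sleep()` instead, the heartbeat keeps ticking.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.05):
    """Record a timestamp every `interval` seconds, if the loop allows it."""
    stamps = [time.monotonic()]
    for _ in range(ticks):
        await asyncio.sleep(interval)
        stamps.append(time.monotonic())
    return stamps


async def badly_behaved():
    time.sleep(0.3)  # blocking call: the loop cannot run anything else


async def well_behaved():
    await asyncio.sleep(0.3)  # cooperative: suspends and lets others run


def largest_gap(stamps):
    """Longest pause between two consecutive heartbeat timestamps."""
    return max(b - a for a, b in zip(stamps, stamps[1:]))


async def measure(other_task):
    """Run the heartbeat next to `other_task` and report its longest pause."""
    stamps, _ = await asyncio.gather(heartbeat(4), other_task())
    return largest_gap(stamps)


if __name__ == "__main__":
    print(f"blocking:    {asyncio.run(measure(badly_behaved)):.2f}s")
    print(f"cooperative: {asyncio.run(measure(well_behaved)):.2f}s")
```

The longest pause in the blocking case should be close to the 0.3 seconds that the misbehaving task held on to the thread, while in the cooperative case it should stay near the 0.05 second interval.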

To maximize the utilization of multiple CPUs when using an async server, it is common to create a hybrid solution that adds a load balancer and runs an async server on each CPU, as shown in the following diagram:

Async Server

Two Ways to Do Async in Python

I'm sure you know that to write an async application in Python you can use the asyncio package, which builds on top of coroutines to implement the suspend and resume features that all asynchronous applications require. The yield keyword, along with the newer async and await, are the foundation on which the async capabilities of asyncio are built. To paint a complete picture, there are other coroutine-based async solutions in the Python ecosystem, such as Trio and Curio. There is also Twisted, which is the oldest coroutine framework of all, even predating asyncio.

If you are interested in writing an async web application, there are a number of async frameworks based on coroutines to choose from, including aiohttp, sanic, FastAPI and Tornado.

What a lot of people don't know is that coroutines are just one of the two methods available in Python to write asynchronous code. The second way is based on a package called greenlet that you can install with pip. Greenlets are similar to coroutines in that they also allow a Python function to suspend execution and resume it at a later time, but the way in which they achieve this is completely different, which means that the async ecosystem in Python is fractured into two big groups.

The interesting difference between coroutines and greenlets for async development is that the former requires specific keywords and features of the Python language to work, while the latter does not. What I mean by this is that coroutine-based applications need to be written using a very specific syntax, while greenlet-based applications look exactly like normal Python code. This is very cool, because under certain conditions it enables sync code to be executed asynchronously, something the coroutine-based solutions such as asyncio cannot do.

So what is the equivalent of asyncio on the greenlet side? I know of three async packages based on greenlets: Gevent, Eventlet and Meinheld, though the last one is more a web server than a general purpose async library. All have their own implementation of an async loop, and they provide an interesting "monkey-patching" feature that replaces the blocking functions in the Python standard library, such as those that do networking and threading, with equivalent non-blocking versions implemented on top of greenlets. If you have a piece of sync code that you want to run asynchronously, there is a good chance these packages will let you do it.
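The mechanism behind monkey-patching is nothing more than replacing an attribute of a module at runtime. The sketch below shows only that mechanism with the standard library. The `patched` helper and `legacy_code` function are hypothetical, and the replacement used here merely records the call, whereas the replacements installed by a greenlet library (for example through Gevent's `gevent.monkey.patch_all()`) suspend the current greenlet and return control to the loop.

```python
import time
from contextlib import contextmanager


@contextmanager
def patched(module, name, replacement):
    """Temporarily replace module.name, restoring the original on exit."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    try:
        yield
    finally:
        setattr(module, name, original)


def legacy_code():
    """Ordinary sync code that knows nothing about the patch."""
    time.sleep(5)  # looked up on the time module at call time
    return "done"


if __name__ == "__main__":
    recorded = []
    # Stand-in replacement: record the requested delay instead of blocking.
    with patched(time, "sleep", recorded.append):
        legacy_code()  # returns immediately while the patch is active
    print(recorded)
```

Note that code which ran `from time import sleep` before the patch was applied keeps its reference to the original function, which is why these libraries ask you to apply their patches as early as possible in the application.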

You are going to be surprised by this. To my knowledge, the only web framework that has explicit support for greenlets is none other than Flask. This framework automatically detects when you are running on a greenlet web server and adjusts itself accordingly, without any need for configuration. When doing this, you need to be careful not to call blocking functions, or if you do, then use monkey-patching to "fix" those blocking functions.

But Flask isn't the only framework that can benefit from greenlets. Other web frameworks such as Django and Bottle, which have no knowledge of greenlets, can also function asynchronously when they are paired with a greenlet web server and their blocking functions are monkey-patched.

Is Async Faster Than Sync?

There is a widespread misconception with regard to the performance of sync and async applications. The belief is that async applications are significantly faster than their sync counterparts.

Let me clarify this so that we are all on the same page. Python code runs at exactly the same speed whether it is written in sync or async style. Aside from the code, there are two factors that can influence the performance of a concurrent application: context-switching and scalability.

Context-Switching

The effort that is required to share the CPUs fairly among all the running tasks, which is called context-switching, can affect the performance of the application. In the case of sync applications, this work is done by the operating system and is basically a black box with no configuration or fine tuning options. For async applications, context-switching is done by the loop.

The default loop implementation provided by asyncio, which is written in Python, is not considered to be very efficient. The uvloop package provides an alternative loop that is partly implemented in C code to achieve better performance. The event loops used by Gevent and Meinheld are also written in C code. Eventlet uses a loop written in Python.

A highly optimized async loop is likely more efficient at context-switching than the operating system, but in my experience, to be able to see a tangible performance gain you would have to be running at really high levels of concurrency. For most applications, I do not believe the performance difference between sync and async context switches amounts to anything significant.

Scalability

What I believe is the source of the myth that async is faster is that async applications often lead to a more efficient use of the CPUs, due to their ability to scale much better and in a more flexible way than sync.

Consider what would happen to the sync server shown in the diagram above if it were to receive one hundred requests all at the same time. This server cannot handle more than four requests at a time, so most of those requests will be in a queue for a while before they can get a worker assigned.

Contrast that with the async server, which would immediately create one hundred tasks (or 25 in each of the four async workers if using the hybrid model). With an async server, all requests would begin processing without having to wait (though to be fair, there may be other bottlenecks down the road that slow things down, such as a limit on the number of active database connections).

If these hundred tasks make heavy use of the CPU, then the sync and async solutions would have similar performance, since the speed at which the CPU runs is fixed, Python's speed of executing code is always the same and the work to be done by the application is also equal. But if the tasks need to do a lot of I/O operations, then the sync server may not be able to achieve high CPU utilization with just four concurrent requests. The async server, on the other hand, will certainly be better at keeping the CPUs busy because it runs all one hundred requests concurrently.
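The I/O bound case can be approximated with a short experiment. This is a rough sketch, not a benchmark: a pool of four threads stands in for the sync workers, `time.sleep()` and `asyncio.sleep()` stand in for I/O waits, and the `IO_WAIT` value of 0.02 seconds is made up. With these numbers the sync version cannot finish in less than 100 / 4 × 0.02 = 0.5 seconds, while the async version only has to wait for a single 0.02 second delay plus the overhead of the loop.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

NUM_REQUESTS = 100
NUM_WORKERS = 4
IO_WAIT = 0.02  # seconds each request spends waiting on I/O (made-up value)


def sync_request():
    time.sleep(IO_WAIT)  # the worker is blocked and cannot take other work


async def async_request():
    await asyncio.sleep(IO_WAIT)  # the task is suspended, others can run


def time_sync():
    """Seconds needed to push all requests through the worker pool."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        for _ in range(NUM_REQUESTS):
            pool.submit(sync_request)
    # Leaving the with block waits for all submitted requests to finish.
    return time.monotonic() - start


async def _run_async():
    await asyncio.gather(*(async_request() for _ in range(NUM_REQUESTS)))


def time_async():
    """Seconds needed to run all requests as tasks on a single loop."""
    start = time.monotonic()
    asyncio.run(_run_async())
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"sync:  {time_sync():.2f}s")
    print(f"async: {time_async():.2f}s")
```

If you replace the sleeps with CPU-heavy work, the advantage of the async version disappears, which is the point made in the paragraph above.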

You may be wondering why you can't run one hundred sync workers, so that the two servers have the same concurrency. Consider that each worker needs to have its own Python interpreter with all the resources associated with it, plus a separate copy of the application with its own resources. The sizes of your server and your application will determine how many worker instances you can run, but in general this number isn't very high. Async tasks, on the other hand, are extremely lightweight and all run in the context of a single worker process, so they have a clear advantage.

Keeping all of this in mind, we can say that async could be faster than sync for a given scenario only when:

  • There is high load (without high load there is no advantage in having access to high concurrency)
  • The tasks are I/O bound (if the tasks are CPU bound, then concurrency above the number of CPUs does not help)
  • You look at average number of requests handled per unit of time. If you look at individual request handling times you will not see a big difference, and async may even be slightly slower due to having more concurrent tasks competing for the CPU(s).

Conclusion

I hope this article clears some of the confusion and misunderstandings regarding async code. The two important takeaways that I hope you remember are:

  • An async application will only do better than a sync equivalent under high load.
  • Thanks to greenlets, it is possible to benefit from async even if you write normal code and use traditional frameworks such as Flask or Django.

If you'd like to understand more in detail how asynchronous systems work, check out my PyCon presentation Asynchronous Python for the Complete Beginner on YouTube.

Do you have any lingering questions regarding differences between sync and async? Let me know below in the comments!

20 comments

  • #1 Srikar said 2020-09-08T17:57:30Z

    Hi Miguel,

    "blocking functions are monkey-patched"

    Can you give an example of monkey-patching a blocking function? Maybe a good pointer on how to monkey-patch in Flask.

  • #2 Miguel Grinberg said 2020-09-08T18:50:01Z

    @Srikar: You need to look in the documentation of the async library that you are using to find out how to do this monkey-patching. You just need to call a function that they provide.

    For Gevent: http://www.gevent.org/api/gevent.monkey.html
    For Eventlet: https://eventlet.net/doc/patching.html

  • #3 Simon said 2020-09-08T20:23:26Z

    Whenever I have to use Asyncio or Node.js, I am thankful that I spend most of my work time using Erlang/Elixir.

  • #4 John Markham said 2020-09-09T08:02:19Z

    Really well written and informative. Thank you for sharing and explaining this so well.

  • #5 Michael Dunga said 2020-09-09T12:57:12Z

    Thanks Miguel Grinberg

  • #6 Anand Hemmige said 2020-09-10T03:57:50Z

    Thank you for this article Miguel. I just moved a Python sync app off Flask to Sanic, which is async. One thing I have noticed though is that at some point the number of concurrent requests reaches a threshold and the event loop performance starts to flatten or, worse, degrade. Have you come across any metrics in uvloop, the OS, etc. to help track where the bottleneck is when that happens? With sync apps it's really easy to reason about a slowdown, but not so much with async, and uvloop/asyncio do not provide enough visibility into "context switching".

  • #7 marcan said 2020-09-10T04:42:29Z

    Sync solutions are not limited to one worker per process, and it is not true that excess requests have to be processed serially. A sync solution can have multiple threads per worker, and those threads will share the same Python interpreter. Due to the GIL they can't run at the same time, but they can block on I/O and let other threads run. They can also be spawned on demand, like tasks in an async design. This is, conceptually, very similar to the async solution. It is how e.g. gunicorn runs by default (if you don't enable greenlets).

    Really, the tl;dr is that async python and sync python are the same damn thing, except in async python you implement the scheduler in userspace, and in sync python in kernelspace. Especially in the greenlet case where the coding style is the same, you're going to end up running the same code in roughly a similar scheduling pattern, only who does the scheduling changes (which might, or might not, impact performance). With gunicorn+Flask, you can try both approaches by changing a single config option.

  • #8 Mark said 2020-09-10T08:58:00Z

    Thanks for this article!

  • #9 elrond bard said 2020-09-10T09:25:36Z

    Miguel- What do you recommend someone use to multiplex / share database connections in an async app? This is rarely discussed... and seems like something that comes up on day 1 in most app servers. I guess it has to be able to have a pool of connections and farm those out to the various async tasks?

    Thanks

  • #10 Miguel Grinberg said 2020-09-10T09:55:03Z

    @Anand: Did you look at CPU usage? I actually find async a lot easier to reason about in terms of performance, because everything runs in a single process and thread. If you see that your CPU usage is high, that is a good indication that it is time to add an additional CPU and worker, or else you need to look at your own application's performance. If CPU usage isn't high, but you still have slow responses, then I would check whether your application might be inadvertently using blocking functions. Hope this helps!

  • #11 Miguel Grinberg said 2020-09-10T10:07:04Z

    @marcan: I think you need to re-read what I've written in this article. For example, the part where I say that those green "Server Worker" boxes can be processes, threads, or a combination of both. The assumption that those boxes are processes mapped 1:1 to CPUs is yours, not mine.

    Spawning a new thread or process on demand is certainly possible, but not a good idea if you want performance in a high load scenario, as these operations are expensive, unlike starting a new green thread in an async framework. Most multiprocess and multithreaded servers create a pool of workers in advance to avoid the cost of constantly creating and destroying workers.

    I have no idea what you are saying that gunicorn does by default. The default gunicorn configuration is to use the sync worker, which runs a single request per worker process at a time. You can enable threads if you like, but not when you use the default sync worker.

  • #12 Miguel Grinberg said 2020-09-10T10:10:52Z

    @elrond: In general you are going to find that most database drivers (sync or async) provide some sort of connection pooling feature. A traditional worker process that handles one request at a time does not need to worry about this, but you should have a connection pool if you use a multithreaded or async server.

  • #13 rico1 said 2020-09-10T20:59:00Z

    There's a nice article somewhere about back-pressure. Async enables you to accept many requests at the front line, but in the real world, not a demo, you always run into limitations at some point (you mentioned DB connections, for instance). And async makes them harder to spot. In addition, again in real-world stuff, you have to make sure that all your libs are async, otherwise you block your loop. And such libs are often younger, with a smaller userbase, more bugs, more problems for you, more grey hairs, etc. End point: we're moving back to sync!

  • #14 Miguel Grinberg said 2020-09-10T22:31:43Z

    @rico1: Yes, that was actually Armin Ronacher on his blog: https://lucumr.pocoo.org/2020/1/1/async-pressure/. I agree with your assessment. Async in Python is not for everyone: it takes some effort and the ecosystem is not mature yet.

  • #15 chenluxin said 2020-09-12T09:25:35Z

    Hi Miguel, thanks for the article. I'm still confused about the concept "loop". What does it loop for ? Does it have some relation with the "for loop" or "while loop"?

  • #16 Miguel Grinberg said 2020-09-12T14:30:34Z

    @chenluxin: It is called "loop", but you can think of it as a task scheduler. Nothing to do with for-loops or while-loops. The async loop is just a piece of code that orchestrates the running of all the tasks. You could say that the async loop loops on the task list, giving each a chance to run.

  • #17 vincent said 2020-09-13T20:29:06Z

    Many thanks Miguel for this great piece!

  • #18 Paroksh said 2020-09-14T05:13:33Z

    Great article. Thanks Miguel for writing this.

  • #19 Tudor Munteanu said 2020-09-14T13:14:57Z

    @chenluxin

    The word "loop" in this context is a synonym for "repeating lifecycle", more than "iteration loop" (for, while). The loop looks for a task, runs it, then another and another, and another, continuously.

  • #20 Felipe said 2020-09-22T20:39:51Z

    Thank you, Miguel, for bringing high quality content and discussions to the community. You are amazing, I am your fan already. Thanks.
