-
Notifications
You must be signed in to change notification settings - Fork 23
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This commit introduces the asynchronous Pyxis GQL module. It will hopefully allow for a better performance when interfacing with the Pyxis GQL endpoint. The changes include: - the pyxis gql async module itself - unit tests of the module - a documentation page with usage examples - updates to the dependencies
- Loading branch information
1 parent
aafa0d0
commit 89ee85d
Showing
6 changed files
with
1,462 additions
and
7 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,188 @@ | ||
===================== | ||
Async Pyxis GQL Usage | ||
===================== | ||
|
||
In this page we will give some examples on how to use the asynchronous Pyxis GQL module | ||
(``pyxis_gql_async.py``). | ||
|
||
General remarks about asynchronous code | ||
======================================= | ||
|
||
To start, let's take a good analogy from Miguel Grinberg’s 2017 PyCon talk about how asynchronous | ||
code works: | ||
|
||
Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players. | ||
She has two ways of conducting the exhibition: synchronously and asynchronously. | ||
|
||
Assumptions: | ||
- 24 opponents | ||
- Judit makes each chess move in 5 seconds | ||
- Opponents each take 55 seconds to make a move | ||
- Games average 30 pair-moves (60 moves total) | ||
|
||
*Synchronous version*: Judit plays one game at a time, never two at the same time, until the | ||
game is complete. Each game takes (55 + 5) * 30 == 1800 seconds, or 30 minutes. The entire | ||
exhibition takes 24 * 30 == 720 minutes, or 12 hours. | ||
|
||
*Asynchronous version*: Judit moves from table to table, making one move at each table. She | ||
leaves the table and lets the opponent make their next move during the wait time. One move | ||
on all 24 games takes Judit 24 * 5 == 120 seconds, or 2 minutes. The entire exhibition is now | ||
cut down to 120 * 30 == 3600 seconds, or just 1 hour. | ||
|
||
Source: [RealPython]_ and [MiguelGrinberg]_ | ||
|
||
Therefore, async code can have a speedup in IO-bound scenarios by "freeing" the execution runtime | ||
to do other things while a certain function waits for a response. This is different than | ||
**parallellizing** and **threading**. | ||
|
||
In async code, it is fundamental that the the functions that are being alternated are | ||
**non-blocking**. Otherwise, the runtime will not be able to alternate between them. | ||
|
||
When calling async functions, we have to create an **async loop**, which contains the | ||
functions that will be alternated. In the analogy, this is similar to setting up the set of tables | ||
and boards for the exhibition. The runtime will alternate between the functions in that context, | ||
and then resume usual synchronous execution. | ||
|
||
Consequently, async functions cannot be run directly, as the usual synchronous python functions | ||
can. Instead, they need to be **awaited** inside some async loop. Python reserves a few special | ||
keywords (``await``, ``async with``, ``async for``) to be used only inside asynchronous functions, | ||
which are created with create ``async def``. | ||
|
||
Finally, if you put only one function inside an async loop, you will make things work as if they | ||
were synchronous, and therefore you will lose the async speedup. This would be analogous to | ||
organizing an exhibition with just one board for Polgár -- it is a standard match. | ||
|
||
If you have never used asynchronous code in python, references [RealPython]_ and [AsyncIO]_ are | ||
good sources. | ||
|
||
|
||
Usage examples | ||
============== | ||
|
||
Basic calls | ||
----------- | ||
Usually, when we write async code, we will rely on other async libraries that implement lower-level | ||
functionality. Our code will then have to ``await`` the async functions of those libraries. For | ||
example, we could await a ``get`` request made with ``aiohttp``: | ||
|
||
.. code-block:: python | ||
:emphasize-lines: 3 | ||
url = "https://docs.aiohttp.org/en/stable/index.html" | ||
async with aiohttp.ClientSession() as session: | ||
response = await session.get(url) | ||
We can than make an async loop with ``asyncio.run``: | ||
|
||
.. code-block:: python | ||
:emphasize-lines: 9 | ||
async def main(): | ||
url = "https://docs.aiohttp.org/en/stable/index.html" | ||
async with aiohttp.ClientSession() as session: | ||
response = await session.get(url) | ||
return response | ||
if __name__ == "__main__": | ||
response = asyncio.run(main()) | ||
print(f"{response}") | ||
Scheduling tasks and collecting results with ``gather`` | ||
------------------------------------------------------- | ||
We can shedule a batch of tasks to run "together" (in the async sense) and then gather their results | ||
with ``asyncio.create_task`` and ``asyncio.gather``: | ||
|
||
.. code-block:: python | ||
task1 = asyncio.create_task(async_function1(...)) | ||
task2 = asyncio.create_task(async_function2(...)) | ||
await asyncio.gather(task1, task2) | ||
This is a simple option for when we just need to collect a set of results, without acting on each | ||
of them separately. | ||
|
||
If you have a variable number of tasks that need to be gathered, you could use a generator: | ||
|
||
.. code-block:: python | ||
:emphasize-lines: 1 | ||
async for obj in my_async_generator: | ||
task = asyncio.create_task(process_obj(obj)) | ||
tasks.append(task) | ||
await asyncio.gather(*tasks) | ||
The discussions in [AsyncFor]_ are very good to check. | ||
|
||
|
||
Chaining results | ||
---------------- | ||
If, on the other hand, you want to process each result as they become available, you can use | ||
``asyncio.as_completed``. In this example, there are a few results that we want to ignore, so we | ||
make a conditional aggregation after awaiting each execution: | ||
|
||
.. code-block:: python | ||
:emphasize-lines: 5 | ||
x_values = [...] | ||
results_to_skip = [...] | ||
collected_results = [] | ||
for f in asyncio.as_completed( | ||
[async_function(x) for x in x_values] | ||
): | ||
result = await f | ||
if result not in results_to_skip: | ||
collected_results.append(result) | ||
Again, the discussions in [AsyncFor]_ are very good to check. | ||
|
||
|
||
Usual structure of async code | ||
----------------------------- | ||
We will usually have the following elements when we use async code: | ||
|
||
- lower level async functions that implement a given task | ||
- async orchestrators, which create an async loop and aggregates several lower level async functions | ||
(for example using ``asyncio.gather`` or ``async for``) | ||
- a synchronous wrapper function, that calls the async orchestrator and integrates it in the context | ||
of the synchronous flow (for example with ``asyncio.run``) | ||
|
||
It might be the case that a given module or package the latter two cases reside in | ||
externally-calling code. For the asynchronous Pyxis GQL module in Freshmaker | ||
(``pyxis_gql_async.py``), this is precisely the case: the module itself provides the PyxisAsyncGQL | ||
class with several async functions; it is up to the other modules that will use it | ||
to build the async loop and aggregate those functions in them as needed. | ||
|
||
Concrete example for using PyxisAsyncGQL | ||
---------------------------------------- | ||
In this next example, we will call ``PyxisAsyncGQL.get_repository_by_path`` several times, each for | ||
a given path and registry, and aggregate each result conditionally. | ||
|
||
.. code-block:: python | ||
async def aggregate_paths(): | ||
path_registry_vals:list[tuple[str, str]] = [...] | ||
results_to_skip = [...] | ||
collected_results = [] | ||
for f in asyncio.as_completed( | ||
[PyxisAsyncGQL.get_repository_by_path(*x) for x in path_registry_vals] | ||
): | ||
result = await f | ||
if result not in results_to_skip: | ||
collected_results.append(result) | ||
def main(): | ||
asyncio.run(aggregate_paths()) | ||
References | ||
========== | ||
.. [RealPython] https://realpython.com/async-io-python/ | ||
.. [MiguelGrinberg] https://youtu.be/iG6fr81xHKA?t=4m29s | ||
.. [AsyncIO] https://docs.python.org/3/library/asyncio-task.html# | ||
.. [AsyncFor] https://stackoverflow.com/questions/56161595/how-to-use-async-for-in-python |
Oops, something went wrong.