Asynchronous Pyxis GQL module
This commit introduces the asynchronous Pyxis GQL module. It will
hopefully allow for better performance when interfacing with the
Pyxis GQL endpoint.

The changes include:
- the pyxis gql async module itself
- unit tests of the module
- a documentation page with usage examples
- updates to the dependencies
FernandesMF committed May 29, 2023
1 parent aafa0d0 commit 89ee85d
Showing 6 changed files with 1,462 additions and 7 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -21,5 +21,6 @@ available that address those CVEs.
configuration
messaging_api
db
pyxis_gql_async

.. _Freshmaker: https://github.com/redhat-exd-rebuilds/freshmaker
188 changes: 188 additions & 0 deletions docs/pyxis_gql_async.rst
@@ -0,0 +1,188 @@
=====================
Async Pyxis GQL Usage
=====================

This page gives some examples of how to use the asynchronous Pyxis GQL module
(``pyxis_gql_async.py``).

General remarks about asynchronous code
=======================================

To start, let's take a good analogy from Miguel Grinberg’s 2017 PyCon talk about how asynchronous
code works:

    Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players.
    She has two ways of conducting the exhibition: synchronously and asynchronously.

    Assumptions:

    - 24 opponents
    - Judit makes each chess move in 5 seconds
    - Opponents each take 55 seconds to make a move
    - Games average 30 pair-moves (60 moves total)

    *Synchronous version*: Judit plays one game at a time, never two at the same time, until the
    game is complete. Each game takes (55 + 5) * 30 == 1800 seconds, or 30 minutes. The entire
    exhibition takes 24 * 30 == 720 minutes, or 12 hours.

    *Asynchronous version*: Judit moves from table to table, making one move at each table. She
    leaves the table and lets the opponent make their next move during the wait time. One move
    on all 24 games takes Judit 24 * 5 == 120 seconds, or 2 minutes. The entire exhibition is now
    cut down to 120 * 30 == 3600 seconds, or just 1 hour.

Source: [RealPython]_ and [MiguelGrinberg]_
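
The arithmetic in the analogy can be checked with a few lines of Python. This is only a sketch:
the function names are made up for illustration, and the asynchronous estimate assumes that a
round lasts as long as the slower of "Judit moves at every table" and "one full exchange of
moves", which reproduces the numbers quoted above.

```python
def sync_exhibition_seconds(opponents, judit_move, opponent_move, pair_moves):
    """One game at a time: every pair-move costs both players' thinking time."""
    return opponents * pair_moves * (judit_move + opponent_move)


def async_exhibition_seconds(opponents, judit_move, opponent_move, pair_moves):
    """Judit walks the room while the opponents think."""
    # A round cannot be shorter than Judit's walk around all tables, nor
    # shorter than one complete exchange of moves at a single table.
    round_time = max(opponents * judit_move, judit_move + opponent_move)
    return pair_moves * round_time


sync_total = sync_exhibition_seconds(24, 5, 55, 30)
async_total = async_exhibition_seconds(24, 5, 55, 30)
print(sync_total / 3600, "hours synchronously")    # 12.0
print(async_total / 3600, "hours asynchronously")  # 1.0
```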

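Therefore, async code can be faster in IO-bound scenarios because it "frees" the execution runtime
to do other things while a certain function waits for a response. This is different from
**parallelizing** and **threading**: the functions still run one at a time, and the runtime only
switches between them at the points where one of them is waiting.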

In async code, it is fundamental that the functions being alternated are
**non-blocking**. Otherwise, the runtime will not be able to alternate between them.
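
The difference between blocking and non-blocking waits can be seen in a small experiment. The
sketch below uses hypothetical helper names; ``time.sleep`` stands in for any blocking call and
``asyncio.sleep`` for any awaitable I/O operation.

```python
import asyncio
import time


async def blocking_wait(delay):
    # time.sleep never yields to the loop, so no other function can run meanwhile.
    time.sleep(delay)


async def non_blocking_wait(delay):
    # asyncio.sleep hands control back to the loop while waiting.
    await asyncio.sleep(delay)


async def timed(waiter, n=3, delay=0.1):
    """Run ``n`` copies of ``waiter`` together and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(waiter(delay) for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_wait))
non_blocking_elapsed = asyncio.run(timed(non_blocking_wait))
print(f"blocking: {blocking_elapsed:.2f}s, non-blocking: {non_blocking_elapsed:.2f}s")
```

The blocking version should take roughly three times the delay, because the waits happen one
after the other, while the non-blocking version should take roughly one delay in total.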

When calling async functions, we have to create an **async loop** (called the *event loop* in
``asyncio``), which holds the functions that will be alternated. In the analogy, this is similar
to setting up the set of tables and boards for the exhibition. The runtime will alternate between
the functions in that context, and then resume the usual synchronous execution.

Consequently, async functions cannot be run directly, as the usual synchronous Python functions
can. Instead, they need to be **awaited** inside some async loop. Python reserves a few special
constructs (``await``, ``async with``, ``async for``) to be used only inside asynchronous
functions, which are created with ``async def``.
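
A short sketch of what "cannot be run directly" means in practice (``answer`` is a made-up
function used only for illustration): calling an ``async def`` function only builds a coroutine
object, and its body runs once a loop executes it.

```python
import asyncio


async def answer():
    return 42


coro = answer()  # nothing has run yet; this is only a coroutine object
is_coroutine = asyncio.iscoroutine(coro)
coro.close()  # discard it explicitly to avoid a "never awaited" warning

result = asyncio.run(answer())  # the loop actually executes the function body
print(is_coroutine, result)
```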

Finally, if you put only one function inside an async loop, you will make things work as if they
were synchronous, and therefore you will lose the async speedup. This would be analogous to
organizing an exhibition with just one board for Polgár -- it is a standard match.

If you have never used asynchronous code in Python, references [RealPython]_ and [AsyncIO]_ are
good sources.


Usage examples
==============

Basic calls
-----------
Usually, when we write async code, we will rely on other async libraries that implement lower-level
functionality. Our code will then have to ``await`` the async functions of those libraries. For
example, we could await a ``get`` request made with ``aiohttp``:

.. code-block:: python
    :emphasize-lines: 3

    url = "https://docs.aiohttp.org/en/stable/index.html"
    async with aiohttp.ClientSession() as session:
        response = await session.get(url)

We can then create an async loop with ``asyncio.run``:

.. code-block:: python
    :emphasize-lines: 14

    import asyncio

    import aiohttp


    async def main():
        url = "https://docs.aiohttp.org/en/stable/index.html"
        async with aiohttp.ClientSession() as session:
            response = await session.get(url)
        return response


    if __name__ == "__main__":
        response = asyncio.run(main())
        print(f"{response}")

Scheduling tasks and collecting results with ``gather``
-------------------------------------------------------
We can schedule a batch of tasks to run "together" (in the async sense) and then gather their results
with ``asyncio.create_task`` and ``asyncio.gather``:

.. code-block:: python

    task1 = asyncio.create_task(async_function1(...))
    task2 = asyncio.create_task(async_function2(...))
    results = await asyncio.gather(task1, task2)

This is a simple option for when we just need to collect a set of results, without acting on each
of them separately. The results come back in the same order as the tasks passed to ``gather``.
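
A runnable variant of the ``gather`` pattern is sketched below. ``double_after`` is a hypothetical
stand-in for a real async call; the delays only simulate waiting on I/O.

```python
import asyncio


async def double_after(delay, value):
    # Simulate an I/O wait, then produce a result.
    await asyncio.sleep(delay)
    return 2 * value


async def main():
    slow = asyncio.create_task(double_after(0.05, 1))
    fast = asyncio.create_task(double_after(0.01, 2))
    # gather returns results in the order the tasks were passed in,
    # regardless of which one finished first.
    return await asyncio.gather(slow, fast)


results = asyncio.run(main())
print(results)  # [2, 4]
```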

If you have a variable number of tasks that need to be gathered, you could use an async generator:

.. code-block:: python
    :emphasize-lines: 2

    tasks = []
    async for obj in my_async_generator:
        task = asyncio.create_task(process_obj(obj))
        tasks.append(task)
    await asyncio.gather(*tasks)

The discussions in [AsyncFor]_ are worth reading.
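
For completeness, here is a self-contained sketch of the same idea. The ``numbers`` generator and
the ``square`` function are hypothetical and only illustrate how ``async for`` feeds a list of
tasks.

```python
import asyncio


async def numbers(n):
    # Hypothetical async generator: yields control to the loop between items.
    for i in range(n):
        await asyncio.sleep(0)
        yield i


async def square(x):
    await asyncio.sleep(0.01)
    return x * x


async def main():
    tasks = []
    async for obj in numbers(4):
        # Each task starts running as soon as the loop gets a chance.
        tasks.append(asyncio.create_task(square(obj)))
    return await asyncio.gather(*tasks)


squares = asyncio.run(main())
print(squares)  # [0, 1, 4, 9]
```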


Chaining results
----------------
If, on the other hand, you want to process each result as soon as it becomes available, you can use
``asyncio.as_completed``. In this example, there are a few results that we want to ignore, so we
make a conditional aggregation after awaiting each execution:

.. code-block:: python
    :emphasize-lines: 5

    x_values = [...]
    results_to_skip = [...]
    collected_results = []

    for f in asyncio.as_completed(
        [async_function(x) for x in x_values]
    ):
        result = await f
        if result not in results_to_skip:
            collected_results.append(result)

Again, the discussions in [AsyncFor]_ are worth reading.
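
The sketch below makes the ``as_completed`` pattern runnable. ``echo_after`` is a made-up
coroutine whose delay doubles as its result, so the order of ``collected`` shows that results are
handled in completion order rather than submission order.

```python
import asyncio


async def echo_after(delay):
    await asyncio.sleep(delay)
    return delay


async def main():
    delays = [0.15, 0.05, 0.10]
    to_skip = {0.10}
    collected = []
    for next_done in asyncio.as_completed([echo_after(d) for d in delays]):
        result = await next_done  # resolves to whichever call finishes next
        if result not in to_skip:
            collected.append(result)
    return collected


collected = asyncio.run(main())
print(collected)  # [0.05, 0.15]
```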


Usual structure of async code
-----------------------------
We will usually have the following elements when we use async code:

- lower-level async functions that implement a given task
- async orchestrators, which aggregate several lower-level async functions
  (for example using ``asyncio.gather`` or ``async for``)
- a synchronous wrapper function, which creates the async loop, calls the async orchestrator and
  integrates it into the synchronous flow (for example with ``asyncio.run``)

For a given module or package, the latter two elements may reside in the calling code rather than
in the module itself. This is precisely the case for the asynchronous Pyxis GQL module in
Freshmaker (``pyxis_gql_async.py``): the module provides the ``PyxisAsyncGQL`` class with several
async functions, and it is up to the modules that use it to build the async loop and aggregate
those functions as needed.
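
The three layers can be sketched in a few lines. All names here (``fetch_item``, ``fetch_all``,
``get_items``) are hypothetical and unrelated to the Pyxis module; ``asyncio.sleep`` stands in for
real I/O.

```python
import asyncio


async def fetch_item(item_id):
    """Lower-level async function (stand-in for a real request)."""
    await asyncio.sleep(0.01)
    return {"id": item_id}


async def fetch_all(item_ids):
    """Async orchestrator: runs the lower-level calls concurrently."""
    return await asyncio.gather(*(fetch_item(i) for i in item_ids))


def get_items(item_ids):
    """Synchronous wrapper: the entry point for ordinary synchronous callers."""
    return asyncio.run(fetch_all(item_ids))


items = get_items([1, 2, 3])
print(items)
```

Keeping ``asyncio.run`` in the outermost wrapper means the rest of the synchronous code base does
not need to know that the work underneath is asynchronous.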

Concrete example for using PyxisAsyncGQL
----------------------------------------
In this next example, we will call ``PyxisAsyncGQL.get_repository_by_path`` several times, each for
a given path and registry, and aggregate each result conditionally.

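.. code-block:: python

    import asyncio

    # PyxisAsyncGQL is provided by pyxis_gql_async.py


    async def aggregate_paths():
        path_registry_vals: list[tuple[str, str]] = [...]
        results_to_skip = [...]
        collected_results = []

        # Handle each repository lookup as soon as it completes.
        for f in asyncio.as_completed(
            [PyxisAsyncGQL.get_repository_by_path(*x) for x in path_registry_vals]
        ):
            result = await f
            if result not in results_to_skip:
                collected_results.append(result)
        return collected_results


    def main():
        # Synchronous entry point: creates the async loop and runs the orchestrator.
        return asyncio.run(aggregate_paths())

References
==========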
.. [RealPython] https://realpython.com/async-io-python/
.. [MiguelGrinberg] https://youtu.be/iG6fr81xHKA?t=4m29s
.. [AsyncIO] https://docs.python.org/3/library/asyncio-task.html#
.. [AsyncFor] https://stackoverflow.com/questions/56161595/how-to-use-async-for-in-python