
SPIKE: Using "Airflow Exceptions" in Task SDK without importing core #44353

Open
kaxil opened this issue Nov 25, 2024 · 1 comment
Labels
area:core area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK

Comments

kaxil commented Nov 25, 2024

Similar to #44352, we need to figure out a way to use Airflow exceptions in the Task SDK without importing core.
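A minimal sketch of what that could look like (the package and module names below are assumptions, not a decision): the exceptions live in a small shared module that the Task SDK imports directly, while core re-exports them so existing import paths keep working.

```python
# Hypothetical shared module, e.g. airflow_common/exceptions.py, installable
# on workers without pulling in airflow core.
class AirflowException(Exception):
    """Base class for Airflow-specific exceptions."""


class AirflowSkipException(AirflowException):
    """Raised when a task asks to be skipped."""


# Task SDK code would import it directly (no `import airflow` needed):
#     from airflow_common.exceptions import AirflowException
#
# Core would re-export it so `from airflow.exceptions import AirflowException`
# keeps working for existing DAGs and providers:
#     from airflow_common.exceptions import AirflowException  # noqa: F401
```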

@kaxil kaxil added the area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK label Nov 25, 2024
@dosubot dosubot bot added the area:core label Nov 25, 2024

potiuk commented Nov 26, 2024

Comment and proposal:

I think we won't avoid some "airflow.common" eventually, for this reason and for #44352. It falls under the packaging decisions we deferred, but I think the cleanest way is to have some common package.

We have all the experience from providers here, and we should learn from it.

This package could potentially be embedded into the Task SDK and Core (with a changed package name, for example), but then there is a whole set of problems around what happens if different versions of the same package are embedded in different packages installed at the same time (at least for the Local Executor and standalone case, we will have to have both the Task SDK and "core" installed together). And at the same time we want to update Airflow code independently from the workers - which makes it tricky, as we should be able to handle the case where the workers' "common" code is older than core's "common" code.

IMHO the cleanest and simplest solution is a separate package that comes with Airflow, and it will have to have a very strict compatibility policy - something we already went through with common.sql, for example:

a) it should never have a breaking change
b) it should always be backwards-compatible (which is another way of saying a)
c) we should have some tests that make sure a breaking change did not happen accidentally
d) the Task SDK package should have a >= SOME_3_X_0 requirement for the common code - and should work with any "future" version of the common code

A (somewhat successful, though it could likely be improved) attempt at doing this is https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/common/sql/hooks/sql.pyi, which is automatically generated by the update-common-sql-api-stubs pre-commit hook and delegates the check of whether the interface changed to mypy. But IMHO that is far too little (and unnecessary if we attempt to actually test the common code for compatibility).
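Roughly, such a generated stub just pins the public API surface. A hedged sketch of what it would capture for an exceptions module (the file path and names are assumptions):

```python
# Hypothetical airflow_common/exceptions.pyi, regenerated by a pre-commit hook.
# An unexpected diff in this file (or a mypy failure for code typed against it)
# signals that the public interface changed and needs review.
class AirflowException(Exception): ...

class AirflowSkipException(AirflowException): ...
```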

What really works (and we have it in providers) is running the current tests against older versions of the packages we want to test - this has much better "compatibility" properties than checking whether the Python API changed (because it also tests the behaviour).

So we could potentially use techniques similar to what we use for providers now, where we determine the "lowest direct" set of dependencies and run "compatibility" tests of new providers against older Airflow versions - only here it would have to be more of a "forward" testing: we should take the tests of the OLDER released Task SDK and run them against the NEWER common code. This is all doable I think - the Task SDK and the set of tests it runs should be relatively small, and we should be able to add a CI job that checks out the 3.0, 3.1, 3.2, .... Task SDK tests and runs them against the latest "common" code to see if there are any compatibility issues.
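For illustration, the kind of behaviour-level check such a job could run - an older Task SDK test suite exercised against a newer version of the hypothetical shared exceptions package (module names are assumptions):

```python
# Sketch of a forward-compatibility test: OLDER Task SDK tests run against a
# NEWER release of the shared "common" package.
import pytest

from airflow_common.exceptions import AirflowException, AirflowSkipException


def test_exception_hierarchy_is_stable():
    # The old SDK relies on this subclass relationship; newer common code
    # must not break it.
    assert issubclass(AirflowSkipException, AirflowException)


def test_exceptions_are_still_catchable_by_base_class():
    with pytest.raises(AirflowException):
        raise AirflowSkipException("task asked to be skipped")
```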

I think it would be good to agree on some rules here - how long we can expect the old "common" to work with the new "core" - and allow removing some old, deprecated code. That will also let us improve and evolve the test code, because it is a bit tricky to check out test code from one branch/version and run it against code from another version (this is the set of test compatibilities we already have in the providers' compatibility tests - and we are only able to maintain it over time because we have the rule of 6 months of support for old Airflow versions; it would be next to impossible to take the main tests and run them on 2.0, for example). We are going to have a very similar situation running "old Task SDK tests" against "new common code", and we should limit the number of versions we run it with.

That would be my proposal for how to solve it - and providers can be used as a showcase of the kinds of issues we will need to handle, and as evidence that it can work.
