Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support 3.13t free-threaded python #471

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

davidhewitt
Copy link
Member

First pass at adding CI and tests for freethreaded Python.

Locally I get some test failures, so I will have to investigate further before this is ready even if CI is green. cc @ngoldbaum

failures:

---- borrow::shared::tests::borrow_multiple_views stdout ----
thread 'borrow::shared::tests::borrow_multiple_views' panicked at src/borrow/shared.rs:836:17:
assertion `left == right` failed
  left: 3
 right: 1

---- borrow::shared::tests::borrow_multiple_arrays stdout ----
thread 'borrow::shared::tests::borrow_multiple_arrays' panicked at src/borrow/shared.rs:786:17:
assertion `left == right` failed
  left: 2
 right: 1


failures:
    borrow::shared::tests::borrow_multiple_arrays
    borrow::shared::tests::borrow_multiple_views

test result: FAILED. 27 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.13s

@davidhewitt davidhewitt marked this pull request as draft November 22, 2024 16:33
Comment on lines +181 to +182
// FIXME probably a deadlock risk here due to the GIL? Might need MutexExt trait in PyO3
let mut dtypes = self.dtypes.lock().expect("dtype cache poisoned");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choice to use Mutex here makes me want to add a MutexExt trait to PyO3 similar to OnceLockExt which we already added.

Given that writes to this should be infrequent (it's a global cache, as far as I can tell), I also wonder if RwLock is appropriate here. Readers are extremely short-lived so the analysis in https://blog.nelhage.com/post/rwlock-contention/ seems to be a non-issue (cc @alex)

... in which case I want RwLockExt too 😂

@davidhewitt
Copy link
Member Author

Based on the failures here, I propose rust-numpy 0.23.0 releases without free-threading support, and we seek to follow up in 0.23.1.

@davidhewitt davidhewitt added the CI-no-fail-fast If one job fails, allow the rest to keep testing label Nov 22, 2024
@Icxolu
Copy link
Contributor

Icxolu commented Nov 24, 2024

I did a bit of investigating here. I believe the fundamental problem is that the borrow checking API relies on the GIL for synchronization of exclusive access

unsafe extern "C" fn acquire_shared(flags: *mut c_void, array: *mut PyArrayObject) -> c_int {
// SAFETY: GIL must be held when calling `acquire_shared`.
let py = Python::assume_gil_acquired();
let flags = &mut *(flags as *mut BorrowFlags);

On the free-threaded build this is not true anymore, and therefore unsound.

An easy (but probably not optimal) solution is to wrap BorrowFlagsInner into a Mutex (I testet with parking_lot to not have to deal with poisoning) and change everything to shared access. I believe that should be enough to make it sound. The tests still don't pass, but I think that is related to their implicit assumption of running serially. The GIL enforced each tests running to completion before the next one can start, but on the free-threaded build the can run interleaved, and some assertions do not hold because of that.

@davidhewitt davidhewitt mentioned this pull request Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-no-fail-fast If one job fails, allow the rest to keep testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants