Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust bindings compatibility #233

Closed
matthewkeil opened this issue Sep 30, 2024 · 5 comments
Closed

Rust bindings compatibility #233

matthewkeil opened this issue Sep 30, 2024 · 5 comments

Comments

@matthewkeil
Copy link

Hi there @dot-asm. Thanks again for the MSM PR a couple months back. We have consumers using that library now and have run across two cases where there is an architecture compatibility issue. It's a bit weird though and would love your insight as to why its coming up and how to mitigate long-term.

Both consumers are on the same Intel Celeron processor:

  • Intel® Celeron® Processor N5105

They were getting an "illegal instruction" error and the core dumped.
ChainSafe/lodestar#7074

When looking for the root cause we analyzed the dump and saw it was coming from blst but were not sure with Release build because the function names were not there so we built a version with Debug symbols to try and target which function was throwing. While working through the nuances of getting that sorted for them the Debug compile threw a warning about std feature not being specified. So i added the feature (even-though it should have been applied in the build.rs.
https://github.com/ChainSafe/lodestar/blob/05012b36e16c74a1b6ea4332beea9bedfe93d2e9/temp-deps/blst-ts/blst/bindings/rust/Cargo.toml#L30

Once that was added (and using the debug build) the illegal instruction disappeared. I got some info to try and figure out why that made a difference but after seeing the build env I think that std should have been used so not sure why hard applying it mattered.

~/lodestar-debug unstable user@irena
❯ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

~/lodestar-debug unstable user@irena
❯ rustc --print cfg
debug_assertions
panic="unwind"
target_abi=""
target_arch="x86_64"
target_endian="little"
target_env="gnu"
target_family="unix"
target_feature="fxsr"
target_feature="sse"
target_feature="sse2"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_os="linux"
target_pointer_width="64"
target_vendor="unknown"
unix

This brings us to how you can help. We are not sure why this behavior is happening so cannot recommend a solution/PR to resolve it. We will be happy to help in anyway we can though to try and get this sorted.

Thanks!

  • Matt
@dot-asm
Copy link
Collaborator

dot-asm commented Oct 2, 2024

They were getting an "illegal instruction" error and the core dumped.

It's a README issue,

If the target application crashes with an "illegal instruction" exception [after copying to an older system], activate `portable` feature when building blst. Conversely, if you compile on an older Intel system, but will execute the binary on a newer one, consider instead activating <nobr>`force-adx`</nobr> feature. Though keep in mind that [cc](https://crates.io/crates/cc) passes the value of `CFLAGS` environment variable to the C compiler, and if set to contain specific flags, it can interfere with feature selection. <nobr>`-D__BLST_PORTABLE__`</nobr> and <nobr>`-D__ADX__`</nobr> are the said features' equivalents.

@matthewkeil
Copy link
Author

Noted.

Out of curiosity, why does the cpu detection work fine when building the bindings with C tooling but not with rust tooling?

I forgot to add one detail. In the case where the code worked correctly I built the C code first and then ran the rust compiler with the libblst.a available. In this situation there was no "illegal instruction" and the code runs fine. I would think that the rust build.rs could possibly handle this but am not sure. What is your expert opinion?

@dot-asm
Copy link
Collaborator

dot-asm commented Oct 3, 2024

why does the cpu detection work fine when building the bindings with C tooling but not with rust tooling?

Both procedures are designed and verified to produce the same results (on Linux and MacOS). If they are not, then the question is rather what goes wrong on your side. Can you confirm that if you execute cargo test in "pristine" git-clone-ed directory on the Celeron system, it crashes with "illegal instruction"? However, you shouldn't take this remark as an invitation to hand-walk you through it. The suggestion is rather you do more rigorous testing and discover the failing pattern that contradicts the quoted README paragraph.

@q9f
Copy link

q9f commented Oct 14, 2024

I'm the user who reported the issue downstream.

I can run cargo test on the affected machine without getting an illegal instruction.

~/blst/bindings/rust master user@irena
❯ uname -a
Linux irena 6.10.9-arch1-2 #1 SMP PREEMPT_DYNAMIC Tue, 10 Sep 2024 14:37:32 +0000 x86_64 GNU/Linux

~/blst/bindings/rust master user@irena
❯ grep -iI "name" /proc/cpuinfo | uniq
model name	: Intel(R) Celeron(R) N5105 @ 2.00GHz

~/blst/bindings/rust master user@irena
❯ rustup --version                 
rustup 1.27.1 (54dd3d00f 2024-04-24)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.81.0 (eeb90cda1 2024-09-04)`

~/blst/bindings/rust master user@irena
❯ cargo test                 
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.09s
     Running unittests src/lib.rs (target/debug/deps/blst-194a358e4e2578d3)

running 29 tests
test bindgen_test_layout_blst_fp2 ... ok
test bindgen_test_layout_blst_fr ... ok
test bindgen_test_layout_blst_p1 ... ok
test bindgen_test_layout_blst_p1_affine ... ok
test bindgen_test_layout_blst_p2 ... ok
test bindgen_test_layout_blst_p2_affine ... ok
test bindgen_test_layout_blst_pairing ... ok
test bindgen_test_layout_blst_fp6 ... ok
test bindgen_test_layout_blst_fp ... ok
test bindgen_test_layout_blst_fp12 ... ok
test bindgen_test_layout_blst_uniq ... ok
test bindgen_test_layout_blst_scalar ... ok
test bindgen_test_normal_types ... ok
test min_pk::tests::test_multi_point ... ok
test min_pk::tests::test_serialization ... ok
test min_pk::tests::test_aggregate ... ok
test min_pk::tests::test_sign_n_verify ... ok
test fp12_test::miller_loop_n ... ok
test min_sig::tests::test_multi_point ... ok
test min_sig::tests::test_serialization ... ok
test min_sig::tests::test_sign_n_verify ... ok
test min_sig::tests::test_aggregate ... ok
test p1_multi_point::test_add ... ok
test p2_multi_point::test_add ... ok
test min_pk::tests::test_multiple_agg_sigs ... ok
test sk_test::inverse ... ok
test min_sig::tests::test_multiple_agg_sigs ... ok
test p1_multi_point::test_mult ... ok
test p2_multi_point::test_mult ... ok

test result: ok. 29 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.26s

   Doc-tests blst

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


~/blst/bindings/rust master user@irena
❯ git log -n1
commit 6f3136ffb636974166a93f2f25436854fe8d10ff (HEAD -> master, origin/master, origin/HEAD)
Author: Andy Polyakov <[email protected]>
Date:   Wed Sep 18 21:49:26 2024 +0200

    .github/workflows/ci.yml: fix the failure.

@matthewkeil can you make sure to activate the portable feature when building blst?

@matthewkeil
Copy link
Author

matthewkeil commented Oct 17, 2024

Closing issue, and leaving this breadcrumb in case others run across a similar issue. Resolved with portable but there is slight performance degradation when aggregating signatures. Will weigh whether its worth creating two releases, or accept the slight performance hit to support Celeron. As a note most functions perform identically and some performance metrics actually run faster but its within a very small variance aside from the signature aggregation mentioned above.

Some results can be found here:
ChainSafe/blst-ts#154

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants