sysbench memory and threads workload is giving less benchmarking results with iwasm than native aarch64 gcc execution #3752

Open
subhakr opened this issue Aug 23, 2024 · 12 comments

Comments

@subhakr

subhakr commented Aug 23, 2024

Subject of the issue

I ran the benchmark in both modes: natively on aarch64 (built with gcc) and under the WAMR runtime.
The native aarch64 gcc build gives the better result, shown below.

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench memory --memory-block-size=1K --memory-total-size=3G --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1KiB
  total size: 3072MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 2599323 (866387.19 per second)

2538.40 MiB transferred (846.08 MiB/sec)


Throughput:
    events/s (eps):                      866387.1913
    time elapsed:                        3.0002s
    total number of events:              2599323

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.12
         95th percentile:                        0.00
         sum:                                 1308.86

Threads fairness:
    events (avg/stddev):           2599323.0000/0.00
    execution time (avg/stddev):   1.3089/0.00

But on aarch64 with WAMR I am getting lower benchmark results, as shown below.

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# ./iwasm  sysbench.aot memory --memory-block-size=1K --memory-total-size=3G --time=3 run
Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1KiB
  total size: 3072MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 532424 (177466.05 per second)

519.95 MiB transferred (173.31 MiB/sec)


Throughput:
    events/s (eps):                      177466.0549
    time elapsed:                        3.0001s
    total number of events:              532424

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.16
         95th percentile:                        0.00
         sum:                                 1366.00

Threads fairness:
    events (avg/stddev):           532424.0000/0.00
    execution time (avg/stddev):   1.3660/0.00
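
To put the two memory runs above side by side, here is a small Python sketch that computes the throughput ratio from the figures reported in the logs. The helper names `slowdown` and `mib_per_sec` are purely illustrative and are not part of sysbench or WAMR; the numbers are copied from the two outputs above.

```python
# Figures copied from the two sysbench memory runs above.
NATIVE_EPS = 866387.19   # native aarch64 (gcc), events per second
WAMR_EPS = 177466.05     # iwasm sysbench.aot, events per second
BLOCK_KIB = 1            # --memory-block-size=1K


def slowdown(native: float, wamr: float) -> float:
    """Return how many times slower the WAMR run is than the native run."""
    return native / wamr


def mib_per_sec(eps: float, block_kib: float = BLOCK_KIB) -> float:
    """Convert events/s to MiB/s for a given block size in KiB."""
    return eps * block_kib / 1024.0


if __name__ == "__main__":
    print(f"slowdown: {slowdown(NATIVE_EPS, WAMR_EPS):.2f}x")
    print(f"native:   {mib_per_sec(NATIVE_EPS):.2f} MiB/s")
    print(f"WAMR:     {mib_per_sec(WAMR_EPS):.2f} MiB/s")
```

The converted MiB/s values should line up with the "MiB/sec" lines in the two logs, which is a quick way to confirm that both runs used the same 1 KiB block size.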

Test case

sysbench.zip
The zip file above contains sysbench.aot, which is built for aarch64.

Your environment

* Host OS: Ubuntu 22.04 LTS
* WAMR version: 2.1.1
* CPU architecture: aarch64
* RAM: 3 GB
* Internal storage: 128 GB

Steps to reproduce

For iwasm:
./iwasm sysbench.aot memory --memory-block-size=1K --memory-total-size=3G --time=3 run

For native aarch64, install sysbench and then run the command below:
sysbench memory --memory-block-size=1K --memory-total-size=3G --time=3 run

Expected behavior

When I ran sysbench.aot with the cpu workload under iwasm, I got better results than with the native aarch64 sysbench built with gcc, so I expected something similar for the memory workload.

Actual behavior

But for the memory workload I am getting lower benchmark results than with the native aarch64 sysbench built with gcc.

Extra Info

I was expecting better results from the iwasm sysbench.aot command, but I am getting lower results, so I would like to know how to get better results with iwasm.
I see the same thing with the threads workload.

Please check the threads workload as well.

These are the commands.
For iwasm with wasi-sdk 23:

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench threads --threads=8 --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time


Initializing worker threads...

Threads started!


Throughput:
    events/s (eps):                      454.7764
    time elapsed:                        3.0125s
    total number of events:              1370

Latency (ms):
         min:                                    2.80
         avg:                                   17.54
         max:                                  188.05
         95th percentile:                       70.55
         sum:                                24025.06

Threads fairness:
    events (avg/stddev):           171.2500/13.71
    execution time (avg/stddev):   3.0031/0.00

For aarch64 with gcc:

root@s32r45evb:~/Sysbench_S32R45_WAMR_GCC/sysbench_wasm# sysbench threads --threads=8 --time=3 run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time


Initializing worker threads...

Threads started!


Throughput:
    events/s (eps):                      454.7764
    time elapsed:                        3.0125s
    total number of events:              1370

Latency (ms):
         min:                                    2.80
         avg:                                   17.54
         max:                                  188.05
         95th percentile:                       70.55
         sum:                                24025.06

Threads fairness:
    events (avg/stddev):           171.2500/13.71
    execution time (avg/stddev):   3.0031/0.00
@TianlongLiang
Contributor

I think the threads benchmark results you posted are duplicated.

Can you also tell me the command you used to compile the aot file? From that command I can tell, for example, whether you are using software boundary checks (these cause a performance loss under heavy I/O).

You can also refer to this document to see whether there is any helpful information you could use to analyze the performance gap further.

@subhakr
Author

subhakr commented Aug 26, 2024

CFLAGS="-O3 -funroll-loops --sysroot=/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot -pthread -fexceptions -D_WASI_EMULATED_PROCESS_CLOCKS -matomics -mbulk-memory"

LDFLAGS="--sysroot=/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot -pthread -fexceptions -Wl,--shared-memory -g -lwasi-emulated-mman -Wl,--export-all -Wl,--no-entry -Wl,--export=__heap_base -Wl,--export=__data_end -pthread -lwasi-emulated-process-clocks -Wl,--initial-memory=2147483648 -Wl,--max-memory=2147483648"

make CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS"
This generates the sysbench wasm module.
After that, I converted the sysbench wasm module into a .aot file with wamrc, using the command below:
/home/admin1/Public/wasm-micro-runtime/wamr-compiler/build/wamrc --enable-multi-thread -o sysbench.aot src/sysbench

Then I run the sysbench workloads.
Here is the result of the sysbench threads workload with the wasm build:

Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time


Initializing worker threads...

Threads started!


Throughput:
    events/s (eps):                      315.7304
    time elapsed:                        3.0342s
    total number of events:              958

Latency (ms):
         min:                                    2.86
         avg:                                   25.18
         max:                                  407.28
         95th percentile:                      176.73
         sum:                                24124.87

Threads fairness:
    events (avg/stddev):           119.7500/20.50
    execution time (avg/stddev):   3.0156/0.01
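
For a rough comparison of the threads workload, the sketch below computes the relative change of each metric, assuming the 454.78 events/s figures posted earlier in this thread come from the native build (their banner shows the bundled LuaJIT). The dictionary keys and the `relative_change` helper are illustrative names only; the numbers are copied from the logs.

```python
# Metrics copied from the run posted earlier (assumed native) and the WAMR run above.
native = {"eps": 454.7764, "avg_ms": 17.54, "p95_ms": 70.55, "max_ms": 188.05}
wamr = {"eps": 315.7304, "avg_ms": 25.18, "p95_ms": 176.73, "max_ms": 407.28}


def relative_change(native_value: float, wamr_value: float) -> float:
    """Percentage change of the WAMR value relative to the native value."""
    return (wamr_value - native_value) / native_value * 100.0


if __name__ == "__main__":
    for key in native:
        print(f"{key:>7}: {relative_change(native[key], wamr[key]):+.1f}%")
```

Throughput drops by roughly a third, while the tail latencies (95th percentile and max) grow by a much larger factor, which may point at lock or scheduling overhead rather than raw compute. That reading is only a guess from two 3-second runs.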

@subhakr
Author

subhakr commented Aug 26, 2024

One more thing, about the cpu workload:
I am getting drastically different results with the sysbench wasm module, as shown below.

admin1@admin1-VivoBook-ASUSLaptop-X515EA-P1511CEA:~/sysbench_main$ /home/admin1/Documents/wasm-micro-runtime/product-mini/platforms/linux/build/iwasm sysbench.aot cpu --cpu-max-prime=20000 --time=3 run
Attempting to allocate 1064960 bytes of memory...
sysbench 1.1.0-2ca9e3f (using Lua Lua 5.3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second: 773738.73

Throughput:
    events/s (eps):                      773738.7288
    time elapsed:                        3.0002s
    total number of events:              2321354

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.05
         95th percentile:                        0.00
         sum:                                  688.21

Threads fairness:
    events (avg/stddev):           2321354.0000/0.00
    execution time (avg/stddev):   0.6882/0.00

With the gcc build of sysbench I am getting the results below.

admin1@admin1-VivoBook-ASUSLaptop-X515EA-P1511CEA:~/sysbench_main$ sysbench cpu --cpu-max-prime=20000 --time=3 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1340.51

General statistics:
    total time:                          3.0005s
    total number of events:              4024

Latency (ms):
         min:                                    0.73
         avg:                                    0.75
         max:                                    1.61
         95th percentile:                        0.78
         sum:                                 2999.55

Threads fairness:
    events (avg/stddev):           4024.0000/0.00
    execution time (avg/stddev):   2.9996/0.00

Why am I getting results that differ this much with the wasm module?
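
To make the size of the gap concrete, the following sketch derives the mean time per event from the "sum" latency and the event count in each log. The dictionary layout and the `mean_event_us` helper are illustrative names, not sysbench output fields; the numbers are copied from the two cpu runs above.

```python
# Figures copied from the two sysbench cpu runs above.
wamr = {"events": 2321354, "latency_sum_ms": 688.21, "eps": 773738.73}
native = {"events": 4024, "latency_sum_ms": 2999.55, "eps": 1340.51}


def mean_event_us(run: dict) -> float:
    """Mean time per event in microseconds, from the latency sum."""
    return run["latency_sum_ms"] * 1000.0 / run["events"]


if __name__ == "__main__":
    print(f"WAMR:   {mean_event_us(wamr):.3f} us per event")
    print(f"native: {mean_event_us(native):.3f} us per event")
    print(f"eps ratio (WAMR / native): {wamr['eps'] / native['eps']:.0f}x")
```

A gap of several hundred times, together with the different version banners in the two logs (1.1.0-2ca9e3f with Lua 5.3 versus 1.0.20 with LuaJIT), suggests the two binaries may not be doing the same work per event, so these cpu numbers are probably not directly comparable.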

@TianlongLiang
Contributor

Can you share more details on how to compile sysbench to wasm? I tried your command in the root directory of sysbench, and luajit reports an error because it does not support the wasm architecture:

# command
make CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS" CC=/opt/wasi-sdk/bin/clang

error:

Making all in third_party/luajit
make[1]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit'
make -C ./luajit clean
make[2]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit'
make -C src clean
make[3]: Entering directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit/src'
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
lj_arch.h:69:2: error: "No support for this architecture (yet)"
   69 | #error "No support for this architecture (yet)"
      |  ^
lj_arch.h:439:2: error: "No target architecture defined"
  439 | #error "No target architecture defined"
      |  ^
2 errors generated.
Makefile:271: *** Unsupported target architecture.  Stop.
make[3]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit/src'
make[2]: *** [Makefile:166: clean] Error 2
make[2]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit/luajit'
make[1]: *** [Makefile:501: lib/libluajit-5.1.a] Error 2
make[1]: Leaving directory '/home/tl/TL/clion_projects/sysbench/third_party/luajit'
make: *** [Makefile:478: all-recursive] Error 1

@subhakr
Author

subhakr commented Aug 27, 2024

Yes, luajit does not support wasm, which is why I used Lua 5.3 instead.
The lua and concurrency kit directories also have to be compiled into either .bc or .wasm format.
/home/admin1/Downloads/wasi-sdk-23.0-x86_64-linux/share/wasi-sysroot/lib
I placed the two resulting libraries, libck.a and liblua.a, in the directory above.

I am sharing my sysbench source for your reference:
sysbench_main.zip

If you want to try my sysbench source, please change the makefile paths accordingly
and use these libraries for liblua and libck.
I am using wasi-sdk version 23.
Please place these files in the path mentioned above.
libraries.zip

NOTE: The libraries directory contains a liblua.a_bk file, which includes the setjmp and longjmp functions.
In liblua.a I commented them out because they are not supported at run time.
If it is possible to resolve this, could you try to do so?

@subhakr
Author

subhakr commented Aug 28, 2024

Hi, is there any update from your side?

Thanks in advance.

@subhakr
Author

subhakr commented Aug 29, 2024

Could you please tell me if there is any information on the issues above?

@TianlongLiang
Contributor

Sorry, I got caught up with a few other things last week. I will investigate it now.

@subhakr
Author

subhakr commented Sep 2, 2024

OK, thank you.

@subhakr
Author

subhakr commented Sep 3, 2024

Hi @TianlongLiang,
Is there any information on the issues above?

Thanks in advance.

@TianlongLiang
Contributor

I still can't compile your sysbench-main to wasm. I just used the commands I would use to compile normal sysbench:

./autogen.sh
# Add --with-pgsql to build with PostgreSQL support
./configure
make -j
CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS" CC=/opt/wasi-sdk/bin/clang

And it still emits a bunch of errors:

In file included from /home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/ck_spinlock.h:33:
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:65:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   65 |         ck_pr_fence_lock();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:75:2: error: call to undeclared function 'ck_pr_fence_acquire'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   75 |         ck_pr_fence_acquire();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:99:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   99 |         ck_pr_fence_lock();
      |         ^
/home/tl/TL/clion_projects/sysbench-wamr/sysbench_main/third_party/concurrency_kit/include/spinlock/dec.h:118:2: error: call to undeclared function 'ck_pr_fence_lock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
  118 |         ck_pr_fence_lock();
      |         ^

I don't know whether there is more configuration I need to modify beyond the wasi-sdk and user directory paths. I also don't know why it still compiles the third_party libraries.

Could you please provide more details on how to compile it using the existing libraries you sent me, along with the commands for compiling the wasm version of sysbench? It seems that a plain make won't do it.

@subhakr
Author

subhakr commented Sep 23, 2024

Sorry for the late reply.
I will look into it and then explain clearly how I integrated everything.
Thank you in advance.
