xpmem: building with xpmem causes regression - Issue#10403 #16

Open
syskovprdap opened this issue Oct 7, 2024 · 0 comments

*Describe the bug*
We built libfabric with `--enable-xpmem` but did not load the xpmem kernel module. When running the Intel MPI Benchmarks Alltoall test with Open MPI 5, we observed a performance regression at small message sizes compared to a build without xpmem.
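
For reference, a minimal sketch of the two builds being compared and of the module check. The configure flags other than `--enable-xpmem` and the install prefixes are assumptions for illustration, not the exact recipe used here:

```
# Build A (hypothetical prefix): libfabric without xpmem support
./configure --prefix=/build/libfabric-no-xpmem --enable-efa && make -j && make install

# Build B (hypothetical prefix): same configuration plus xpmem support compiled in
./configure --prefix=/build/libfabric-xpmem --enable-efa --enable-xpmem && make -j && make install

# The regression is observed while the xpmem kernel module is NOT loaded on the nodes
lsmod | grep -i xpmem || echo "xpmem module not loaded"
```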

*To Reproduce*
```
/openmpi5/bin/mpirun --wdir . -n 1024 --hostfile hostfile --map-by ppr:64:node --timeout 1800 -x OMPI_MCA_accelerator=null -x FI_EFA_USE_DEVICE_RDMA=1 -x LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib -x PATH /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1 2>&1 | tee node16-ppn64.txt
```

*Expected behavior*
Performance should not be impacted when the xpmem kernel module is not loaded, regardless of whether libfabric is built with `--enable-xpmem`.
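
One way to see what the xpmem-enabled build actually does at runtime when the module is absent is to enable libfabric's provider logging. `fi_info`, `FI_LOG_LEVEL`, and `FI_LOG_PROV` are standard libfabric tools/variables, but the exact messages printed about xpmem are not guaranteed; the small run below is only an illustration:

```
# List the providers visible to the xpmem-enabled build (library path taken from the repro command)
LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib fi_info -l

# Re-run a small two-rank case with shm provider logging to check whether xpmem
# initialization or fallback is reported when the kernel module is missing
/openmpi5/bin/mpirun -n 2 \
  -x LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib \
  -x FI_LOG_LEVEL=warn -x FI_LOG_PROV=shm \
  /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 2 -msglog 0:6
```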

*Output*
Building libfabric without `--enable-xpmem`:
```
# Calling sequence was:
# /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Alltoall
#-----------------------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 1024
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
        0          200         0.05         0.08         0.05         0.00
        1          200       106.05       146.90       134.23         0.00
        2          200       111.80       181.11       154.34         0.00
        4          200       124.38       185.32       160.79         0.00
        8          200       129.01       182.50       162.60         0.00
       16          200       232.57       394.08       304.96         0.00
       32          200       558.01       985.00       762.65         0.00
       64          200      7093.18     12193.73      9953.92         0.00
      128          200      8769.56     30265.41     24831.23         0.00
      256          200      2822.34     37262.43     27504.94         0.00
      512           21     12156.41     13949.84     12884.10         0.00
     1024           21     14784.92     15321.06     15067.40         0.00
     2048           21     17682.00     18692.46     18319.53         0.00
     4096           21     14576.67     15542.40     15145.84         0.00
     8192           21     21843.53     23535.42     23007.77         0.00
    16384            3     44885.20     46713.80     45756.77         0.00
```

Building libfabric with `--enable-xpmem`:
```
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
        0          200         0.05         0.07         0.05         0.00
        1          200       115.06       148.07       137.58         0.00
        2          200       139.29       176.52       161.56         0.00
        4          200       181.14       215.82       205.05         0.00
        8          200       160.08       198.28       183.84         0.00
       16          200       308.01       431.14       363.63         0.00
       32          200       845.39      1191.64       996.77         0.00
       64          200      8688.64     18961.72     14613.72         0.00
      128          200     14885.64     29020.61     23739.58         0.00
      256          200      6650.37     38164.19     27603.49         0.00
      512           22     11599.52     12864.61     12165.26         0.00
     1024           22     14662.48     15350.84     15017.10         0.00
     2048           22     17599.68     18588.80     18188.19         0.00
     4096           22     14443.24     15390.64     14997.93         0.00
     8192           22     21959.33     23530.77     23043.91         0.00
    16384            3     44656.48     46520.43     45761.54         0.00
```
The latency increases for message sizes <= 64 bytes when libfabric is built with `--enable-xpmem`.

*Environment:*
Amazon Linux 2, 16 hpc7g.16xlarge instances

