xpmem: building with xpmem causes regression - Issue#10403 #16

Open
syskovprdap opened this issue Oct 7, 2024 · 0 comments

*Describe the bug*
We built libfabric with `--enable-xpmem` but did not load the xpmem kernel module. When running the Intel MPI Benchmarks Alltoall test with Open MPI 5, we observed a performance regression at small message sizes compared to a build without xpmem.
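
For reference, a minimal sketch of the two builds being compared and of the module check. The configure flags other than `--enable-xpmem` and the install prefixes are assumptions for illustration, not the exact recipe used here:

```
# Build A (hypothetical prefix): libfabric without xpmem support
./configure --prefix=/build/libfabric-no-xpmem --enable-efa && make -j && make install

# Build B (hypothetical prefix): same configuration plus xpmem support compiled in
./configure --prefix=/build/libfabric-xpmem --enable-efa --enable-xpmem && make -j && make install

# The regression is observed while the xpmem kernel module is NOT loaded on the nodes
lsmod | grep -i xpmem || echo "xpmem module not loaded"
```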

*To Reproduce*
```
/openmpi5/bin/mpirun --wdir . -n 1024 --hostfile hostfile --map-by ppr:64:node --timeout 1800 -x OMPI_MCA_accelerator=null -x FI_EFA_USE_DEVICE_RDMA=1 -x LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib -x PATH /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1 2>&1 | tee node16-ppn64.txt
```

*Expected behavior*
Performance should not be impacted when the xpmem kernel module is not loaded, regardless of whether libfabric is built with `--enable-xpmem`.
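
One way to see what the xpmem-enabled build actually does at runtime when the module is absent is to enable libfabric's provider logging. `fi_info`, `FI_LOG_LEVEL`, and `FI_LOG_PROV` are standard libfabric tools/variables, but the exact messages printed about xpmem are not guaranteed; the small run below is only an illustration:

```
# List the providers visible to the xpmem-enabled build (library path taken from the repro command)
LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib fi_info -l

# Re-run a small two-rank case with shm provider logging to check whether xpmem
# initialization or fallback is reported when the kernel module is missing
/openmpi5/bin/mpirun -n 2 \
  -x LD_LIBRARY_PATH=/build/libraries/libfabric/main/install/libfabric/lib \
  -x FI_LOG_LEVEL=warn -x FI_LOG_PROV=shm \
  /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 2 -msglog 0:6
```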

*Output*
Building libfabric without `--enable-xpmem`:
```
# Calling sequence was:
# /build/workloads/imb/openmpi-v5.0.3-installer/source/mpi-benchmarks-IMB-v2021.7/IMB-MPI1 Alltoall -npmin 1024 -iter 200 -time 20 -mem 1
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Alltoall
#-----------------------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 1024
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
        0          200         0.05         0.08         0.05         0.00
        1          200       106.05       146.90       134.23         0.00
        2          200       111.80       181.11       154.34         0.00
        4          200       124.38       185.32       160.79         0.00
        8          200       129.01       182.50       162.60         0.00
       16          200       232.57       394.08       304.96         0.00
       32          200       558.01       985.00       762.65         0.00
       64          200      7093.18     12193.73      9953.92         0.00
      128          200      8769.56     30265.41     24831.23         0.00
      256          200      2822.34     37262.43     27504.94         0.00
      512           21     12156.41     13949.84     12884.10         0.00
     1024           21     14784.92     15321.06     15067.40         0.00
     2048           21     17682.00     18692.46     18319.53         0.00
     4096           21     14576.67     15542.40     15145.84         0.00
     8192           21     21843.53     23535.42     23007.77         0.00
    16384            3     44885.20     46713.80     45756.77         0.00
```

Building libfabric with `--enable-xpmem`:
```
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
        0          200         0.05         0.07         0.05         0.00
        1          200       115.06       148.07       137.58         0.00
        2          200       139.29       176.52       161.56         0.00
        4          200       181.14       215.82       205.05         0.00
        8          200       160.08       198.28       183.84         0.00
       16          200       308.01       431.14       363.63         0.00
       32          200       845.39      1191.64       996.77         0.00
       64          200      8688.64     18961.72     14613.72         0.00
      128          200     14885.64     29020.61     23739.58         0.00
      256          200      6650.37     38164.19     27603.49         0.00
      512           22     11599.52     12864.61     12165.26         0.00
     1024           22     14662.48     15350.84     15017.10         0.00
     2048           22     17599.68     18588.80     18188.19         0.00
     4096           22     14443.24     15390.64     14997.93         0.00
     8192           22     21959.33     23530.77     23043.91         0.00
    16384            3     44656.48     46520.43     45761.54         0.00
```
The latency increases for message sizes <= 64 bytes when libfabric is built with `--enable-xpmem`.

*Environment:*
Amazon Linux 2, 16 hpc7g.16xlarge instances

