Describe the bug
AddressSanitizer reports a memory leak when operations are performed on an endpoint after it has been shut down with fi_shutdown. The issue occurs only with the TCP provider in RDM mode, when no event queue (ep->util_ep.eq) is bound to the domain.
To Reproduce
Steps to reproduce the behavior:
1. Use the TCP provider in RDM mode.
2. Create an endpoint without binding an event queue to the domain (ep->util_ep.eq remains empty).
3. Call fi_shutdown on the endpoint.
4. Perform any further operation on the endpoint after fi_shutdown (in the trace below, fi_cq_readfrom, which drives provider progress).
Expected behavior
The memory allocated in xnet_ep_disable (specifically err_entry.err_data = mem_dup(err_data, err_data_size);) should be released once it is no longer needed, so that nothing leaks.
Output
The Address Sanitizer reports the following memory leak:
2024-11-14T12:56:28.7675977Z ==73885==ERROR: LeakSanitizer: detected memory leaks
2024-11-14T12:56:28.7676337Z
2024-11-14T12:56:28.7676557Z Direct leak of 8 byte(s) in 1 object(s) allocated from:
2024-11-14T12:56:28.7677701Z #0 0x55c2874a72c3 in malloc (/test/test+0x6d52c3) (BuildId: 10d8cef421d2609343e1feb371ea248a68039137)
2024-11-14T12:56:28.7679278Z #1 0x7f2e491dea68 in mem_dup /test/third_party/libfabric/./include/ofi_mem.h:81:15
2024-11-14T12:56:28.7680923Z #2 0x7f2e491de493 in xnet_ep_disable /test/third_party/libfabric/prov/tcp/src/xnet_ep.c:458:25
2024-11-14T12:56:28.7682293Z #3 0x7f2e491d5819 in xnet_req_done /test/third_party/libfabric/prov/tcp/src/xnet_cm.c:209:2
2024-11-14T12:56:28.7683669Z #4 0x7f2e491f30d5 in xnet_run_ep /test/third_party/libfabric/prov/tcp/src/xnet_progress.c:1468:3
2024-11-14T12:56:28.7685215Z #5 0x7f2e491ee15a in xnet_handle_events /test/third_party/libfabric/prov/tcp/src/xnet_progress.c:1505:4
2024-11-14T12:56:28.7686681Z #6 0x7f2e491edf8a in xnet_run_progress /test/third_party/libfabric/prov/tcp/src/xnet_progress.c:1562:3
2024-11-14T12:56:28.7688089Z #7 0x7f2e491e96c6 in xnet_cq_progress /test/third_party/libfabric/prov/tcp/src/xnet_cq.c:84:2
2024-11-14T12:56:28.7689621Z #8 0x7f2e49129be0 in ofi_cq_readfrom /test/third_party/libfabric/prov/util/src/util_cq.c:270:2
2024-11-14T12:56:28.7690989Z #9 0x7f2e491e9d89 in xnet_cq_readfrom /test/third_party/libfabric/prov/tcp/src/xnet_cq.c:50:8
2024-11-14T12:56:28.7692541Z #10 0x55c287ad9c7f in fi_cq_readfrom(fid_cq*, void*, unsigned long, unsigned long*) /test/third_party/libfabric/include/rdma/fi_eq.h:402:9
Additional context
The memory leak originates from the function xnet_ep_disable at the line: err_entry.err_data = mem_dup(err_data, err_data_size);
The issue only occurs when no event queue is bound to the domain (ep->util_ep.eq is empty) and operations are performed on the endpoint after it has been shut down using fi_shutdown.
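The allocation-and-handoff pattern described above can be modeled in standalone C. This is a sketch under stated assumptions, not libfabric code: mem_dup_sketch, struct err_entry_sketch, struct eq_sketch, ep_disable_sketch, consume_without_free and the live_allocs counter are all hypothetical names. The only parts taken from the report are that xnet_ep_disable duplicates err_data with mem_dup (which, per the trace, calls malloc) and that the copy travels with an event-queue error entry. If the entry is consumed and nobody frees err_data, the copy becomes unreachable, which is the kind of direct leak LeakSanitizer flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter of live err_data copies, so the leak is
 * observable without running AddressSanitizer. */
static int live_allocs = 0;

/* Modeled on mem_dup() from include/ofi_mem.h: malloc + memcpy. */
static void *mem_dup_sketch(const void *src, size_t size)
{
	void *dest = malloc(size);
	if (dest) {
		memcpy(dest, src, size);
		live_allocs++;
	}
	return dest;
}

/* Hypothetical, simplified stand-in for struct fi_eq_err_entry. */
struct err_entry_sketch {
	int err;
	void *err_data;
	size_t err_data_size;
};

/* Hypothetical single-slot event queue. */
struct eq_sketch {
	int has_entry;
	struct err_entry_sketch entry;
};

/* Producer: duplicates the error payload (as xnet_ep_disable does with
 * mem_dup) and queues the entry; the copy is now owned by the event. */
static void ep_disable_sketch(struct eq_sketch *eq, int cm_err,
			      const void *err_data, size_t err_data_size)
{
	struct err_entry_sketch e = { .err = cm_err };

	if (err_data && err_data_size > 0) {
		e.err_data = mem_dup_sketch(err_data, err_data_size);
		if (e.err_data)
			e.err_data_size = err_data_size;
	}
	eq->entry = e;
	eq->has_entry = 1;
}

/* Buggy consumer: takes the event but never frees err_data, so the
 * duplicated block loses its last reference. */
static int consume_without_free(struct eq_sketch *eq)
{
	assert(eq->has_entry);
	eq->has_entry = 0;
	eq->entry.err_data = NULL;	/* last pointer to the copy is dropped */
	eq->entry.err_data_size = 0;
	return eq->entry.err;
}
```

With an 8-byte payload, one block is still outstanding after the event has been consumed, analogous to the single 8-byte object in the report.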
@piotrchmiel The leaked memory belongs to the FI_SHUTDOWN event that is written to the event queue after the endpoint shuts down. I'm not familiar with RDM, but I suspect a bug where err_entry.err_data is not freed after the RDM layer consumes the EQ event.
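If that guess is right, the fix comes down to ownership: whichever code consumes the FI_SHUTDOWN error entry must free err_data once it has finished with it. The sketch below illustrates that rule in isolation. It is a hypothetical model (struct eq_err_sketch, make_shutdown_err, consume_shutdown_err and the outstanding counter are illustrative names), not the actual RDM code path or a confirmed patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter of err_data buffers not yet released. */
static int outstanding = 0;

/* Hypothetical, simplified event-queue error entry. */
struct eq_err_sketch {
	int err;
	void *err_data;
	size_t err_data_size;
};

/* Producer side: copies the payload, as mem_dup() does in
 * xnet_ep_disable. Returns -1 if the copy cannot be allocated. */
static int make_shutdown_err(struct eq_err_sketch *e, int err,
			     const void *data, size_t size)
{
	memset(e, 0, sizeof(*e));
	e->err = err;
	if (data && size > 0) {
		e->err_data = malloc(size);
		if (!e->err_data)
			return -1;
		memcpy(e->err_data, data, size);
		e->err_data_size = size;
		outstanding++;
	}
	return 0;
}

/* Consumer side: handle the event, then release the payload it owns.
 * Clearing the pointer makes a repeated call on the same entry harmless. */
static int consume_shutdown_err(struct eq_err_sketch *e)
{
	assert(e != NULL);
	int err = e->err;

	/* ... act on e->err and e->err_data here ... */
	if (e->err_data) {
		free(e->err_data);
		outstanding--;
	}
	e->err_data = NULL;
	e->err_data_size = 0;
	return err;
}
```

In this model, producing and then consuming one entry brings the outstanding count back to zero, which is the behavior the report expects from the real event path.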
Environment:
OS: Ubuntu 22.04
Provider: TCP
Mode: RDM
Libfabric: 1.22.0