Describe the bug
A race condition occurs in the ofi_get_core_info function in the libfabric library (v1.22.0). Specifically, the global variable log_prefix is written at line 317 without any synchronization, which causes a data race when fi_getinfo is called concurrently from multiple threads.
To Reproduce
Steps to reproduce the behavior:
1. Use libfabric v1.22.0 with the RXD utility provider layered over verbs.
2. Start multiple threads so that, for example, one calls fi_getinfo while another calls fi_fabric.
3. Run the application under thread sanitizer.
4. Observe the data race warnings reported by thread sanitizer, or crashes.
Expected behavior
Per the documentation, fi_getinfo should be thread-safe and callable concurrently by multiple threads without external serialization. Modifications to the global variable log_prefix should not cause a race condition.
Output
Thread sanitizer identifies a race condition when modifying the global variable log_prefix during simultaneous calls to fi_getinfo and other operations.
Code Reference
The problematic function is:
int ofi_get_core_info(uint32_t version, const char *node, const char *service,
		      uint64_t flags, const struct util_prov *util_prov,
		      const struct fi_info *util_hints,
		      const struct fi_info *base_attr,
		      ofi_map_info_t info_to_core, struct fi_info **core_info)
{
	struct fi_info *core_hints = NULL;
	int ret;

	ret = ofi_info_to_core(version, util_prov->prov, util_hints, base_attr,
			       info_to_core, &core_hints);
	if (ret)
		return ret;

	log_prefix = util_prov->prov->name; // <-- Global variable modified here

	ret = fi_getinfo(version, node, service, flags | OFI_CORE_PROV_ONLY,
			 core_hints, core_info);

	log_prefix = ""; // <-- Global variable reset here

	fi_freeinfo(core_hints);
	return ret;
}
Environment:
OS: Ubuntu 22.04
Provider: RXD utility provider with verbs.
Libfabric version: 1.22.0
Additional Context
The log_prefix variable is shared across threads, which leads to undefined behavior when multiple threads modify it simultaneously. This violates the thread-safety guarantees of fi_getinfo. A possible fix could involve using thread-local storage for log_prefix to avoid contention between threads.
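To illustrate the thread-local idea, here is a minimal standalone sketch. It is not libfabric code: log_prefix below is a local stand-in for the library's global, and nested_getinfo_stub, get_core_info_sketch, worker and run_prefix_demo are hypothetical names made up for this example. It assumes a C11 compiler with _Thread_local and POSIX threads. Each thread sets, uses and resets its own copy of the prefix, so the set/call/reset pattern from ofi_get_core_info no longer touches shared state.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for libfabric's global log_prefix.
 * _Thread_local (C11) gives every thread its own copy. */
static _Thread_local const char *log_prefix = "";

/* Hypothetical stand-in for the nested fi_getinfo() call: it only checks
 * that the prefix visible to this thread is the one this thread set. */
static int nested_getinfo_stub(const char *expected)
{
	return strcmp(log_prefix, expected) == 0 ? 0 : -1;
}

/* Mirrors the set/call/reset pattern of ofi_get_core_info(). */
static int get_core_info_sketch(const char *prov_name)
{
	int ret;

	log_prefix = prov_name;		/* writes this thread's copy only */
	ret = nested_getinfo_stub(prov_name);
	log_prefix = "";		/* reset this thread's copy only */
	return ret;
}

/* Thread body: repeats the pattern and counts prefix mismatches. */
static void *worker(void *arg)
{
	const char *prov_name = arg;
	intptr_t mismatches = 0;

	for (int i = 0; i < 100000; i++) {
		if (get_core_info_sketch(prov_name) != 0)
			mismatches++;
	}
	return (void *)mismatches;
}

/* Runs two threads with different prefixes. Returns the total number of
 * mismatches seen, or -1 if a thread could not be created. */
int run_prefix_demo(void)
{
	static char rxd_name[] = "ofi_rxd";
	static char verbs_name[] = "verbs";
	char *names[2] = { rxd_name, verbs_name };
	pthread_t threads[2];
	int created = 0;
	int total = 0;

	for (int i = 0; i < 2; i++) {
		if (pthread_create(&threads[i], NULL, worker, names[i]) != 0)
			break;
		created++;
	}
	for (int i = 0; i < created; i++) {
		void *result = NULL;

		pthread_join(threads[i], &result);
		total += (int)(intptr_t)result;
	}
	return created == 2 ? total : -1;
}
```

Calling run_prefix_demo() from a small main and building with -pthread -fsanitize=thread should produce no data race report for log_prefix, because no two threads ever access the same object. Whether a thread-local variable fits libfabric's logging code paths and all supported compilers is a question for the maintainers.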
piotrchmiel changed the title to "Race condition in ofi_get_core_info due to global log_prefix variable in libfabric v1.22.0" on Nov 20, 2024.