
Updating histogram metric causes performance issue when spilling as it is updated too frequently #11603

Open
HolyLow opened this issue Nov 20, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

HolyLow (Contributor) commented Nov 20, 2024

Description

When testing Velox's spilling functionality, we found that spilling suffers from a performance issue:
although spilling uses a thread pool of up to 8 threads, the effective CPU concurrency is only 1–2.
[Screenshot: CPU utilization chart showing only 1–2 of the 8 spill threads active]

We then used tools such as pstack to investigate, and it turned out that all spilling threads are bottlenecked on prometheus-cpp's updateHistogramMetric call. The stack is as follows:

Thread 2242 (Thread 0x7f8aee23d640 (LWP 10173)):
#0  0x00007fa72a2a6dc0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fa72a29f753 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007fa60f5c23e6 in std::__1::mutex::lock() () from /data/shared_libs/libnative_execution.so
#3  0x00007fa60fa31358 in prometheus::Summary::Observe(double) () from /data/shared_libs/libnative_execution.so
#4  0x00007fa60f9fc768 in facebook::presto::prometheus::PrometheusStatsReporter::addHistogramMetricValue(char const*, unsigned long) const () from /data/shared_libs/libnative_execution.so
#5  0x00007fa60f9fc91f in facebook::presto::prometheus::PrometheusStatsReporter::addHistogramMetricValue(folly::Range<char const*>, unsigned long) const () from /data/shared_libs/libnative_execution.so
#6  0x00007fa614234bca in facebook::velox::common::updateGlobalSpillAppendStats(unsigned long, unsigned long) () from /data/shared_libs/libnative_execution.so
#7  0x00007fa613c32177 in facebook::velox::exec::SpillWriter::updateAppendStats(unsigned long, unsigned long) () from /data/shared_libs/libnative_execution.so
#8  0x00007fa613c32083 in facebook::velox::exec::SpillWriter::write(std::__1::shared_ptr<facebook::velox::RowVector> const&, folly::Range<facebook::velox::IndexRange*> const&) () from /data/shared_libs/libnative_execution.so
#9  0x00007fa613c2c119 in facebook::velox::exec::SpillState::appendToPartition(unsigned int, std::__1::shared_ptr<facebook::velox::RowVector> const&) () from /data/shared_libs/libnative_execution.so
#10 0x00007fa613c3662e in facebook::velox::exec::Spiller::writeSpill(int) () from /data/shared_libs/libnative_execution.so
#11 0x00007fa613c3a494 in facebook::velox::AsyncSource<facebook::velox::exec::Spiller::SpillStatus>::runMake(std::__1::function<std::__1::unique_ptr<facebook::velox::exec::Spiller::SpillStatus, std::__1::default_delete<facebook::velox::exec::Spiller::SpillStatus> > ()>&) () from /data/shared_libs/libnative_execution.so
#12 0x00007fa613c39d66 in facebook::velox::AsyncSource<facebook::velox::exec::Spiller::SpillStatus>::move() () from /data/shared_libs/libnative_execution.so
#13 0x00007fa613c36bb2 in facebook::velox::exec::Spiller::runSpill(bool) () from /data/shared_libs/libnative_execution.so
#14 0x00007fa613c372eb in facebook::velox::exec::Spiller::spill(facebook::velox::exec::RowContainerIterator const*) () from /data/shared_libs/libnative_execution.so
#15 0x00007fa613d15a20 in facebook::velox::exec::GroupingSet::spill() () from /data/shared_libs/libnative_execution.so
#16 0x00007fa613d1e1a6 in facebook::velox::exec::HashAggregation::reclaim(unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&) () from /data/shared_libs/libnative_execution.so
#17 0x00007fa613c03534 in long std::__1::__function::__policy_invoker<long ()>::__call_impl[abi:ue170006]<std::__1::__function::__default_alloc_func<facebook::velox::exec::Operator::MemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::$_0, long ()> >(std::__1::__function::__policy_storage const*) () from /data/shared_libs/libnative_execution.so
#18 0x00007fa6142aa382 in facebook::velox::memory::MemoryReclaimer::run(std::__1::function<long ()> const&, facebook::velox::memory::MemoryReclaimer::Stats&) () from /data/shared_libs/libnative_execution.so
#19 0x00007fa613c03104 in facebook::velox::exec::Operator::MemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&) () from /data/shared_libs/libnative_execution.so
#20 0x00007fa613bfec58 in std::__1::unique_ptr<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult, std::__1::default_delete<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult> > std::__1::__function::__policy_invoker<std::__1::unique_ptr<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult, std::__1::default_delete<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult> > ()>::__call_impl[abi:ue170006]<std::__1::__function::__default_alloc_func<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::$_1, std::__1::unique_ptr<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult, std::__1::default_delete<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult> > ()> >(std::__1::__function::__policy_storage const*) () from /data/shared_libs/libnative_execution.so
#21 0x00007fa613bfea24 in facebook::velox::AsyncSource<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult>::runMake(std::__1::function<std::__1::unique_ptr<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult, std::__1::default_delete<facebook::velox::exec::ParallelMemoryReclaimer::reclaim(facebook::velox::memory::MemoryPool*, unsigned long, unsigned long, facebook::velox::memory::MemoryReclaimer::Stats&)::ReclaimResult> > ()>&) () from /data/shared_libs/libnative_execution.so

Turning off prometheus-cpp metric collection would certainly work around the problem, but we want to understand why it happens.

After going through the code, we found that the Spiller calls updateHistogramMetric too frequently.
In Spiller::writeSpill, the code calls appendToPartition() every 64 rows or every 256 KB:

std::unique_ptr<Spiller::SpillStatus> Spiller::writeSpill(int32_t partition) {
  VELOX_CHECK_NE(type_, Type::kHashJoinProbe);
  // Target size of a single vector of spilled content. One of
  // these will be materialized at a time for each stream of the
  // merge.
  constexpr int32_t kTargetBatchBytes = 1 << 18; // 256K
  constexpr int32_t kTargetBatchRows = 64;
  RowVectorPtr spillVector;
  auto& run = spillRuns_[partition];
  try {
    ensureSorted(run);
    int64_t totalBytes = 0;
    size_t written = 0;
    while (written < run.rows.size()) {
      extractSpillVector(
          run.rows, kTargetBatchRows, kTargetBatchBytes, spillVector, written);
      totalBytes += state_.appendToPartition(partition, spillVector);
      if (totalBytes > state_.targetFileSize()) {

appendToPartition() eventually updates the histogram metric, as the stack above shows, and this is what causes the performance issue.

We also notice that Prestissimo uses prometheus-cpp as its metrics exporter, so we wonder whether Prestissimo has encountered the same problem.
Besides, to solve the problem, we would like to know whether Velox could use a larger batch size in the Spiller code, or reduce the use of updateHistogramMetric, especially in critical sections (hot paths where metrics are updated frequently).
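As a sketch of the second suggestion (reducing updateHistogramMetric calls on the hot path), one option is to accumulate per-batch stats in a plain local struct and report them once per spilled file. All names below are hypothetical, invented for illustration; this is not the actual Velox API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: instead of reporting to the histogram on every
// ~64-row batch, accumulate per-batch stats locally and report them
// once per spilled file, shrinking metric updates from O(batches) to
// O(files).
struct LocalSpillStats {
  uint64_t spilledBytes{0};
  uint64_t spilledRows{0};
  uint64_t appendCalls{0};

  // Hot path: plain additions, no lock, no metric call.
  void recordAppend(uint64_t bytes, uint64_t rows) {
    spilledBytes += bytes;
    spilledRows += rows;
    ++appendCalls;
  }

  // Cold path, once per finished spill file: this is where a single
  // histogram update would go. Returns the accumulated stats and
  // resets the accumulator.
  LocalSpillStats flush() {
    LocalSpillStats out = *this;
    *this = LocalSpillStats{};
    return out;
  }
};
```

With batches of 64 rows and files in the hundreds of megabytes, this would cut the number of histogram updates by several orders of magnitude.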

I could submit related code patches if the Velox community agrees with the modification.
Thanks a lot.

@HolyLow HolyLow added the enhancement New feature or request label Nov 20, 2024
@HolyLow HolyLow changed the title Updating histogram metric too frequently when spilling causes performance issue Updating histogram metric causes performance issue when spilling as it is updated too frequently Nov 20, 2024
HolyLow (Contributor, Author) commented Nov 20, 2024

@xiaoxmeng @oerling Could you please take a look at the problem? Thanks a lot.

Yuhta (Contributor) commented Nov 20, 2024

A better solution is to improve PrometheusStatsReporter to use thread-local state instead of global state, which eliminates the contention.
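The thread-local idea can be sketched as follows (names are hypothetical, not the real PrometheusStatsReporter API): each thread bumps its own bucket counters without taking a mutex, so the write path never contends:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kNumBuckets = 8;

// Sketch of a per-thread histogram shard. Writes touch only this
// thread's memory, so no lock is needed on the observe path; a scrape
// pass would merge the per-thread arrays under a rarely-taken lock.
struct LocalHistogram {
  std::array<uint64_t, kNumBuckets> buckets{};

  void observe(uint64_t value) {
    // Power-of-two bucketing, for illustration only.
    int b = 0;
    while (b + 1 < kNumBuckets && value >= (1ULL << (b + 1))) {
      ++b;
    }
    ++buckets[b];
  }
};

// One instance per thread; writes from different threads never touch
// the same counters, unlike the single mutex-protected
// prometheus::Summary seen in the stack trace above.
thread_local LocalHistogram tlHistogram;
```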

HolyLow (Contributor, Author) commented Nov 27, 2024

@Yuhta Thanks for your reply. So it would be more appropriate to modify the Presto C++ layer rather than Velox, am I understanding correctly?
Are there any changes for Velox to carry out?
Any suggestions are welcome.

Yuhta (Contributor) commented Nov 27, 2024

@HolyLow Yes, it should be a pure optimization on the PrometheusStatsReporter side. Write-heavy data structures usually perform better with thread-local state.
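A hypothetical companion to the thread-local write path is the merge side: the global histogram takes its mutex only when per-thread buckets are folded in (for example, once per Prometheus scrape), not on every observation. Names below are invented for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <mutex>

constexpr int kBuckets = 8;

// Sketch: the lock is confined to the cold merge path, so the
// per-observation cost on worker threads stays lock-free.
struct GlobalHistogram {
  std::mutex mu;
  std::array<uint64_t, kBuckets> buckets{};

  // Called rarely (e.g. per scrape), so contention is negligible.
  void merge(const std::array<uint64_t, kBuckets>& local) {
    std::lock_guard<std::mutex> g(mu);
    for (int i = 0; i < kBuckets; ++i) {
      buckets[i] += local[i];
    }
  }
};
```

This inverts the current cost profile: today every Observe() takes the global lock, whereas here the lock is taken once per scrape per thread.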
