TPCDS SF10K Query 6 failed #11632

Open
minhancao opened this issue Nov 23, 2024 · 1 comment
Labels
bug Something isn't working triage Newly created issue that needs attention.

minhancao commented Nov 23, 2024

Bug description

Ran TPCDS SF10K with 8 Velox workers (128 GB memory each), with AsyncDataCache enabled and CoW disabled. Query 6 failed on the following check:

VELOX_CHECK_LE(numOut, outputBatchSize);

Worker 3:

E20241122 21:26:29.133782   790 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (52 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE

Worker 6:

E20241122 21:26:29.129307   195 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (69 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
E20241122 21:26:29.129565   782 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (13 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE

Worker 7:

E20241122 21:26:29.128383   769 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (117 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE

Worker 8:

E20241122 21:26:29.135933   772 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (64 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE

Worker 3's stack trace:

VeloxRuntimeError: numOut <= outputBatchSize (52 vs. 3) Operator: HashProbe[1922] 1
	at Unknown.# 0  _ZN8facebook5velox7process10StackTraceC1Ei(Unknown Source)
	at Unknown.# 1  _ZN8facebook5velox14VeloxExceptionC2EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_(Unknown Source)
	at Unknown.# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_(Unknown Source)
	at Unknown.# 3  _ZN8facebook5velox4exec9HashProbe17getOutputInternalEb(Unknown Source)
	at Unknown.# 4  _ZN8facebook5velox4exec9HashProbe9getOutputEv(Unknown Source)
	at Unknown.# 5  _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv(Unknown Source)
	at Unknown.# 6  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE(Unknown Source)
	at Unknown.# 7  _ZN8facebook5velox4exec6Driver3runESt10shared_ptrIS2_E(Unknown Source)
	at Unknown.# 8  _ZN5folly6detail8function5call_IZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS6_EEUlvE_Lb1ELb0EvJEEET2_DpT3_RNS1_4DataE(Unknown Source)
	at Unknown.# 9  _ZN5folly6detail8function14FunctionTraitsIFvvEEclEv(Unknown Source)
	at Unknown.# 10 _ZN5folly18ThreadPoolExecutor7runTaskERKSt10shared_ptrINS0_6ThreadEEONS0_4TaskE(Unknown Source)
	at Unknown.# 11 _ZN5folly21CPUThreadPoolExecutor9threadRunESt10shared_ptrINS_18ThreadPoolExecutor6ThreadEE(Unknown Source)
	at Unknown.# 12 _ZSt13__invoke_implIvRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEERPS1_JRS4_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_(Unknown Source)
	at Unknown.# 13 _ZSt8__invokeIRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEJRPS1_RS4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSC_DpOSD_(Unknown Source)
	at Unknown.# 14 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EE6__callIvJEJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE(Unknown Source)
	at Unknown.# 15 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EEclIJEvEET0_DpOT_(Unknown Source)
	at Unknown.# 16 _ZN5folly6detail8function5call_ISt5_BindIFMNS_18ThreadPoolExecutorEFvSt10shared_ptrINS4_6ThreadEEEPS4_S7_EELb1ELb0EvJEEET2_DpT3_RNS1_4DataE(Unknown Source)
	at Unknown.# 17 0x00000000000dbad4(Unknown Source)
	at Unknown.# 18 start_thread(Unknown Source)
	at Unknown.# 19 clone(Unknown Source)
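
To compare the failures across workers, the two values reported by the failed check can be pulled out of each log line. The following is a minimal sketch, not part of Velox or Prestissimo: the `CheckFailure` struct and the `parseCheckFailure` helper are hypothetical names, and the pattern assumes the `numOut <= outputBatchSize (A vs. B)` message format seen in the worker logs above.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical holder for the two values printed by the failed check.
struct CheckFailure {
  long numOut;
  long outputBatchSize;
};

// Hypothetical helper: extracts "(A vs. B)" that follows the expression
// text "numOut <= outputBatchSize" in a single worker log line.
// Returns std::nullopt when the line does not contain that message.
std::optional<CheckFailure> parseCheckFailure(const std::string& line) {
  static const std::regex kPattern(
      R"(numOut <= outputBatchSize \((\d+) vs\. (\d+)\))");
  std::smatch match;
  if (!std::regex_search(line, match, kPattern)) {
    return std::nullopt;
  }
  return CheckFailure{std::stol(match[1].str()), std::stol(match[2].str())};
}
```

Applied to the five log lines above, this would give an outputBatchSize of 3 in every case, while numOut ranges from 13 to 117.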

System information

N/A

Relevant logs

No response

minhancao added the bug and triage labels on Nov 23, 2024
minhancao changed the title from "TPCDS Query 6 failed" to "TPCDS SF10K Query 6 failed" on Nov 23, 2024

Yuhta commented Nov 26, 2024

Can you check if the issue is still there after #11659? CC: @zhli1142015
