[Dy2St] Move cuda_pinned_tensors_move_to_excepted_place to C++ #69763
+70
−25
PR Category
Execute Infrastructure
PR Types
Performance
Description
The Python-side implementation of cuda_pinned_tensors_move_to_excepted_place is very time-consuming, so this PR moves it down into C++.
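#69722 tried to change blocking to false at the same time, but test_bert and test_mobile_net in PR-CI-Windows-Inference then showed random precision failures, so blocking is left unchanged for now. In theory, this dynamic-to-static (Dy2St) function runs in dynamic graph mode, which is single-threaded and single-stream, and the H2D copy is issued on the compute stream, so it should be possible to set blocking to false.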
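The ordering argument above can be illustrated with a toy model. This is only a sketch, not Paddle or CUDA code: ToyStream, run_step, and the list standing in for a pinned host tensor are hypothetical names invented for illustration. It assumes a stream is a strict FIFO queue, so a kernel enqueued after a copy on the same stream always sees the copied data, whether or not the host waits in between.

```python
from collections import deque


class ToyStream:
    """Toy FIFO model of one device stream: work runs in enqueue order."""

    def __init__(self):
        self._queue = deque()
        self.log = []  # names of work items, in the order they executed

    def enqueue(self, name, fn):
        # Enqueue only records the work; nothing runs yet (asynchronous launch).
        self._queue.append((name, fn))

    def synchronize(self):
        # The host waits here until everything queued so far has run.
        while self._queue:
            name, fn = self._queue.popleft()
            fn()
            self.log.append(name)


def run_step(blocking):
    """Copy host data to the 'device', then run a 'kernel' on the same stream."""
    stream = ToyStream()
    host = [1.0, 2.0, 3.0]  # stands in for a pinned host tensor (assumption)
    device = {}
    result = {}

    def h2d_copy():
        device["x"] = list(host)

    def kernel():
        result["y"] = sum(device["x"])

    stream.enqueue("h2d_copy", h2d_copy)
    if blocking:
        stream.synchronize()  # blocking copy: host waits before going on
    stream.enqueue("kernel", kernel)
    stream.synchronize()
    return result["y"], stream.log


# Both modes give the same result and the same execution order in this model.
print(run_step(blocking=True))
print(run_step(blocking=False))
```

The model says nothing about why the Windows CI tests failed; it only shows the same-stream ordering that the reasoning relies on. A real asynchronous copy additionally requires the host buffer to stay valid and unmodified until the copy has completed, which this toy does not model.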
Profiling with the helixfold APB subgraph shows that cuda_pinned_tensors_move_to_excepted_place accounts for 5.84% of the total time.
After moving it to C++, its share drops to 0.23%.
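The two percentages are shares of different totals, so they cannot be divided directly. A rough conversion, assuming (this is an assumption, not something measured in this PR) that the time spent in everything else is unchanged between the two profiles:

```python
def relative_change(share_before, share_after):
    """Convert before/after time shares into a speedup and an end-to-end saving.

    Assumes the time spent outside the function is identical in both runs.
    """
    # Function time expressed relative to the (unchanged) rest of the run.
    before = share_before / (1.0 - share_before)
    after = share_after / (1.0 - share_after)
    speedup = before / after
    # Total time is rest / (1 - share), so the ratio of totals follows directly.
    total_saved = 1.0 - (1.0 - share_before) / (1.0 - share_after)
    return speedup, total_saved


speedup, total_saved = relative_change(0.0584, 0.0023)
print(f"function speedup ~{speedup:.1f}x, end-to-end time saved ~{total_saved:.1%}")
```

Under that assumption the function itself becomes roughly 27 times cheaper and the profiled subgraph runs a little under 6% faster overall. These are estimates derived from the two shares above, not separately measured results.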
Pcard-67164