[Dy2St] cuda_pinned_tensors_move_to_excepted_place move to C++ #69763

Open: wanghuancoder wants to merge 4 commits into base: develop


wanghuancoder (Contributor) commented Nov 27, 2024

PR Category

Execute Infrastructure

PR Types

Performance

Description

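The Python-side cuda_pinned_tensors_move_to_excepted_place is very time-consuming, so this PR moves it down into C++.
#69722 tried to change blocking to false at the same time, but test_bert and test_mobile_net in PR-CI-Windows-Inference then hit random precision problems, so blocking is left unchanged for now. In theory, this dynamic-to-static (Dy2St) function runs in dynamic graph mode, which is single-threaded and single-stream, and the H2D copy happens on the compute stream, so it should be possible to set blocking to false.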
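
As a rough illustration of what such a function does, going only by its name and the H2D copy mentioned above, here is a minimal pure-Python sketch. `FakeTensor`, `copy_to`, `move_pinned_to_expected_place`, and the place strings are all hypothetical stand-ins, not Paddle APIs, and the real implementation in this PR may differ.

```python
from dataclasses import dataclass

# Hypothetical place tags; these are not Paddle's place classes.
CUDA_PINNED = "cuda_pinned"
GPU = "gpu:0"


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor."""
    data: list
    place: str

    def copy_to(self, place, blocking=True):
        # A real H2D copy would either wait for completion (blocking=True)
        # or only enqueue the copy on the stream (blocking=False).
        # This stand-in copies the data immediately in both cases.
        return FakeTensor(list(self.data), place)


def move_pinned_to_expected_place(inputs, expected_place, blocking=True):
    """Move every pinned-memory tensor in `inputs` to `expected_place`, in place."""
    moved = 0
    for t in inputs:
        # Skip non-tensors and tensors that are not in pinned host memory.
        if isinstance(t, FakeTensor) and t.place == CUDA_PINNED:
            copied = t.copy_to(expected_place, blocking)
            t.data, t.place = copied.data, copied.place
            moved += 1
    return moved


batch = [FakeTensor([1.0, 2.0], CUDA_PINNED), FakeTensor([3.0], GPU), "not a tensor"]
moved_count = move_pinned_to_expected_place(batch, GPU)
```

A per-input loop like this runs on every call, which is presumably why doing it in Python is costly compared with a single C++ call.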

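Profiling with the helixfold APB subgraph shows that cuda_pinned_tensors_move_to_excepted_place accounts for 5.84% of the run time:
[Screenshot: profiling result, image 1732690074394]

After moving it to C++, its share drops to 0.23%.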
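
The two percentages are shares of different totals, so they cannot be divided directly. The sketch below converts them into speedups under one assumption that the PR does not state: everything outside this function takes the same absolute time before and after the change.

```python
# Time shares reported in the description, as fractions of total profiled time.
share_before = 0.0584
share_after = 0.0023

# Assumption (not stated in the PR): the rest of the workload is unchanged.
rest = 1.0 - share_before                 # original total normalised to 1.0
total_after = rest / (1.0 - share_after)  # new total, in units of the old total
func_after = share_after * total_after    # new function time, same units

func_speedup = share_before / func_after
end_to_end_speedup = 1.0 / total_after

print(f"function speedup:   {func_speedup:.1f}x")
print(f"end-to-end speedup: {end_to_end_speedup:.3f}x")
```

Under that assumption the function itself becomes roughly 27 times faster and the profiled subgraph about 6% faster overall.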

Pcard-67164

paddle-bot bot commented Nov 27, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.
