Gets stuck when changing from parallel to cone beam on a single GPU #560
Comments
That is strange! If you do it in 3D, does it work well? Probably related to #552 |
I set a breakpoint and it got stuck on this line:
of iterative_recon_alg.py |
@GreameLee and inside there, do you know where? |
@AnderBiguri Yes, I stepped inside and it is in
|
I see! I will change this function soon. For now, if you give this function |
Do you mean replacing all geox with geo in iterative_recon_alg.py? There is an assignment to geox before this line:
I think geox is geo already |
Yes, in particular in the |
I meant:
|
Yes, I tried it as
but it still got stuck |
Hum, confusing. I don't really know why it happens then... I will investigate, but it's hard, as I cannot reproduce it on any of my 5 computers. |
Are they based on Linux Ubuntu? @AnderBiguri I ran it on Ubuntu and it gets stuck again on the same line. This issue does not happen on Windows.
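One generic way to locate a hang without stepping through a debugger is the standard-library `faulthandler` module, which can dump the Python stack of every thread after a timeout. A minimal sketch, assuming the hanging reconstruction call can be wrapped in a function; `run_with_hang_watchdog` and its parameters are hypothetical names, not part of TIGRE:

```python
import faulthandler
import sys


def run_with_hang_watchdog(fn, timeout_s=60.0, log_file=None):
    """Call fn(); if it is still running after timeout_s seconds,
    dump the Python traceback of all threads to log_file."""
    if log_file is None:
        log_file = sys.stderr
    # log_file must be backed by a real file descriptor (fileno()).
    faulthandler.dump_traceback_later(timeout_s, file=log_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

The dump only shows Python frames, so if the hang is inside a compiled extension, the last frame listed is the Python call that entered it.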
|
I went deeper into that step, into the Ax function in utilities, and found that it got stuck in gpu.py at this line:
|
Hum that is strange! What if you change it to |
Yes, I edited it as:
But it still gets stuck on the following line of Ax.py:
Can I ask what the "_Ax_ext" function is? I see that
But I did not see any function named _Ax_ext |
It's |
I think this may happen on supercomputers based on Ubuntu. I tried it on another supercomputer and it got stuck, too. It is a fairly serious issue. |
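Since the hang seems to depend on the machine, it may help to attach a short environment summary to each report. A minimal sketch using only the standard library; `environment_report` is a hypothetical helper, and GPU, driver, and TIGRE version details would have to be added by hand:

```python
import platform
import sys


def environment_report():
    """Collect basic OS and Python details for a bug report."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
```

Printing the returned dictionary on both the working and the failing machine gives a quick side-by-side comparison.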
I have used ~15 Ubuntu based machines and I can't reproduce :( I suggest the following: replace |
I set:
But this bug happens when it reaches "self.W[ang_index]" in this part of iterative_recon_alg.py:
The error is:
|
Ah yes, sorry, I made a mistake. It's more or less that, but not exactly what I said. I don't have time to fix it now. If you want to fix it yourself:
|
Hello Ander, @AnderBiguri I retried this on Windows and, interestingly, ossart-tv runs smoothly, but when I try to copy the output from CPU to GPU, this error happens:
With the same code, parallel-beam CT does not produce this error. |
Are you using this with pytorch?
On Wed, 26 Jun 2024, 03:08 Haodong wrote:
[...] when I try to copy the output from CPU to GPU, this error happens:
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
|
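The error message above suggests `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous so the reported stack trace points at the failing call. The variable has to be set before CUDA is initialised, so one option is to launch the script in a child process with a modified environment. A minimal sketch; `blocking_cuda_env` is a hypothetical helper name:

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    # Synchronous launches make CUDA errors surface at the call that caused them.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env
```

The result can be passed as `env=` to `subprocess.run([sys.executable, "my_script.py"], env=blocking_cuda_env())`, where `my_script.py` stands in for the failing script.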
Yes, I use tensor_gpu = tensor_cpu.to(device='cuda') to copy the tensor built from the TIGRE results.
|
The TIGRE variables you have access to are always on the CPU. There have been problems in the past relating to TIGRE and pytorch (e.g. #509), which we will look into solving. |
I can run cone-beam ossart-tv by running the example code on Ubuntu now. |
@GreameLee That is actually quite useful information, yes. Just to clarify, TIGRE+pytorch is not supported yet, as I have not investigated in detail how to make it work. Likely Ubuntu/Windows has nothing to do with this problem. But if it works for parallel and not cone, this is something I can dig into. |
Looking forward to your investigation. |
Related: #563 |
Hi @GreameLee I know it's been a while, but if you are still playing with this, could you try commenting out this line: |
@GreameLee pytorch is not compatible with TIGRE, see demo 25! |
I can run iterative algorithms such as ossart for 2D parallel-beam CT reconstruction smoothly on Ubuntu. But for cone beam (fan beam in 2D), the run just gets stuck and prints nothing, not even the estimated time.
This is the code: