Get struck when change parallel to cone in single-GPU #560

GreameLee · 2024-06-20T16:32:05Z

I can run iterative algorithms such as ossart for 2D parallel-beam CT reconstruction smoothly on Ubuntu. But when I switch to cone-beam (fan-beam in 2D), the run just gets stuck and prints nothing, not even the estimated time.
This is the code:

import tigre
import numpy as np
from tigre.utilities import sample_loader
from tigre.utilities import CTnoise
import tigre.algorithms as algs
from scipy.io import loadmat
import matplotlib.pyplot as plt
import torch

geo = tigre.geometry()
# VARIABLE                                   DESCRIPTION                    UNITS
# -------------------------------------------------------------------------------------
# Distances
geo.DSD = 1536  # Distance Source Detector      (mm)
geo.DSO = 1000 # Distance Source Origin        (mm)
# Image parameters
geo.nVoxel = np.array([1, 256, 256])  # number of voxels              (vx)
geo.sVoxel = np.array([1, 256, 256])  # total size of the image       (mm)
geo.dVoxel = geo.sVoxel / geo.nVoxel  # size of each voxel            (mm)
print(geo.dVoxel)
# Detector parameters
geo.nDetector = np.array([1, 1512])  # number of pixels              (px)
geo.dDetector = np.array([geo.dVoxel[0], 0.8])  # size of each pixel            (mm)
# geo.dDetector = np.array([1, 0.1])
geo.sDetector = geo.nDetector * geo.dDetector  # total size of the detector    (mm)
# Offsets
geo.offOrigin = np.array([0, 0, 0])  # Offset of image from origin   (mm)
geo.offDetector = np.array([0, 0])  # Offset of Detector            (mm)
# MAKE SURE THAT THE DETECTOR PIXELS SIZE IN V IS THE SAME AS THE IMAGE!

geo.mode = "cone"

#%% Define angles of projection and load phantom image

angles = np.linspace(0, 2 * np.pi, 180)


img = sample_loader.load_head_phantom(geo.nVoxel)
limited_angle = np.linspace(0, 2*np.pi, 90)
limited_projection = tigre.Ax(img, geo, limited_angle)
tigre.plotSinogram(limited_projection, 0)

niter=500
print("start ossart_tv")
rec_img_ossart_tv = algs.ossart_tv(limited_projection, geo, limited_angle, niter)
plt.imshow(rec_img_ossart_tv.squeeze(), cmap='gray')
plt.axis('off')  # hide the axes before saving so they do not appear in the PNG
plt.savefig('ossart_tv_90.png', bbox_inches='tight', pad_inches=0)

Python version: 3.9
OS: Ubuntu 18.04.6 LTS
CUDA version: 12.3


AnderBiguri · 2024-06-20T16:39:00Z

That is strange! If you do it in 3D, does it work well?
Can you try to figure out where it gets stuck?

Probably related to #552
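
One way to find where a run hangs without stepping through it by hand is Python's standard faulthandler module, which can dump the current traceback after a timeout even while the interpreter is blocked inside a compiled call. The following is a minimal sketch, not something from TIGRE: the helper name run_with_watchdog and the log file name are made up for illustration.

```python
import faulthandler
import os
import tempfile


def run_with_watchdog(func, timeout_s=60.0, log_path=None):
    """Run func(); while it has not returned, dump all thread tracebacks
    to log_path every timeout_s seconds (hypothetical helper)."""
    if log_path is None:
        log_path = os.path.join(tempfile.gettempdir(), "hang_traceback.log")
    with open(log_path, "w") as log:
        # The watchdog does not need the main thread to be responsive,
        # so it still fires when the main thread is stuck in native code.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return func()
        finally:
            # Stop the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()


# Stand-in workload; a real call would be passed in the same way.
result = run_with_watchdog(lambda: sum(range(10)), timeout_s=5.0)
```

In the script above, the workload would be something like `lambda: algs.ossart_tv(limited_projection, geo, limited_angle, niter)`. The dump only shows Python frames, so it identifies the Python line that called into the compiled code, not the position inside the CUDA kernel.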

GreameLee · 2024-06-20T17:00:26Z

I set a breakpoint and it got stuck at this line:

194        self.set_w()

of iterative_recon_alg.py

AnderBiguri · 2024-06-20T17:01:27Z

@GreameLee and inside there, do you know where?

GreameLee · 2024-06-20T17:11:15Z

@AnderBiguri yes, I stepped inside and it is stuck at

222        W = Ax(
223            np.ones(geox.nVoxel, dtype=np.float32), geox, self.angles, "Siddon", gpuids=self.gpuids
224        )

AnderBiguri · 2024-06-21T08:44:16Z

I see! I will change this function soon. For now, if you give this function geo instead of geox, I believe it should work.

GreameLee · 2024-06-21T14:37:35Z

> I see! I will change this function soon. For now, if you give this function geo instead of geox, I believe it should work.

Do you mean that I should replace all geox with geo in iterative_recon_alg.py? There is an assignment to geox before this line:

216     geo = copy.deepcopy(self.geo)

I think geox is already geo.

AnderBiguri · 2024-06-21T14:39:05Z

Yes, in particular in the set_w function.

AnderBiguri · 2024-06-21T15:42:29Z

I meant:

def set_w(self):
        """
        Calculates value of W if this is not given.
        :return: None
        """
        geo=self.geo
        W = Ax(
            np.ones(geo.nVoxel, dtype=np.float32), geo, self.angles, "Siddon", gpuids=self.gpuids
        )
        W[W <= min(self.geo.dVoxel / 2)] = np.inf
        W = 1.0 / W
        setattr(self, "W", W)

GreameLee · 2024-06-21T15:48:35Z

Yes, I tried it as:

def set_w(self):
        """
        Calculates value of W if this is not given.
        :return: None
        """
        geo=self.geo
        W = Ax(
            np.ones(geo.nVoxel, dtype=np.float32), geo, self.angles, "Siddon", gpuids=self.gpuids
        )
        W[W <= min(self.geo.dVoxel / 2)] = np.inf
        W = 1.0 / W
        setattr(self, "W", W)

but it still got stuck.

AnderBiguri · 2024-06-21T15:51:01Z

Hum, confusing. I don't really know why it happens then... I will investigate, but it's hard, as I cannot reproduce it on any of my 5 computers.

GreameLee · 2024-06-21T18:19:01Z

Are they running Ubuntu Linux? @AnderBiguri I ran it on Ubuntu and it gets stuck again at the same line. This issue does not happen on Windows.

GreameLee · 2024-06-22T03:27:46Z

I stepped deeper into that call, into the Ax function in utilities, and found that it gets stuck in gpu.py at this line:

31   def __len__(self):
32           return len(self.devices)

AnderBiguri · 2024-06-22T09:45:28Z

Hum, that is strange! What if you change it to return your_number_of_gpus, however many those are?

GreameLee · 2024-06-22T17:45:10Z

Yes, I edited it to:

def __len__(self):
        return 1

But it still gets stuck at the following line of Ax.py:

35  return _Ax_ext(img, geox, geox.angles, projection_type, geox.mode, gpuids=gpuids)

Can I ask what the "_Ax_ext" function is? I see the import

from _Ax import _Ax_ext

but I could not find any function named _Ax_ext.
The strange thing is that tigre.Ax is called before ossart_tv, using the same Ax function, and that call goes through smoothly.

AnderBiguri · 2024-06-22T18:06:41Z

It's _Ax_ext that fails indeed; it's the CUDA code for the forward projection. I still don't really understand why it fails only sometimes, and only on some machines. If you are using exactly the same geometry, any algorithm should work the same. It's really confusing.
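
A function that lives in compiled code will not show up as a `def` when searching the Python sources. A quick way to check whether an import resolves to a compiled extension is to look at where the module is loaded from. This is a generic sketch using only the standard library; describe_module is a made-up helper name, and it is demonstrated on the standard json module so that it runs anywhere.

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Report where a top-level module is loaded from and whether that
    file looks like a compiled extension (hypothetical helper)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "origin": None, "is_extension": False}
    origin = spec.origin
    # EXTENSION_SUFFIXES lists the file endings used for compiled modules
    # on this platform, such as ".so" on Linux or ".pyd" on Windows.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    is_extension = bool(origin) and origin.endswith(suffixes)
    return {"name": name, "origin": origin, "is_extension": is_extension}


info = describe_module("json")  # a pure-Python module, for comparison
print(info)
```

Assuming `_Ax` is importable in your environment, `describe_module("_Ax")` should report an origin ending in one of those suffixes, which would explain why no Python definition of `_Ax_ext` can be found.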

GreameLee · 2024-06-23T22:11:01Z

> Its _Ax_ext that fails indeed, its the CUDA code for the forward projection. I still not really understand why it fails sometimes only in some machines. If you are using exactly the same geometry any algorithm should work the same. Its really confusing.

I think this may happen on Ubuntu-based supercomputers. I tried it on another supercomputer and it got stuck there, too. It is quite a serious issue.

AnderBiguri · 2024-06-24T11:16:06Z

I have used ~15 Ubuntu-based machines and I can't reproduce it :(
It is indeed a serious issue.

I suggest the following: replace set_w() such that it returns geo.sVoxel(1). It's not perfect, but it should make the algorithms work.

GreameLee · 2024-06-24T14:41:33Z

I set:

def set_w(self):
    self.W = self.geo.sVoxel[1]
    return self.W

But then an error occurs at "self.W[ang_index]" in this part of iterative_recon_alg.py:

 ang_index = self.angle_index[iteration].astype(np.int32)

        self.res += (
            self.lmbda
            * 1.0
            / self.V[iteration]
            * Atb(
                self.W[ang_index]

The error is:

IndexError: invalid index to scalar variable.
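
This error can be reproduced with NumPy alone, which shows that it comes from W being a single number rather than from the projector. A minimal sketch, using the sVoxel values from the script at the top of the thread:

```python
import numpy as np

s_voxel = np.array([1, 256, 256])
W = s_voxel[1]  # a NumPy scalar, not an array

# Index array of the same kind the algorithm builds for each subset.
ang_index = np.array([0, 1, 2], dtype=np.int32)

try:
    W[ang_index]
    message = None
except IndexError as err:
    message = str(err)

print(message)
```

The algorithm indexes W by projection angle, so W needs one entry per angle along its first axis.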

AnderBiguri · 2024-06-24T14:47:55Z

Ah yes, sorry, I made a mistake. It's more or less that, but not exactly what I said. I don't have time to fix it right now.

If you want to fix it yourself:

1. Work out what size W is.
2. Replace it with something of the same size, but with every value equal to 1/geo.sVoxel[1].

I'll try to come back and fix this at some point. Apologies, the end of the year gets a bit busy.
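
The two steps above could look roughly like the following. This is only a sketch under one assumption: that W has the same shape as the projection data returned by Ax, that is (number of angles, nDetector[0], nDetector[1]). The numbers are copied from the script at the top of the thread, and plain NumPy arrays stand in for the geometry object.

```python
import numpy as np

# Values from the reported script (stand-ins for geo.nDetector, geo.sVoxel).
n_detector = np.array([1, 1512])
s_voxel = np.array([1, 256, 256])
angles = np.linspace(0, 2 * np.pi, 90)

# Assumption: W is shaped like the projections, one slice per angle.
w_shape = (len(angles), int(n_detector[0]), int(n_detector[1]))

# Constant weights of 1 / sVoxel[1] instead of the Siddon-based estimate.
W = np.full(w_shape, 1.0 / s_voxel[1], dtype=np.float32)

# Indexing by a subset of angles now works, unlike with a scalar W.
ang_index = np.array([0, 1, 2], dtype=np.int32)
subset = W[ang_index]
print(W.shape, subset.shape)
```

Inside set_w this would be assigned to self.W, with len(self.angles) and self.geo supplying the sizes. The constant is a rough stand-in for the ray-length weights, so convergence may differ from the default.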

GreameLee · 2024-06-26T02:08:25Z

Hello Ander, @AnderBiguri I retried this on Windows and, interestingly, ossart_tv runs smoothly there, but when I try to copy the output from CPU to GPU, this error occurs:

RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

With the same code, parallel-beam CT does not produce this error.
My guess is that when Ax is used for cone-beam CT, _Ax_ext leaves the data or the GPU memory in a bad state, so the data cannot be transferred from CPU to GPU.
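
The error message suggests CUDA_LAUNCH_BLOCKING=1 so that the failure is reported at the call that caused it. A minimal sketch of setting it from inside a Python script; the variable has to be in the environment before anything initializes CUDA, so it goes before the torch import:

```python
import os

# Must be set before the first CUDA call in this process,
# so place it above the torch import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import only after the variable is set

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Setting the variable in the shell before launching Python has the same effect. Kernel launches become synchronous, so the run is slower and this is only meant for debugging.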

AnderBiguri · 2024-06-26T07:12:35Z

Are you using this with pytorch?

GreameLee · 2024-06-26T11:05:36Z

Yes, I use tensor_gpu = tensor_cpu.to(device='cuda') to copy the tensor built from the TIGRE results to the GPU.

AnderBiguri · 2024-06-26T11:09:10Z

The TIGRE variables you have access to are always on the CPU.

There have been problems in the past with TIGRE and pytorch together (e.g. #509), which we will look into solving.

GreameLee · 2024-06-30T16:38:19Z

I can now run cone-beam ossart_tv on Ubuntu using the example code.
But when I insert this step into my PyTorch-based diffusion model, with tensors on the GPU, the ossart_tv reconstruction gets stuck and cannot be killed, even with Ctrl+C. I think it gets stuck for the same reason as the error on Windows.
Interestingly, everything runs smoothly for parallel-beam CT, so I guess this is the same problem as #509.
It seems _Ax_ext misbehaves for cone-beam CT reconstruction. The issue may be caused by a problem when transferring data.

AnderBiguri · 2024-06-30T17:03:07Z

@GreameLee That is actually quite useful information, yes.

Just to clarify, TIGRE+pytorch is not supported yet, as I have not investigated in detail how to make it work. Most likely Ubuntu vs Windows has nothing to do with this problem.

But if it works for parallel and not cone, this is something I can dig into.

GreameLee · 2024-07-01T13:58:45Z

Looking forward to your investigation.

AnderBiguri · 2024-07-01T14:42:26Z

Related: #563

AnderBiguri · 2024-10-15T11:10:40Z

Hi @GreameLee

I know it's been a while, but if you are still playing with this, could you try commenting out this line:
https://github.com/CERN/TIGRE/blob/7e8ec5454af09fada9e52fac6d81d91a19a5bbe2/Common/CUDA/Siddon_projection.cu#L584C4-L584C23
recompiling TIGRE, and then seeing if it works now?

AnderBiguri · 2024-10-22T17:04:41Z

@GreameLee pytorch is not compatible with TIGRE, see demo 25!

GreameLee changed the title ~~struck when change parallel to cone~~ Get struck when change parallel to cone Jun 20, 2024

GreameLee changed the title ~~Get struck when change parallel to cone~~ Get struck when change parallel to cone in single-GPU Jun 20, 2024

AnderBiguri closed this as completed Oct 22, 2024
