IDC DICOM use itkwasm-dicom, idc python package #757

thewtex · 2024-08-23T20:13:46Z

No description provided.

review-notebook-app · 2024-08-23T20:13:50Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

thewtex · 2024-08-23T20:16:15Z

@jadh4v can you please take a look? We are getting slight differences with the origin and direction signs between the CT and SEG. @fedorov does this ring any bells?

thewtex · 2024-08-23T20:36:49Z

Rendered preview: https://app.reviewnb.com/InsightSoftwareConsortium/itkwidgets/pull/757/

fedorov · 2024-08-23T21:48:16Z

To make this easier to review, if I understand correctly from https://app.reviewnb.com/InsightSoftwareConsortium/itkwidgets/pull/757/, this is SEG:

origin=[-177.300003, 168.339722, 3.80999756],
    spacing=[0.660156, 0.660156, 2.5],
    direction=array([[ 1.,  0.,  0.],
       [ 0., -1.,  0.],
       [ 0.,  0., -1.]]),
    size=[512, 512, 109],

and this is CT:

    Size: [512, 512, 110]
  Spacing: [0.660156, 0.660156, 2.5]
  Origin: [-177.3, -169, -268.69]
  Direction: 
1 0 0
0 1 0
0 0 1

I think there are multiple issues that are/may be involved.

When segmentation is created from a DICOM image, there are quite a few transformations/recalculations that take place to build 3d volume from DICOM slices. This can lead to loss of precision, which can accumulate over the volume, which does not preserve locations per-slice. Here's a relevant issue where I thought we did the best we could to reduce loss of precision: BUG: serialize floats with maximum precision QIICR/dcmqi#416
we did ran into the issue where, again, due to rounding, the number of slices can end up being different between CT and SEG. I thought we fixed this "off by 1" in Revisit calculation of the number of slices for SEG conversion QIICR/dcmqi#275.
the number of slices in SEG absolutely does not have to match the number of slices in CT if segmentation contains empty frames, since those can be skipped on encoding SEG. This is a common source of confusion. If empty slices are not skipped, you may end up with very large SEG files, since there is no good way to compress those frames for SEG....
it could be that pathways for reading CT and SEG follow different conventions (is the difference in the formatting of the image metadata a sign that even output representation is different between the two?), and load of CT reorients the image buffer. I thought it is commonly done while reading into NIfTI.

My recommendation has been to resample SEG to the image geometry with nearest neighbor interpolator before working with the arrays. It is not pretty, but should be lossless. I am not sure there is a better way.

@pieper do you have any comments?

fedorov · 2024-08-23T22:00:31Z

@thewtex related to this PR, in

!{sys.executable} -m pip install -q pooch itk-io "itkwidgets[all]>=1.0a49" pydicom rich s5cmd itkwasm-dicom numpy

I recommend you instead of s5cmd install idc-index. idc-index will bring s5cmd as a dependency, but will also provide additional functions to simplify interaction with IDC.

For example, the cell below:

%%capture
# CT series downloaded from IDC, NLST collection (https://doi.org/10.7937/TCIA.HMQ8-J677)
!s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com cp "s3://public-datasets-idc/865da32b-cea7-4e6d-b4fe-9731bf08b3d2/*" CT_DICOM_series

# segmentation of this series downloaded from IDC, from nnU-Net-BPR-Annotations (https://doi.org/10.5281/zenodo.7539035)
!s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com cp "s3://public-datasets-idc/3a0c03fe-3d17-48dc-9b8d-f64f5302e0a4/*" SEG_DICOM_series

will become the following:

# CT series downloaded from IDC, NLST collection (https://doi.org/10.7937/TCIA.HMQ8-J677)
!idc download 1.2.840.113654.2.55.247854884634057477137769379611681725965

# segmentation of this series downloaded from IDC, from nnU-Net-BPR-Annotations (https://doi.org/10.5281/zenodo.7539035)
!idc download 1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219

In the above, 865da32b-cea7-4e6d-b4fe-9731bf08b3d2 is IDC crdc_series_uuid (not a DICOM attribute, DICOM does not support object versioning natively), and 1.2.840.113654.2.55.247854884634057477137769379611681725965 is DICOM SeriesInstanceUID.

crdc_series_uuid uniquely identifies the binary files associated with a series, meaning that if the specific DICOM series changes in the future but will keep the same SeriesInstanceUID, the files corresponding to the new version of the series will get assigned a new crdc_series_uuid.

idc download command will pull requested DICOM series defined by SeriesInstanceUID (it can also download content referenced by PatientID, StudyInstanceUID, or collection_id, and will sort the files into hierarchy on download) from the current version of IDC.

We definitely can add the functionality to download by crdc_series_uuid, just didn't have a use case for that. I created an issue in ImagingDataCommons/idc-index#117.

In any case, I strongly recommend idc-index to simplify the code and open the door for other opportunities to work with IDC for the notebook users!

jadh4v · 2024-08-24T03:02:40Z

@thewtex My first impression is that the coordinate system chosen for SEG is different. I suppose it depends on how the SEG was generated. Also agree with @fedorov regarding precision loss as well as skipping of slices (accounts for dim difference).
Looks like we need a baseline understanding of what to expect from a pair of DICOM + SEG dataset. One way to ensure that is to generate our own data but we'd we living in our own bubble reading+writing our own test data.

Would there be a way to interpret the SEG from any third party software (even if closed source), get it converted and then compare the outputs?

pieper · 2024-08-24T13:46:00Z

@pieper do you have any comments?

Lots of good things to discuss here - a main point being that the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference. So we shouldn't, for example, consider the CT geometry when creating the SEG object, but rather create the most faithful representation of the input data.

fedorov · 2024-08-26T13:57:16Z

the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference

I agree with Steve, this is a perfect summary! Also, since I wrote the above, I realized that original segmentations were created in NIfTI format using nnU-Net, which were then converted into DICOM SEG using dcmqi. Change in image orientation most likely happened during NIfTI conversion.

A separate question is why itkWidgets don't seem to be able to correctly render segmentation label that has a different orientation compared to the image? Is this a bug?

thewtex · 2024-08-28T22:52:34Z

I agree with Steve, this is a perfect summary! Also, since I wrote the above, I realized that original segmentations were created in NIfTI format using nnU-Net, which were then converted into DICOM SEG using dcmqi. Change in image orientation most likely happened during NIfTI conversion.

Yes, this does seem to be the source of the problem. What would you like to do? A few ideas:

Correct the metadata in the DICOM SEG in IDC (ideally this would happen regardless at some point)
Apply a correction in this notebook
Use a different example input dataset that does have correct metadata

Lots of good things to discuss here - a main point being that the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference. So we shouldn't, for example, consider the CT geometry when creating the SEG object, but rather create the most faithful representation of the input data.

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

My recommendation has been to resample SEG to the image geometry with nearest neighbor interpolator before working with the arrays.

This is already the case. In fact, we even use an interpolator with better than nearest-neighbor interpolation for labels, available in a small, universal ITK-Wasm python package :-).

If we apply the CT metadata to the SEG, this is what we get:

Note that it is also incorrect -- the heart and esophagus are inverted.

This is what the current metadata is specifying:

The empty box is the domain of the SEG -- vtk.js has a limitation of only rendering one volume.

I recommend you instead of s5cmd install idc-index. idc-index will bring s5cmd as a dependency, but will also provide additional functions to simplify interaction with IDC.

Most excellent!! 🎇 Done. 🍾

Simplify by using itkwasm-dicom instead of pydicom-seg. Also use the s5cmd Python package.

pieper · 2024-08-28T23:12:34Z

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

I hadn't realized that, but yes, one of the liver segs here has incorrect geometry. We should get the data fixed and not use this for a demonstration or tutorial (unless we want to warn about bad metadata).

fedorov · 2024-08-29T02:09:25Z

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

@thewtex I don't see what are the issues with the data - the image is oriented differently, but I am not aware of any issues with the metadata. It works fine in Slicer, which is using dcmqi, screenshot below. You can see this series visualized in OHIF, which is using completely independent DICOM readers: https://viewer.imaging.datacommons.cancer.gov/v3/viewer/?StudyInstanceUIDs=1.2.840.113654.2.55.133032926508606330165457723341736723804&SeriesInstanceUIDs=1.2.840.113654.2.55.247854884634057477137769379611681725965,1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219.

This is already the case. In fact, we even use an interpolator with better than nearest-neighbor interpolation for labels, available in a small, universal ITK-Wasm python package :-).

Where is this taking place? I did not see any resampling in the notebook code, unless I missed it.

I hadn't realized that, but yes, one of the liver segs here has incorrect geometry.

@pieper which sample are you referring to? I do not see any liver segmentations in the sample used in the notebook: https://viewer.imaging.datacommons.cancer.gov/v3/viewer/?StudyInstanceUIDs=1.2.840.113654.2.55.133032926508606330165457723341736723804&SeriesInstanceUIDs=1.2.840.113654.2.55.247854884634057477137769379611681725965,1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219.

pieper · 2024-08-29T13:28:10Z

Thanks for the link to the actual data @fedorov. The data I was looking at came from a different study, but I don't have the uid handy. So there is a dataset with a bad liver segmentation geometry but it's not the one from this issue. Sorry for the extra confusion.

@thewtex - since Andrey and I confirmed the geometry is read correctly in OHIF and Slicer for StudyInstanceUID 1.2.840.113654.2.55.133032926508606330165457723341736723804 can you show and itk widgets example for it?

The seg and the referenced series uids are:

[0008,1115]	ReferencedSeriesSequence		SQ	12910
	[fffe,e000]	Item		na	12894
		[0020,000e]	SeriesInstanceUID	1.2.840.113654.2.55.247854884634057477137769379611681725965	UI	60
[0020,000e]	SeriesInstanceUID	1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219	UI	58

thewtex · 2024-08-29T15:24:56Z

@fedorov @pieper interesting!

Here is the metadata I get for the CT when loaded in 3D Slicer:

I was not able to load the QuantiativeReporting extension for DCMQI in Slicer -- I get an extension not found with Slicer 5.6.2 and a backtrace when it tries to start with 5.7.0. Do you know the Slicer-loaded metadata for the SEG?

This seems to be different from what we get in this notebook,

    Size: [512, 512, 110]
  Spacing: [0.660156, 0.660156, 2.5]
  Origin: [-177.3, -169, -268.69]
  Direction: 
1 0 0
0 1 0
0 0 1

I suspect there is a difference with how the CT is loaded and interpreted.

StudyInstanceUID 1.2.840.113654.2.55.133032926508606330165457723341736723804 can you show and itk widgets example for it?

Yes, that is what is in this PR.

Where is this taking place? I did not see any resampling in the notebook code, unless I missed it.

This is part of the rendering process.

thewtex · 2024-08-29T15:29:45Z

~~> This seems to be different from what we get in this notebook~~

Well, we also have LPS / RAS to account for.

thewtex · 2024-08-29T15:45:11Z

Do you know the Slicer-loaded metadata for the SEG?

This may be enlightening.

fedorov · 2024-08-29T16:20:59Z

I was not able to load the QuantiativeReporting extension for DCMQI in Slicer -- I get an extension not found with Slicer 5.6.2 and a backtrace when it tries to start with 5.7.0.

Can you tell me what OS you are on? I just uninstalled/installed, and everything works for me on Mac with 5.6.2.

I checked cdash, and I do see an error for stable mac build for today, which seems to be an intermittent failure to clone the repo: https://slicer.cdash.org/builds/3504732/configure.

I also tested on 5.7.0, and did not see any problems.

Do you know the Slicer-loaded metadata for the SEG?

This part is tricky with my knowledge of Slicer.... The issue is that segmentations are not loaded into Slicer scalar volumes, but into "Slicer Segmentations". The only way I know of to see volume-level metadata (spacing, orientation, etc) is to export "Segmentation" into a volume, but that operation takes an image volume as a reference, so the result will not tell us what you want to know... Maybe @pieper or @lassoan can help here? How can we know the metadata for loaded segmentation similar to what Volumes module shows as in #757 (comment)?

pieper · 2024-08-29T16:21:37Z

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

fedorov · 2024-08-29T16:22:50Z

My suspicion is that resampling is not done correctly in itkwidgets.

fedorov · 2024-08-29T16:23:45Z

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

Doesn't that export process take reference volume as reference, and resamples to that reference? It would always match the geometry of the reference then, but it does not reflect geometry "as loaded".

pieper · 2024-08-29T16:30:07Z

A couple things

the extension server seems to be a bit flakey today. I tried 5.6.2 earlier on linux and got 0 extensions, then restarted Slicer and I was able to install as usual.
I'm pretty sure that the binary labelmap representation of the segmention will remain the 'primary' since that's what the SEG would be input as and won't be altered by export unless you specifically ask for it. If you loaded, e.g. an STL, it wouldn't be rasterized until you asked for it, typically in the Segment Editor in the context of a volume to use as the reference grid.
In this case the ImageOrientationPatient vectors are different in the SEG vs the CT so probably something in the reading process is converting the SEG to Axial IS.

fedorov · 2024-08-29T17:35:30Z

In this case the ImageOrientationPatient vectors are different in the SEG vs the CT so probably something in the reading process is converting the SEG to Axial IS.

No, I don't think it is in the reading process. As I tried to explain earlier, I am pretty certain this is happening in the process of generating the original representation of the segmentation by nnU-Net that is then converted into SEG.

This is the approximate provenance of the SEG we are looking at:

[DICOM CT] --> nnU-Net --> [NIfTI] --> dcmqi --> [DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

There is no way around it - we should just accept that orientations may differ, and make sure resampling is implemented correctly for the systems that don't render in patient space.

pieper · 2024-08-29T17:59:13Z

Actually, the pipeline is:

[DICOM CT] --> [NIfTI] --> nnU-Net --> [NIfTI] --> dcmqi --> [DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

And it's possible that the first conversion from dicom to nii introduced a reorganization of the data.

By the "something in the reading process is converting the SEG to Axial IS." what I mean is that the CT and the SEG have different ImageOrientationPatient vectors in the instances, but they end up having the same IJK to RAS directions in Slicer, so they are being reorganized as part of that process. I believe it's because the data gets reshuffled in order to make the IJKToRAS transform right handed.

thewtex · 2024-08-29T20:43:04Z

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

Very helpful, thanks, Steve!

When we read the CT in this notebook, it looks consistent with Slicer. But the SEG is not consistent. As IJK->RAS is consistent between the CT and SEG, we should see a consistent IJK->LPS.

My suspicion is that resampling is not done correctly in itkwidgets.

The bounding box of the volumes, which is independent of any resampling, is showing that there are issues independent of resampling.

Can you tell me what OS you are on?

Linux 🐧

[DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

I wonder if we can capture the NRRD here? We could compare the header to a NRRD generated by itkwasm_image_io from the read SEG.

fedorov · 2024-08-30T02:39:17Z

[DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

I wonder if we can capture the NRRD here? We could compare the header to a NRRD generated by itkwasm_image_io from the read SEG.

You can do that step running segimage2itkimage --inputDICOM [DICOM SEG] --outputDirectory . --mergeSegments (this is exactly what is happening behind the scenes in QuantiativeReporting: https://github.com/QIICR/QuantitativeReporting/blob/master/DICOMPlugins/DICOMSegmentationPlugin.py#L104-L119). This is what I see for the SEG in question in the NRRD header:

sizes: 512 512 109
space directions: (0.66015599999999997,0,0) (0,-0.66015599999999997,0) (0,0,-2.5)
kinds: domain domain domain
endian: little
encoding: gzip
space origin: (-177.300003,168.33972199999999,3.8099975599999998)

Compared to what I see in Slicer when I load the resulting NRRD as a volume.

fedorov · 2024-08-30T15:53:16Z

'm pretty sure that the binary labelmap representation of the segmention will remain the 'primary' since that's what the SEG would be input as and won't be altered by export unless you specifically ask for it.

@pieper from the above, comparing volume metadata between what you exported from Segmentations and what I loaded directly from SEG NRRD, clearly there is a resampling to reference that is happening (509 slices vs 510). Just wanted to bring your attention to this.

pieper · 2024-08-30T17:02:57Z

is a resampling to reference that is happening

That is interesting - yes, if I load the SEG alone and then convert to a binary labelmap it has 109 slices Axial SI so there is a resampling going on when there's a reference volume.

thewtex requested a review from bnmajor August 23, 2024 20:13

thewtex requested review from aylward and jcfr August 23, 2024 20:13

thewtex changed the title ~~IDC DICOM use itkwasm-dicom~~ IDC DICOM use itkwasm-dicom, s5cmd python package Aug 23, 2024

thewtex mentioned this pull request Aug 23, 2024

Update integration and test dicom package with itkWidgets InsightSoftwareConsortium/ITK-Wasm#1205

Open

thewtex changed the title ~~IDC DICOM use itkwasm-dicom, s5cmd python package~~ IDC DICOM use itkwasm-dicom, idc python package Aug 28, 2024

WIP: DOC: Update IDC Seg Primer example to use itkwasm-dicom

b951799

Simplify by using itkwasm-dicom instead of pydicom-seg. Also use the s5cmd Python package.

thewtex force-pushed the dicom-idc-update branch from c38adef to b951799 Compare August 28, 2024 22:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDC DICOM use itkwasm-dicom, idc python package #757

IDC DICOM use itkwasm-dicom, idc python package #757

thewtex commented Aug 23, 2024

review-notebook-app bot commented Aug 23, 2024

thewtex commented Aug 23, 2024

thewtex commented Aug 23, 2024

fedorov commented Aug 23, 2024

fedorov commented Aug 23, 2024 •

edited

Loading

jadh4v commented Aug 24, 2024

pieper commented Aug 24, 2024

fedorov commented Aug 26, 2024 •

edited

Loading

thewtex commented Aug 28, 2024 •

edited

Loading

pieper commented Aug 28, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

thewtex commented Aug 29, 2024

thewtex commented Aug 29, 2024

thewtex commented Aug 29, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

fedorov commented Aug 29, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

fedorov commented Aug 29, 2024 •

edited

Loading

pieper commented Aug 29, 2024

thewtex commented Aug 29, 2024

fedorov commented Aug 30, 2024 •

edited

Loading

fedorov commented Aug 30, 2024 •

edited

Loading

pieper commented Aug 30, 2024

IDC DICOM use itkwasm-dicom, idc python package #757

Are you sure you want to change the base?

IDC DICOM use itkwasm-dicom, idc python package #757

Conversation

thewtex commented Aug 23, 2024

review-notebook-app bot commented Aug 23, 2024

thewtex commented Aug 23, 2024

thewtex commented Aug 23, 2024

fedorov commented Aug 23, 2024

fedorov commented Aug 23, 2024 • edited Loading

jadh4v commented Aug 24, 2024

pieper commented Aug 24, 2024

fedorov commented Aug 26, 2024 • edited Loading

thewtex commented Aug 28, 2024 • edited Loading

pieper commented Aug 28, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

thewtex commented Aug 29, 2024

thewtex commented Aug 29, 2024

thewtex commented Aug 29, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

fedorov commented Aug 29, 2024

fedorov commented Aug 29, 2024

pieper commented Aug 29, 2024

fedorov commented Aug 29, 2024 • edited Loading

pieper commented Aug 29, 2024

thewtex commented Aug 29, 2024

fedorov commented Aug 30, 2024 • edited Loading

fedorov commented Aug 30, 2024 • edited Loading

pieper commented Aug 30, 2024

fedorov commented Aug 23, 2024 •

edited

Loading

fedorov commented Aug 26, 2024 •

edited

Loading

thewtex commented Aug 28, 2024 •

edited

Loading

fedorov commented Aug 29, 2024 •

edited

Loading

fedorov commented Aug 30, 2024 •

edited

Loading

fedorov commented Aug 30, 2024 •

edited

Loading