Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDC DICOM use itkwasm-dicom, idc python package #757

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

thewtex
Copy link
Member

@thewtex thewtex commented Aug 23, 2024

No description provided.

@thewtex thewtex requested a review from bnmajor August 23, 2024 20:13
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@thewtex thewtex requested review from aylward and jcfr August 23, 2024 20:13
@thewtex thewtex changed the title IDC DICOM use itkwasm-dicom IDC DICOM use itkwasm-dicom, s5cmd python package Aug 23, 2024
@thewtex
Copy link
Member Author

thewtex commented Aug 23, 2024

@jadh4v can you please take a look? We are getting slight differences with the origin and direction signs between the CT and SEG. @fedorov does this ring any bells?

@thewtex
Copy link
Member Author

thewtex commented Aug 23, 2024

@fedorov
Copy link
Member

fedorov commented Aug 23, 2024

To make this easier to review, if I understand correctly from https://app.reviewnb.com/InsightSoftwareConsortium/itkwidgets/pull/757/, this is SEG:

origin=[-177.300003, 168.339722, 3.80999756],
    spacing=[0.660156, 0.660156, 2.5],
    direction=array([[ 1.,  0.,  0.],
       [ 0., -1.,  0.],
       [ 0.,  0., -1.]]),
    size=[512, 512, 109],

and this is CT:

    Size: [512, 512, 110]
  Spacing: [0.660156, 0.660156, 2.5]
  Origin: [-177.3, -169, -268.69]
  Direction: 
1 0 0
0 1 0
0 0 1

I think there are multiple issues that are/may be involved.

  1. When segmentation is created from a DICOM image, there are quite a few transformations/recalculations that take place to build 3d volume from DICOM slices. This can lead to loss of precision, which can accumulate over the volume, which does not preserve locations per-slice. Here's a relevant issue where I thought we did the best we could to reduce loss of precision: BUG: serialize floats with maximum precision QIICR/dcmqi#416
  2. we did ran into the issue where, again, due to rounding, the number of slices can end up being different between CT and SEG. I thought we fixed this "off by 1" in Revisit calculation of the number of slices for SEG conversion QIICR/dcmqi#275.
  3. the number of slices in SEG absolutely does not have to match the number of slices in CT if segmentation contains empty frames, since those can be skipped on encoding SEG. This is a common source of confusion. If empty slices are not skipped, you may end up with very large SEG files, since there is no good way to compress those frames for SEG....
  4. it could be that pathways for reading CT and SEG follow different conventions (is the difference in the formatting of the image metadata a sign that even output representation is different between the two?), and load of CT reorients the image buffer. I thought it is commonly done while reading into NIfTI.

My recommendation has been to resample SEG to the image geometry with nearest neighbor interpolator before working with the arrays. It is not pretty, but should be lossless. I am not sure there is a better way.

@pieper do you have any comments?

@fedorov
Copy link
Member

fedorov commented Aug 23, 2024

@thewtex related to this PR, in

!{sys.executable} -m pip install -q pooch itk-io "itkwidgets[all]>=1.0a49" pydicom rich s5cmd itkwasm-dicom numpy

I recommend you instead of s5cmd install idc-index. idc-index will bring s5cmd as a dependency, but will also provide additional functions to simplify interaction with IDC.

For example, the cell below:

%%capture
# CT series downloaded from IDC, NLST collection (https://doi.org/10.7937/TCIA.HMQ8-J677)
!s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com cp "s3://public-datasets-idc/865da32b-cea7-4e6d-b4fe-9731bf08b3d2/*" CT_DICOM_series

# segmentation of this series downloaded from IDC, from nnU-Net-BPR-Annotations (https://doi.org/10.5281/zenodo.7539035)
!s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com cp "s3://public-datasets-idc/3a0c03fe-3d17-48dc-9b8d-f64f5302e0a4/*" SEG_DICOM_series

will become the following:

# CT series downloaded from IDC, NLST collection (https://doi.org/10.7937/TCIA.HMQ8-J677)
!idc download 1.2.840.113654.2.55.247854884634057477137769379611681725965

# segmentation of this series downloaded from IDC, from nnU-Net-BPR-Annotations (https://doi.org/10.5281/zenodo.7539035)
!idc download 1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219

In the above, 865da32b-cea7-4e6d-b4fe-9731bf08b3d2 is IDC crdc_series_uuid (not a DICOM attribute, DICOM does not support object versioning natively), and 1.2.840.113654.2.55.247854884634057477137769379611681725965 is DICOM SeriesInstanceUID.

crdc_series_uuid uniquely identifies the binary files associated with a series, meaning that if the specific DICOM series changes in the future but will keep the same SeriesInstanceUID, the files corresponding to the new version of the series will get assigned a new crdc_series_uuid.

idc download command will pull requested DICOM series defined by SeriesInstanceUID (it can also download content referenced by PatientID, StudyInstanceUID, or collection_id, and will sort the files into hierarchy on download) from the current version of IDC.

We definitely can add the functionality to download by crdc_series_uuid, just didn't have a use case for that. I created an issue in ImagingDataCommons/idc-index#117.

In any case, I strongly recommend idc-index to simplify the code and open the door for other opportunities to work with IDC for the notebook users!

@jadh4v
Copy link
Member

jadh4v commented Aug 24, 2024

@thewtex My first impression is that the coordinate system chosen for SEG is different. I suppose it depends on how the SEG was generated. Also agree with @fedorov regarding precision loss as well as skipping of slices (accounts for dim difference).
Looks like we need a baseline understanding of what to expect from a pair of DICOM + SEG dataset. One way to ensure that is to generate our own data but we'd we living in our own bubble reading+writing our own test data.

Would there be a way to interpret the SEG from any third party software (even if closed source), get it converted and then compare the outputs?

@pieper
Copy link

pieper commented Aug 24, 2024

@pieper do you have any comments?

Lots of good things to discuss here - a main point being that the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference. So we shouldn't, for example, consider the CT geometry when creating the SEG object, but rather create the most faithful representation of the input data.

@fedorov
Copy link
Member

fedorov commented Aug 26, 2024

the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference

I agree with Steve, this is a perfect summary! Also, since I wrote the above, I realized that original segmentations were created in NIfTI format using nnU-Net, which were then converted into DICOM SEG using dcmqi. Change in image orientation most likely happened during NIfTI conversion.

A separate question is why itkWidgets don't seem to be able to correctly render segmentation label that has a different orientation compared to the image? Is this a bug?

@thewtex thewtex changed the title IDC DICOM use itkwasm-dicom, s5cmd python package IDC DICOM use itkwasm-dicom, idc python package Aug 28, 2024
@thewtex
Copy link
Member Author

thewtex commented Aug 28, 2024

I agree with Steve, this is a perfect summary! Also, since I wrote the above, I realized that original segmentations were created in NIfTI format using nnU-Net, which were then converted into DICOM SEG using dcmqi. Change in image orientation most likely happened during NIfTI conversion.

Yes, this does seem to be the source of the problem. What would you like to do? A few ideas:

  • Correct the metadata in the DICOM SEG in IDC (ideally this would happen regardless at some point)
  • Apply a correction in this notebook
  • Use a different example input dataset that does have correct metadata

Lots of good things to discuss here - a main point being that the SEG and CT are essentially two independent entities with their own spacings, dimensions, orientations, etc that are only shown together when they share a frame of reference UID (or there's a spatial registration to map between frames of reference. So we shouldn't, for example, consider the CT geometry when creating the SEG object, but rather create the most faithful representation of the input data.

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

My recommendation has been to resample SEG to the image geometry with nearest neighbor interpolator before working with the arrays.

This is already the case. In fact, we even use an interpolator with better than nearest-neighbor interpolation for labels, available in a small, universal ITK-Wasm python package :-).

If we apply the CT metadata to the SEG, this is what we get:

image

Note that it is also incorrect -- the heart and esophagus are inverted.

This is what the current metadata is specifying:

image

The empty box is the domain of the SEG -- vtk.js has a limitation of only rendering one volume.

I recommend you instead of s5cmd install idc-index. idc-index will bring s5cmd as a dependency, but will also provide additional functions to simplify interaction with IDC.

Most excellent!! 🎇 Done. 🍾

Simplify by using itkwasm-dicom instead of pydicom-seg.

Also use the s5cmd Python package.
@pieper
Copy link

pieper commented Aug 28, 2024

image

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

I hadn't realized that, but yes, one of the liver segs here has incorrect geometry. We should get the data fixed and not use this for a demonstration or tutorial (unless we want to warn about bad metadata).

@fedorov
Copy link
Member

fedorov commented Aug 29, 2024

Yes, we are trying to use the metadata of the CT and SEG as encoded individually in each. But that input metadata appears to have issues.

@thewtex I don't see what are the issues with the data - the image is oriented differently, but I am not aware of any issues with the metadata. It works fine in Slicer, which is using dcmqi, screenshot below. You can see this series visualized in OHIF, which is using completely independent DICOM readers: https://viewer.imaging.datacommons.cancer.gov/v3/viewer/?StudyInstanceUIDs=1.2.840.113654.2.55.133032926508606330165457723341736723804&SeriesInstanceUIDs=1.2.840.113654.2.55.247854884634057477137769379611681725965,1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219.

image

This is already the case. In fact, we even use an interpolator with better than nearest-neighbor interpolation for labels, available in a small, universal ITK-Wasm python package :-).

Where is this taking place? I did not see any resampling in the notebook code, unless I missed it.

I hadn't realized that, but yes, one of the liver segs here has incorrect geometry.

@pieper which sample are you referring to? I do not see any liver segmentations in the sample used in the notebook: https://viewer.imaging.datacommons.cancer.gov/v3/viewer/?StudyInstanceUIDs=1.2.840.113654.2.55.133032926508606330165457723341736723804&SeriesInstanceUIDs=1.2.840.113654.2.55.247854884634057477137769379611681725965,1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219.

@pieper
Copy link

pieper commented Aug 29, 2024

Thanks for the link to the actual data @fedorov. The data I was looking at came from a different study, but I don't have the uid handy. So there is a dataset with a bad liver segmentation geometry but it's not the one from this issue. Sorry for the extra confusion.

@thewtex - since Andrey and I confirmed the geometry is read correctly in OHIF and Slicer for StudyInstanceUID 1.2.840.113654.2.55.133032926508606330165457723341736723804 can you show and itk widgets example for it?

The seg and the referenced series uids are:

[0008,1115]	ReferencedSeriesSequence		SQ	12910
	[fffe,e000]	Item		na	12894
		[0020,000e]	SeriesInstanceUID	1.2.840.113654.2.55.247854884634057477137769379611681725965	UI	60
[0020,000e]	SeriesInstanceUID	1.2.276.0.7230010.3.1.3.481034752.249602.1661572065.734219	UI	58

@thewtex
Copy link
Member Author

thewtex commented Aug 29, 2024

@fedorov @pieper interesting!

Here is the metadata I get for the CT when loaded in 3D Slicer:

image

I was not able to load the QuantiativeReporting extension for DCMQI in Slicer -- I get an extension not found with Slicer 5.6.2 and a backtrace when it tries to start with 5.7.0. Do you know the Slicer-loaded metadata for the SEG?

This seems to be different from what we get in this notebook,

    Size: [512, 512, 110]
  Spacing: [0.660156, 0.660156, 2.5]
  Origin: [-177.3, -169, -268.69]
  Direction: 
1 0 0
0 1 0
0 0 1

I suspect there is a difference with how the CT is loaded and interpreted.

StudyInstanceUID 1.2.840.113654.2.55.133032926508606330165457723341736723804 can you show and itk widgets example for it?

Yes, that is what is in this PR.

Where is this taking place? I did not see any resampling in the notebook code, unless I missed it.

This is part of the rendering process.

@thewtex
Copy link
Member Author

thewtex commented Aug 29, 2024

> This seems to be different from what we get in this notebook

Well, we also have LPS / RAS to account for.

@thewtex
Copy link
Member Author

thewtex commented Aug 29, 2024

Do you know the Slicer-loaded metadata for the SEG?

This may be enlightening.

@fedorov
Copy link
Member

fedorov commented Aug 29, 2024

I was not able to load the QuantiativeReporting extension for DCMQI in Slicer -- I get an extension not found with Slicer 5.6.2 and a backtrace when it tries to start with 5.7.0.

Can you tell me what OS you are on? I just uninstalled/installed, and everything works for me on Mac with 5.6.2.

I checked cdash, and I do see an error for stable mac build for today, which seems to be an intermittent failure to clone the repo: https://slicer.cdash.org/builds/3504732/configure.

I also tested on 5.7.0, and did not see any problems.

Do you know the Slicer-loaded metadata for the SEG?

This part is tricky with my knowledge of Slicer.... The issue is that segmentations are not loaded into Slicer scalar volumes, but into "Slicer Segmentations". The only way I know of to see volume-level metadata (spacing, orientation, etc) is to export "Segmentation" into a volume, but that operation takes an image volume as a reference, so the result will not tell us what you want to know... Maybe @pieper or @lassoan can help here? How can we know the metadata for loaded segmentation similar to what Volumes module shows as in #757 (comment)?

@pieper
Copy link

pieper commented Aug 29, 2024

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

image

@fedorov
Copy link
Member

fedorov commented Aug 29, 2024

My suspicion is that resampling is not done correctly in itkwidgets.

@fedorov
Copy link
Member

fedorov commented Aug 29, 2024

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

Doesn't that export process take reference volume as reference, and resamples to that reference? It would always match the geometry of the reference then, but it does not reflect geometry "as loaded".

@pieper
Copy link

pieper commented Aug 29, 2024

A couple things

  • the extension server seems to be a bit flakey today. I tried 5.6.2 earlier on linux and got 0 extensions, then restarted Slicer and I was able to install as usual.

  • I'm pretty sure that the binary labelmap representation of the segmention will remain the 'primary' since that's what the SEG would be input as and won't be altered by export unless you specifically ask for it. If you loaded, e.g. an STL, it wouldn't be rasterized until you asked for it, typically in the Segment Editor in the context of a volume to use as the reference grid.

  • In this case the ImageOrientationPatient vectors are different in the SEG vs the CT so probably something in the reading process is converting the SEG to Axial IS.
    image

image

@fedorov
Copy link
Member

fedorov commented Aug 29, 2024

In this case the ImageOrientationPatient vectors are different in the SEG vs the CT so probably something in the reading process is converting the SEG to Axial IS.

No, I don't think it is in the reading process. As I tried to explain earlier, I am pretty certain this is happening in the process of generating the original representation of the segmentation by nnU-Net that is then converted into SEG.

This is the approximate provenance of the SEG we are looking at:

[DICOM CT] --> nnU-Net --> [NIfTI] --> dcmqi --> [DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

There is no way around it - we should just accept that orientations may differ, and make sure resampling is implemented correctly for the systems that don't render in patient space.

@pieper
Copy link

pieper commented Aug 29, 2024

Actually, the pipeline is:

[DICOM CT] --> [NIfTI] --> nnU-Net --> [NIfTI] --> dcmqi --> [DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

And it's possible that the first conversion from dicom to nii introduced a reorganization of the data.

By the "something in the reading process is converting the SEG to Axial IS." what I mean is that the CT and the SEG have different ImageOrientationPatient vectors in the instances, but they end up having the same IJK to RAS directions in Slicer, so they are being reorganized as part of that process. I believe it's because the data gets reshuffled in order to make the IJKToRAS transform right handed.

@thewtex
Copy link
Member Author

thewtex commented Aug 29, 2024

Here's what I get if I export the segmentation as a labelmap (appears to match the CT exactly).

Very helpful, thanks, Steve!

When we read the CT in this notebook, it looks consistent with Slicer. But the SEG is not consistent. As IJK->RAS is consistent between the CT and SEG, we should see a consistent IJK->LPS.

My suspicion is that resampling is not done correctly in itkwidgets.

The bounding box of the volumes, which is independent of any resampling, is showing that there are issues independent of resampling.

Can you tell me what OS you are on?

Linux 🐧

[DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

I wonder if we can capture the NRRD here? We could compare the header to a NRRD generated by itkwasm_image_io from the read SEG.

@fedorov
Copy link
Member

fedorov commented Aug 30, 2024

[DICOM SEG] --> dcmqi --> [NRRD] --> Slicer

I wonder if we can capture the NRRD here? We could compare the header to a NRRD generated by itkwasm_image_io from the read SEG.

You can do that step running segimage2itkimage --inputDICOM [DICOM SEG] --outputDirectory . --mergeSegments (this is exactly what is happening behind the scenes in QuantiativeReporting: https://github.com/QIICR/QuantitativeReporting/blob/master/DICOMPlugins/DICOMSegmentationPlugin.py#L104-L119). This is what I see for the SEG in question in the NRRD header:

sizes: 512 512 109
space directions: (0.66015599999999997,0,0) (0,-0.66015599999999997,0) (0,0,-2.5)
kinds: domain domain domain
endian: little
encoding: gzip
space origin: (-177.300003,168.33972199999999,3.8099975599999998)

Compared to what I see in Slicer when I load the resulting NRRD as a volume.

image

@fedorov
Copy link
Member

fedorov commented Aug 30, 2024

'm pretty sure that the binary labelmap representation of the segmention will remain the 'primary' since that's what the SEG would be input as and won't be altered by export unless you specifically ask for it.

@pieper from the above, comparing volume metadata between what you exported from Segmentations and what I loaded directly from SEG NRRD, clearly there is a resampling to reference that is happening (509 slices vs 510). Just wanted to bring your attention to this.

@pieper
Copy link

pieper commented Aug 30, 2024

is a resampling to reference that is happening

That is interesting - yes, if I load the SEG alone and then convert to a binary labelmap it has 109 slices Axial SI so there is a resampling going on when there's a reference volume.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants