-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add str method to datamodule #20301
base: master
Are you sure you want to change the base?
Add str method to datamodule #20301
Conversation
Will add documentation once code is fixed |
I have been waiting for quite some time. How do I go on? Do I just wait some more until someone looks at this? |
hey @MrWhatZitToYaa thanks for taking this up and sorry for the wait can you add a bit more detail on what do we think it's good printing from a datamodule? the original design in #9947 was a bit different, as was #9967, but happy to consider alternatives after there's some discussion on the rationale |
The idea originally came to me when I had to to use the Datamodule to implement some datasets we have in our lab. It was really annoying to deal with the fact that you never really know what you are dealing with and makes debugging really hard. I talked to some colleagues and they agreed that they ran into the same problem. That's why I would like to add a str functionality for the Datamodule: primarily for debugging / logging. |
First implementation scetch
Added alternative Boring Data Module implementations Added test cases for all possible options Added additional check for NotImplementedError in string function of DataModule
Made changes to comply with requested suggestions Switched from hardcoded \n to more general os.linesep
0c57055
to
122cf6d
Compare
Corrected the annotation for the internal function and the list that is suppsoed to store the information on the datasets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, I aligned it with the latest master and added a comment for a fix before we can merge. It should be quick.
attr = getattr(self, attr_name) | ||
|
||
# Get Dataset information | ||
if isinstance(attr, Dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that there are no guarantees that any datasets actually exist as attributes.
I think a better design would be to leverage the contract, and get dataloaders through self.train_dataloader
(caveat: it may be a dataloader or an iterable of dataloaders), self.val_dataloader
, etc, access their dataset
property (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) and produce a string from there in a similar way to the current way.
Also instead of ignoring non-Dataset subclasses in this case, we could call str from whatever is returned as dataset.
I think it's a valuable addition, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea. I also had a similar idea in the beginning but discarded it because I thought it would add too much overhead. But I did some research an DataLoader should not add too much overhead to the actual dataset so we can use it in the str method.
Your suggestion would make the method more save, while still maintaining the same functionality, since one has to call prepare and setup anyway before one is able to access the dataset information.
I will try to implement your suggestion
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #20301 +/- ##
=========================================
- Coverage 88% 79% -9%
=========================================
Files 267 264 -3
Lines 23266 23245 -21
=========================================
- Hits 20375 18297 -2078
- Misses 2891 4948 +2057
|
Fixed type annotation issue Reduced code size by using Sized object from abc library
for more information, see https://pre-commit.ci
Switched from Dataset based implementation to Dataloader based implementation
What does this PR do?
Added a str function to the DataModule in order to be able to print Datasets included in the module instead of an adress.
Fixes #9947
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20301.org.readthedocs.build/en/20301/