
Add support for resume training from network pkl in run_training #6

Open · wants to merge 1 commit into master

Conversation

@tripzero commented Feb 6, 2020

No description provided.

@woctezuma commented Feb 14, 2020

If this is what I think it is, I wish the pull request were accepted.

However, I think the reason it might not be accepted is that the training schedule and the reporting are affected by two other variables, which the user should provide when training is resumed:

resume_pkl  = None,     # Network pickle to resume training from, None = train from scratch.
resume_kimg = 0.0,      # Assumed training progress at the beginning. Affects reporting and training schedule.
resume_time = 0.0,      # Assumed wallclock time at the beginning. Affects reporting.

Reference: here.
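
For illustration, a minimal sketch of how an entry point could collect those three settings and hand them to the training loop; only the keyword names resume_pkl, resume_kimg, and resume_time come from training_loop.py, while the flag names and helper below are hypothetical:

# Hypothetical sketch: collecting the resume settings at the entry point.
# Only the keys resume_pkl / resume_kimg / resume_time come from training_loop.py;
# the flag names and the helper are illustrative.
import argparse

def parse_resume_args():
    parser = argparse.ArgumentParser(description='Resume-training options (sketch)')
    parser.add_argument('--resume-pkl', default=None,
                        help='Network pickle to resume training from (None = train from scratch)')
    parser.add_argument('--resume-kimg', type=float, default=0.0,
                        help='Assumed training progress in kimg; affects reporting and schedule')
    parser.add_argument('--resume-time', type=float, default=0.0,
                        help='Assumed wallclock time at the beginning; affects reporting')
    args = parser.parse_args()
    # These keys mirror the training_loop() keyword arguments quoted above.
    return dict(resume_pkl=args.resume_pkl,
                resume_kimg=args.resume_kimg,
                resume_time=args.resume_time)

if __name__ == '__main__':
    print(parse_resume_args())

Passing resume_kimg along with the pickle is what keeps the reporting and the training schedule consistent with the earlier run, per the comments quoted above.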

@vsemecky left a comment

Great. This is exactly what I missed.

@ayush9198gupta commented Apr 20, 2020

Hi,
I am running the StyleGAN2 model in Google Colab (Pro version) and have loaded the pretrained network file (stylegan2-ffhq-config-f.pkl), but when I execute the code it exhausts the Colab RAM and the session restarts. Please help me find a solution to this issue.

Sharing the Colab settings I am using now:
Hardware accelerator: GPU
Runtime shape: HIGH-RAM

The second issue: I have loaded the latest pickle file from a StyleGAN model into the StyleGAN2 model for transfer learning, but after execution it is not saving the results. I have made the following changes in the training_loop.py file.

# Load the latest pickle file generated from the StyleGAN model:
resume_pkl = './results/00003-stylegan2-anime_images-1gpu-config-b/network-snapshot-012672.pkl',

# To save the output after every tick, I changed:
image_snapshot_ticks = 1,
network_snapshot_ticks = 1,

Kindly get back to me on these two issues.
Thanks
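
For reference, a minimal sketch of passing those values as overrides rather than editing the defaults inside training_loop.py; only the keyword names come from this thread, and the helper function is hypothetical:

# Hypothetical sketch: build keyword overrides for training_loop(**overrides)
# instead of editing the defaults inside training_loop.py. Only the keyword
# names (resume_pkl, image_snapshot_ticks, network_snapshot_ticks) come from
# this thread; everything else is illustrative.

def training_overrides(resume_pkl, image_ticks=1, network_ticks=1):
    return dict(
        resume_pkl=resume_pkl,                 # pickle to resume from
        image_snapshot_ticks=image_ticks,      # export example images every tick
        network_snapshot_ticks=network_ticks,  # export a network pickle every tick
    )

if __name__ == '__main__':
    overrides = training_overrides(
        './results/00003-stylegan2-anime_images-1gpu-config-b/network-snapshot-012672.pkl')
    for key, value in overrides.items():
        print(key, '=', value)

Keeping the overrides at the call site also makes it easier to compare runs, since the defaults in training_loop.py stay untouched.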

@ahmedshingaly commented

CUDA_ERROR_OUT_OF_MEMORY
I have one GPU (a 2080 Ti), yet I get the above error.
How can I train using one GPU, and where do I reduce the batch size?

@obravo7
Copy link

obravo7 commented Jun 5, 2020

@ahmedshingaly
You might be out of luck. Are you trying to train config-f at full resolution (1024×1024)? If so, the CUDA_ERROR_OUT_OF_MEMORY is to be expected. The authors state that

Note that training FFHQ at 1024×1024 resolution requires GPU(s) with at least 16 GB of memory.

and a 2080 Ti has only 11 GB of memory.
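
For the single-GPU memory question above, a minimal sketch of schedule overrides that shrink the per-GPU minibatch; the field names follow the upstream StyleGAN2 schedule arguments (minibatch_size_base, minibatch_gpu_base), and whether this fork exposes them the same way is an assumption:

# Hypothetical sketch: schedule overrides to reduce memory use on a single GPU.
# Field names follow the upstream StyleGAN2 schedule arguments; the values
# below are illustrative and may trade training quality for fitting in memory.

def low_memory_sched_overrides(num_gpus=1, minibatch_gpu=2):
    # minibatch_gpu = images processed per GPU per step (smaller = less memory)
    return dict(
        minibatch_size_base=minibatch_gpu * num_gpus,  # total minibatch across all GPUs
        minibatch_gpu_base=minibatch_gpu,
    )

if __name__ == '__main__':
    print(low_memory_sched_overrides())

Training at a lower resolution (for example 512×512) is the other common way to fit an 11 GB card.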

@ahmedshingaly commented

(in reply to @obravo7's comment above)

You are right. I am using a custom dataset, and the error still persists. I will try running it on Google Colab and see if that gives a different result.
