Training terminates after the first epoch due to excessive RAM usage #13
Comments
Hi, we've seen this problem before, and usually it was caused by setting the number of workers too high.
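To make that concrete, here is a minimal PyTorch sketch of lowering the worker count on a `DataLoader` (the dataset below is a dummy stand-in, not anything from this repository):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset just to make the snippet self-contained.
train_dataset = TensorDataset(
    torch.zeros(8, 3, 64, 64),                    # fake images
    torch.zeros(8, 64, 64, dtype=torch.long),     # fake label maps
)

# Fewer workers means fewer copies of the loading pipeline in memory;
# num_workers=0 loads batches in the main process only.
train_loader = DataLoader(
    train_dataset,
    batch_size=4,
    shuffle=True,
    num_workers=0,     # try 0 or a small value when RAM is the bottleneck
    pin_memory=False,  # pinned host memory also increases RAM usage
)
```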
Hi, changing the number of workers didn't seem to help. To analyse this further I commented out the colorizer part in `make_log_image`:

    def make_log_image(self, pred, target):
        # colorize and put in format
        pred = pred.cpu().numpy().argmax(0)
        # ^MemoryError here^
        target = target.cpu().numpy()
        output = np.concatenate((pred, target), axis=1)
        return output

Setting …
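As a general PyTorch note (a possible mitigation, not something verified against this repository): when `pred` is still a GPU tensor, taking the `argmax` before `.cpu().numpy()` means only the (H, W) class-index map is moved to host memory instead of the full (C, H, W) score volume. A minimal sketch:

```python
import numpy as np
import torch

def make_log_image(pred, target):
    # Reduce to a class-index map first, then convert; this avoids
    # materialising the full per-class score volume on the CPU side.
    pred = pred.argmax(0).cpu().numpy()
    target = target.cpu().numpy()
    return np.concatenate((pred, target), axis=1)
```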
Also, a question about creating a custom dataset (I didn't want to open a separate issue, as it's just a question). I have a set of images, each consisting of a camera image and the semantic segmentation ground truth for that image, but no json annotation file, so I was wondering (sorry if it's a stupid question): what is the purpose of remapping the labels with …? Would you suggest generating a json file before using …?
Hi, the whole reason for doing it in monochrome is that I need a way of parsing the labels that is more or less standard across all datasets, since the idea is to be as general as possible. Therefore, you don't need the csv, or to change to monochrome, as long as your parser knows how to get your pixel-wise labels and turn them into tensors for training, which are expected to contain, per pixel, a value between 0 and Nclasses - 1.
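To illustrate what such a parser is expected to produce, here is a minimal sketch; the file format, the `ID_TO_TRAIN` mapping, and its values are made-up examples, not taken from this repository:

```python
import numpy as np
import torch
from PIL import Image

# Hypothetical mapping from the raw IDs stored in the ground-truth image
# to contiguous training class indices 0..Nclasses-1 (example values only).
ID_TO_TRAIN = {0: 0, 7: 1, 24: 2, 26: 3}

def parse_label(path):
    raw = np.array(Image.open(path))      # H x W array of raw label IDs
    train_ids = np.zeros_like(raw)
    for raw_id, train_id in ID_TO_TRAIN.items():
        train_ids[raw == raw_id] = train_id
    # Long tensor of shape (H, W) with per-pixel values in [0, Nclasses - 1]
    return torch.from_numpy(train_ids).long()
```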
I am trying to train a semantic segmentation model from scratch using the COCO dataset, and every time I run the training script it is `Killed` at the validation step after epoch 0. At first I got `RuntimeError: Dataloader worker (pid xxxx) is killed by signal: Killed`. After looking online, I tried setting the number of workers to 0, which caused a similar error at the same stage, but the message just says `Killed`. Looking at the memory usage, just before the process was killed the RAM usage went all the way up to 97%. I have 64 GB of RAM, which is enough to fit the entire training set if needed, so I don't really understand where the issue originates.

I have attached two screenshots showing the errors. The first one suggests that it failed when trying to colourise the images with `colorizer.py`.

Could you suggest a workaround? I am hoping to train a model on COCO data to understand how it works, and then train it on my own data, which I will format to be COCO-like.
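For reference, a heavily abbreviated sketch of the COCO-style instance annotation layout, written as a Python dict that could be dumped to a json file; all IDs, file names, and coordinates below are made up, only the key structure follows the COCO format:

```python
import json

coco_like = {
    "images": [
        {"id": 1, "file_name": "image_0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # polygon given as a flat list of x, y coordinates
            "segmentation": [[100.0, 100.0, 200.0, 100.0, 200.0, 200.0, 100.0, 200.0]],
            "bbox": [100.0, 100.0, 100.0, 100.0],  # [x, y, width, height]
            "area": 10000.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco_like, f)
```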