Project Version
Latest
Platform and OS Version
Kaggle
Affected Devices
Kaggle Latest Environment
Existing Issues
No response
What happened?
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/kaggle/working/program_ml/rvc/train/train.py", line 509, in run
    train_and_evaluate(
  File "/kaggle/working/program_ml/rvc/train/train.py", line 707, in train_and_evaluate
    scaler.scale(loss_disc).backward()
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/kaggle/tmp/.venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [172.19.2.2]:48294
/opt/conda/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 42 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Steps to reproduce
There are no exact steps to trigger it; the error occurs intermittently during training, typically somewhere between 100 and 500 epochs.
Expected behavior
Training should continue without this error.
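The Gloo error ("Connection closed by peer") is raised on a surviving rank when a peer worker process dies mid-run (on Kaggle, often because it was OOM-killed), so the collective call inside `backward()` fails. Until the root cause is found, a common mitigation is to checkpoint periodically and resume from the last checkpoint after a crash. Below is a minimal, framework-agnostic sketch of that resume loop; the `train_one_epoch` callback and the JSON checkpoint file are hypothetical illustrations, not the RVC trainer's actual API (real code would also persist model and optimizer state, e.g. with `torch.save`):

```python
import json
import os

CKPT = "last_checkpoint.json"  # hypothetical checkpoint path


def save_checkpoint(epoch, path=CKPT):
    # Persist the epoch counter; a real trainer would also save
    # model weights, optimizer state, and the GradScaler state.
    with open(path, "w") as f:
        json.dump({"epoch": epoch}, f)


def load_checkpoint(path=CKPT):
    # Return the epoch to resume from, or 0 for a fresh run.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["epoch"]
    return 0


def train(total_epochs, train_one_epoch, ckpt_every=10):
    # Resume from the last saved epoch so a crashed run loses at
    # most `ckpt_every` epochs of work when restarted.
    start = load_checkpoint()
    for epoch in range(start, total_epochs):
        train_one_epoch(epoch)
        if (epoch + 1) % ckpt_every == 0:
            save_checkpoint(epoch + 1)
```

Restarting the script after a crash then picks up from the last multiple of `ckpt_every` instead of epoch 0, which matters for runs that die hundreds of epochs in.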
Attachments
No response
Screenshots or Videos
No response
Additional Information
No response