
Questions on reproducing the results of the ablation study #2

Closed
Anthony-boop opened this issue Sep 7, 2019 · 13 comments
@Anthony-boop

Hi, I am very interested in your work, but I have some problems reproducing the results in the paper.

[screenshot: ablation-study results]

I ran the "Basic" method with the following script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.0 --epoch-size 1000 --sequence-length 3 \
  --with-mask False \
  --with-ssim False \
  --name posenet_256

and the "Basic+SSIM" method with the following script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.0 --epoch-size 1000 --sequence-length 3 \
  --with-mask False \
  --with-ssim True \
  --name posenet_256

But the results are terrible. Could you tell me how you ran the basic method? Thanks.

@Anthony-boop
Author

I got a similar result when I ran "Basic+SSIM+GC+M" with your given script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.5 --epoch-size 1000 --sequence-length 3 \
  --with-mask True \
  --with-ssim True \
  --name posenet_256

@JiawangBian
Owner

I guess it is because of the "with-gt" option. It controls whether gt-depth or the photometric error is used for validation, i.e., for selecting the best model for testing.

First, you will see that Basic and Basic+SSIM overfit after about 50 epochs (see the validation error graph in the appendix). If you use the photometric loss to choose the model, it will save models at around 200 epochs, because that error keeps decreasing during training. In this case you will get terrible results. If you use gt-depth for validation, you will save models at around 50 epochs, and they are as good as reported in the table.

Second, our proposed GC effectively avoids the overfitting issue (see the validation error), so using either gt-depth (my table) or the photometric error (your results) leads to good results.
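In other words, the model-selection logic amounts to something like the sketch below (simplified, with illustrative names only, not the exact code in this repository): validate every epoch, using abs_rel against gt-depth when it is available and the photometric error otherwise, and keep the checkpoint with the lowest validation error.

import torch

best_error = float('inf')

def save_if_best(model, val_error, epoch, path='dispnet_model_best.pth.tar'):
    # Keep only the checkpoint with the lowest validation error seen so far.
    global best_error
    if val_error < best_error:
        best_error = val_error
        torch.save({'epoch': epoch, 'state_dict': model.state_dict()}, path)

# per epoch, where val_error is either the gt-depth abs_rel (with-gt)
# or the photometric validation error:
#   save_if_best(model, val_error, epoch)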

@Anthony-boop
Author

Thanks. You mean adding the "with-gt" option so that the best model is selected by gt-depth rather than by the photometric error. I will try it soon.

@JiawangBian
Owner

Yes, but this is only available on the KITTI raw dataset, where the gt-depth is saved. When you run on the Cityscapes or kitti_odometry dataset, you still need to use the photometric error.

@MinZhangm

Hello. Actually, in your code a model state dict is saved at every epoch. I tested not only the selected best model (with gt) but also the others, and the results are still terrible. Is there any other possible reason that could cause this problem?

@JiawangBian
Owner

First, only the final model (at 200 epochs) and the best model are saved; the models from other epochs are overwritten by the final one. Without GC, the final model is an overfitted model.

Second, you can see the validation error (abs_rel) with gt-depth in TensorBoard. It shows the error at each epoch, which is very close to the test error because the metric is the same. Can you attach that figure and the test error of the best model?
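The saving behaviour is essentially the standard pattern sketched below (simplified, not the exact code in this repository): one rolling checkpoint is overwritten every epoch, and it is also copied to a *_best file whenever the validation error improves.

import shutil
import torch

def save_checkpoint(state, is_best, prefix='dispnet'):
    # Rolling checkpoint, overwritten every epoch.
    filename = prefix + '_checkpoint.pth.tar'
    torch.save(state, filename)
    if is_best:
        # Keep a separate copy of the best model so far.
        shutil.copyfile(filename, prefix + '_model_best.pth.tar')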

@MinZhangm

MinZhangm commented Sep 8, 2019

Sorry, I forgot to mention that, in order to find the cause of the terrible results, I modified the code to name each model state dict after its epoch number, so I have the state dict of every epoch. Besides, I found that the cause is not args.with_gt. When the smoothness loss weight (args.s) is set to 0.1 from the beginning of training, the network does not converge, which causes the terrible results.

With args.s = 0.1, TensorBoard shows the following: a1, a2, a3, abs_rel, sq_rel, and so on stay unchanged, and all the curves have the same shape.

[TensorBoard screenshot: validation metrics flat over training, a1 shown as an example]

Also, the smoothness loss shown in TensorBoard is 0.

With args.s = 0, the network converges; the figure below shows the first 45 epochs (a1 as an example).

[TensorBoard screenshot: a1 over the first 45 epochs with args.s = 0]

So, at the beginning of training, did you also set args.s to 0?
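To be clear about what args.s changes: it is only the weight on the smoothness term in the total loss, roughly as in the sketch below (my paraphrase, not the exact code in this repository), so setting it to 0 simply switches that term off.

def total_loss(photo_loss, smooth_loss, geometry_loss, s=0.1, c=0.5):
    # Weighted sum of the three loss terms; s is args.s, c is the
    # geometry-consistency weight. With s = 0 the smoothness term vanishes.
    return photo_loss + s * smooth_loss + c * geometry_loss

# e.g. the "Basic" runs above use s=0.1 and c=0.0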

@JiawangBian
Owner


Thanks for your plots. This is a bug, and the discussion of the "with-gt" option is a separate problem. I have also met this bug before, but I cannot give a good solution because it occurs randomly. It may be caused by the device environment, which is why others using the default parameters can run it well. I suggest running it on other devices, re-installing some libraries, or just slightly changing the parameters, which also helps.

@MinZhangm

Thanks a lot for your reply.

@liujiaheng

@Olivemm I also meet this problem if I do not use the "GC" loss. My setting is CUDA 10, PyTorch 1.0.1, Python 3.6. I am looking forward to your reply if you have a solution to this problem.

@SeokjuLee

@Olivemm @liujiaheng @JiawangBian Same issue here. My setting is CUDA 8.0, PyTorch 1.0.0, and Python 3.7.4.

@liujiaheng

@JiawangBian

[screenshot: depth maps, DispResNet (top) and DispNet (bottom)]

I visualized the results of DispResNet and DispNet. I can reproduce the depth results of the VGG depth network (DispNet), but the results are still terrible when I change the depth network to DispResNet, as in the depth maps shown above (the first is the result of DispResNet, the second is the result of DispNet). Can you analyze the reason for this problem? Thanks, looking forward to your reply.

@834810269

@Olivemm @liujiaheng @JiawangBian @SeokjuLee I also met this problem: the smoothness loss equals 0 when I use multi-scale training. But when I train with a single scale, the problem hardly ever appears.
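For context, what I am monitoring is the edge-aware disparity smoothness term computed at each scale; a small debugging sketch of my own (not this repository's code) that prints the value per scale, to see where it collapses to zero:

import torch

def edge_aware_smoothness(disp, img):
    # First-order disparity gradients, down-weighted at image edges.
    grad_disp_x = (disp[:, :, :, :-1] - disp[:, :, :, 1:]).abs()
    grad_disp_y = (disp[:, :, :-1, :] - disp[:, :, 1:, :]).abs()
    grad_img_x = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    grad_img_y = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (grad_disp_x * torch.exp(-grad_img_x)).mean() + \
           (grad_disp_y * torch.exp(-grad_img_y)).mean()

# disps: one predicted disparity per scale; imgs: the target image resized to each scale
# for s, (disp, img) in enumerate(zip(disps, imgs)):
#     print('scale', s, 'smoothness:', edge_aware_smoothness(disp, img).item())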
