
Questions on reproducing the results of the ablation study #2

Closed
Anthony-boop opened this issue Sep 7, 2019 · 13 comments
@Anthony-boop

Hi, I am very interested in your work, but I have some problems reproducing the results in the paper.

[screenshot: ablation-study results]

I ran the "Basic" method with the following script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.0 --epoch-size 1000 --sequence-length 3 \
  --with-mask False \
  --with-ssim False \
  --name posenet_256

and the "Basic+SSIM" method with the following script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.0 --epoch-size 1000 --sequence-length 3 \
  --with-mask False \
  --with-ssim True \
  --name posenet_256

But the results are terrible. Could you tell me how you ran the basic method? Thanks.

@Anthony-boop
Author

I got a similar result when I ran "Basic+SSIM+GC+M" with your given script:

python train.py $TRAIN_SET \
  --dispnet DispNet \
  --num-scales 1 \
  -b4 -s0.1 -c0.5 --epoch-size 1000 --sequence-length 3 \
  --with-mask True \
  --with-ssim True \
  --name posenet_256

@JiawangBian
Owner

I guess it is because of the "with-gt" option. It controls whether gt-depth or the photometric error is used for validation, i.e., for selecting the best model for testing.

First, you will see that Basic and Basic+SSIM overfit after about 50 epochs (see the validation error graph in the appendix). If you use the photometric loss to choose the model, it will save models at around 200 epochs, because that error keeps decreasing during training. In this case you will get terrible results. If you use gt-depth for validation, you will save models at around 50 epochs, and they are as good as reported in the table.

Second, our proposed GC effectively avoids the overfitting issue (see the validation error), so using either gt-depth (my table) or the photometric error (your results) leads to good results.
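In other words, the model-selection logic amounts to something like the sketch below (simplified, with illustrative names only, not the exact code in this repository): validate every epoch, using abs_rel against gt-depth when it is available and the photometric error otherwise, and keep the checkpoint with the lowest validation error.

import torch

best_error = float('inf')

def save_if_best(model, val_error, epoch, path='dispnet_model_best.pth.tar'):
    # Keep only the checkpoint with the lowest validation error seen so far.
    global best_error
    if val_error < best_error:
        best_error = val_error
        torch.save({'epoch': epoch, 'state_dict': model.state_dict()}, path)

# per epoch, where val_error is either the gt-depth abs_rel (with-gt)
# or the photometric validation error:
#   save_if_best(model, val_error, epoch)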

@Anthony-boop
Author

Thanks. You mean adding the "with-gt" option so that the best model is selected by gt-depth rather than by the photometric error. I will try it soon.

@JiawangBian
Owner

Yes, but this is only available on the KITTI raw dataset, where the gt-depth is saved. When you run on the Cityscapes or kitti_odometry dataset, you still need to use the photometric error.

@MinZhangm

Hello. Actually, in your code a model state dict is saved at every epoch. I tested not only the selected best model (with gt) but also the others, and the results are still terrible. Is there any other possible reason that could cause this problem?

@JiawangBian
Owner

First, only the final model (at 200 epochs) and the best model are saved; the models from other epochs are overwritten by the final one. Without GC, the final model is an overfitted model.

Second, you can see the validation error (abs_rel) with gt-depth in TensorBoard. It shows the error at each epoch, which is very close to the test error because the metric is the same. Can you attach that figure and the test error of the best model?
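The saving behaviour is essentially the standard pattern sketched below (simplified, not the exact code in this repository): one rolling checkpoint is overwritten every epoch, and it is also copied to a *_best file whenever the validation error improves.

import shutil
import torch

def save_checkpoint(state, is_best, prefix='dispnet'):
    # Rolling checkpoint, overwritten every epoch.
    filename = prefix + '_checkpoint.pth.tar'
    torch.save(state, filename)
    if is_best:
        # Keep a separate copy of the best model so far.
        shutil.copyfile(filename, prefix + '_model_best.pth.tar')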

@MinZhangm

MinZhangm commented Sep 8, 2019

Sorry, I forgot to mention that, in order to find the cause of the terrible results, I modified the code to name each model state dict after its epoch number, so I have the state dict of every epoch. Besides, I found that the cause is not args.with_gt. When the smoothness loss weight (args.s) is set to 0.1 from the beginning of training, the network does not converge, which causes the terrible results.

With args.s = 0.1, TensorBoard shows the following: a1, a2, a3, abs_rel, sq_rel, and so on stay unchanged, and all the curves have the same shape.

[TensorBoard screenshot: validation metrics flat over training, a1 shown as an example]

Also, the smoothness loss shown in TensorBoard is 0.

With args.s = 0, the network converges; the figure below shows the first 45 epochs (a1 as an example).

[TensorBoard screenshot: a1 over the first 45 epochs with args.s = 0]

So, at the beginning of training, did you also set args.s to 0?
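To be clear about what args.s changes: it is only the weight on the smoothness term in the total loss, roughly as in the sketch below (my paraphrase, not the exact code in this repository), so setting it to 0 simply switches that term off.

def total_loss(photo_loss, smooth_loss, geometry_loss, s=0.1, c=0.5):
    # Weighted sum of the three loss terms; s is args.s, c is the
    # geometry-consistency weight. With s = 0 the smoothness term vanishes.
    return photo_loss + s * smooth_loss + c * geometry_loss

# e.g. the "Basic" runs above use s=0.1 and c=0.0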

@JiawangBian
Owner


Thanks for your plots. This is a bug, and the discussion of the "with-gt" option is a separate problem. I have also met this bug before, but I cannot give a good solution because it occurs randomly. It may be caused by the device environment, which is why others using the default parameters can run it well. I suggest running it on other devices, re-installing some libraries, or just slightly changing the parameters, which also helps.

@MinZhangm

Thanks a lot for your reply.

@liujiaheng

@Olivemm I also meet this problem if I do not use the "GC" loss. My setting is CUDA 10, PyTorch 1.0.1, Python 3.6. I am looking forward to your reply if you have a solution to this problem.

@SeokjuLee

@Olivemm @liujiaheng @JiawangBian Same issue here. My setting is CUDA 8.0, PyTorch 1.0.0, and Python 3.7.4.

@liujiaheng

@JiawangBian

[screenshot: depth maps, DispResNet (top) and DispNet (bottom)]

I visualized the results of DispResNet and DispNet. I can reproduce the depth results of the VGG depth network (DispNet), but the results are still terrible when I change the depth network to DispResNet, as in the depth maps shown above (the first is the result of DispResNet, the second is the result of DispNet). Can you analyze the reason for this problem? Thanks, looking forward to your reply.

@834810269

@Olivemm @liujiaheng @JiawangBian @SeokjuLee I also met this problem: the smoothness loss equals 0 when I use multi-scale training. But when I train with a single scale, the problem hardly ever appears.
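For context, what I am monitoring is the edge-aware disparity smoothness term computed at each scale; a small debugging sketch of my own (not this repository's code) that prints the value per scale, to see where it collapses to zero:

import torch

def edge_aware_smoothness(disp, img):
    # First-order disparity gradients, down-weighted at image edges.
    grad_disp_x = (disp[:, :, :, :-1] - disp[:, :, :, 1:]).abs()
    grad_disp_y = (disp[:, :, :-1, :] - disp[:, :, 1:, :]).abs()
    grad_img_x = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    grad_img_y = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (grad_disp_x * torch.exp(-grad_img_x)).mean() + \
           (grad_disp_y * torch.exp(-grad_img_y)).mean()

# disps: one predicted disparity per scale; imgs: the target image resized to each scale
# for s, (disp, img) in enumerate(zip(disps, imgs)):
#     print('scale', s, 'smoothness:', edge_aware_smoothness(disp, img).item())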
