PTQ calibration shows bad results. #375

Open
taestaes opened this issue Nov 27, 2024 · 32 comments
Labels: question (Further information is requested)

Comments

@taestaes

I followed https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/quantization/post_error_anaylsis.py

and calibrated with 10 iterations on the test dataloader.

But the PTQ version of my network produces really bad results, as shown below.

Why do the following layers have such low cosine similarity?

float_functional_simple_8                          cosine: 0.4407, scale: 0.0016, zero_point: 133
output                                             cosine: 0.6237, scale: 0.0004, zero_point: 154

I also found that the result image only contains multiples of 0.004 (maybe because of 8-bit quantization?).
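
For reference, my calibration step is roughly the following minimal sketch (the dataloader variable and the input key are placeholders for my actual setup; the switches are the standard torch.quantization ones):

import torch

# Run a few batches with observers enabled so the Histogram/MinMax observers
# collect activation ranges (placeholder dataloader and input key).
ptq_model.eval()
ptq_model.apply(torch.quantization.enable_observer)
ptq_model.apply(torch.quantization.disable_fake_quant)

with torch.no_grad():
    for i, batch in enumerate(test_loader):
        if i >= 10:  # 10 calibration iterations
            break
        ptq_model(batch['lq'].to(device))  # 'lq' is a placeholder input key

# Freeze the observed ranges and enable fake quantization for evaluation.
ptq_model.apply(torch.quantization.disable_observer)
ptq_model.apply(torch.quantization.enable_fake_quant)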

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Activations (cosine sorted ):
fake_quant_0                                       cosine: 1.0000, scale: 0.0043, zero_point: 5
float_functional_simple_0                          cosine: 1.0000, scale: 0.0043, zero_point: 5
patch_embed_proj                                   cosine: 0.9980, scale: 0.0133, zero_point: 125
body_0_ffn_project_in                              cosine: 0.9858, scale: 0.0177, zero_point: 126
body_0_ffn_dwconv                                  cosine: 0.9823, scale: 0.0127, zero_point: 124
body_0_ffn_act.quant                               cosine: 1.0000, scale: 0.0005, zero_point: 255
body_0_ffn_act.f_mul_alpha                         cosine: 0.9720, scale: 0.0008, zero_point: 0
body_0_ffn_act.f_add                               cosine: 0.9801, scale: 0.0051, zero_point: 0
body_0_ffn_project_out                             cosine: 0.9130, scale: 0.0134, zero_point: 128
float_functional_simple_2                          cosine: 0.9400, scale: 0.0142, zero_point: 139
body_1_ffn_project_in                              cosine: 0.9636, scale: 0.0285, zero_point: 123
body_1_ffn_dwconv                                  cosine: 0.9774, scale: 0.0499, zero_point: 188
body_1_ffn_act.quant                               cosine: 1.0000, scale: 0.0000, zero_point: 255
body_1_ffn_act.f_mul_alpha                         cosine: 0.9806, scale: 0.0001, zero_point: 0
body_1_ffn_act.f_add                               cosine: 0.8992, scale: 0.0133, zero_point: 0
body_1_ffn_project_out                             cosine: 0.9002, scale: 0.0131, zero_point: 127
float_functional_simple_4                          cosine: 0.8736, scale: 0.0064, zero_point: 135
body_2_ffn_project_in                              cosine: 0.9458, scale: 0.0125, zero_point: 130
body_2_ffn_dwconv                                  cosine: 0.9597, scale: 0.0170, zero_point: 177
body_2_ffn_act.quant                               cosine: 1.0000, scale: 0.0001, zero_point: 0
body_2_ffn_act.f_mul_alpha                         cosine: 0.9837, scale: 0.0003, zero_point: 238
body_2_ffn_act.f_add                               cosine: 0.9069, scale: 0.0067, zero_point: 49
body_2_ffn_project_out                             cosine: 0.9250, scale: 0.0070, zero_point: 122
float_functional_simple_6                          cosine: 0.9139, scale: 0.0061, zero_point: 122
body_3_ffn_project_in                              cosine: 0.9626, scale: 0.0085, zero_point: 126
body_3_ffn_dwconv                                  cosine: 0.9880, scale: 0.0102, zero_point: 147
body_3_ffn_act.quant                               cosine: 1.0000, scale: 0.0001, zero_point: 255
body_3_ffn_act.f_mul_alpha                         cosine: 0.9933, scale: 0.0001, zero_point: 0
body_3_ffn_act.f_add                               cosine: 0.9638, scale: 0.0048, zero_point: 0
body_3_ffn_project_out                             cosine: 0.9703, scale: 0.0051, zero_point: 148
float_functional_simple_8                          cosine: 0.4407, scale: 0.0016, zero_point: 133
output                                             cosine: 0.6237, scale: 0.0004, zero_point: 154
float_functional_simple_9                          cosine: 0.9894, scale: 0.0040, zero_point: 2

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Weights (cosine sorted 20):
output                                   cosine: 0.9996, scale: 0.0029, zero_point: 0
patch_embed_proj                         cosine: 0.9998, scale: 0.0056, zero_point: 0
body_3_ffn_dwconv                        cosine: 0.9999, scale: 0.0113, zero_point: 0
body_1_ffn_dwconv                        cosine: 0.9999, scale: 0.0114, zero_point: 0
body_0_ffn_dwconv                        cosine: 0.9999, scale: 0.0132, zero_point: 0
body_2_ffn_dwconv                        cosine: 0.9999, scale: 0.0110, zero_point: 0
body_0_ffn_project_in                    cosine: 0.9999, scale: 0.0118, zero_point: 0
body_1_ffn_project_in                    cosine: 0.9999, scale: 0.0104, zero_point: 0
body_2_ffn_project_in                    cosine: 0.9999, scale: 0.0094, zero_point: 0
body_3_ffn_project_out                   cosine: 0.9999, scale: 0.0052, zero_point: 0
body_0_ffn_project_out                   cosine: 0.9999, scale: 0.0051, zero_point: 0
body_1_ffn_project_out                   cosine: 0.9999, scale: 0.0058, zero_point: 0
body_3_ffn_project_in                    cosine: 0.9999, scale: 0.0085, zero_point: 0
body_2_ffn_project_out                   cosine: 0.9999, scale: 0.0048, zero_point: 0

Activations (cosine sorted 20):
body_1_ffn_act.f_add                               cosine: 0.9432, scale: 0.0133, zero_point: 0
body_0_ffn_project_in                              cosine: 0.9546, scale: 0.0177, zero_point: 126
body_0_ffn_act.f_mul_alpha                         cosine: 0.9757, scale: 0.0008, zero_point: 0
body_0_ffn_dwconv                                  cosine: 0.9781, scale: 0.0127, zero_point: 124
body_1_ffn_project_in                              cosine: 0.9788, scale: 0.0285, zero_point: 123
body_0_ffn_act.f_add                               cosine: 0.9803, scale: 0.0051, zero_point: 0
body_1_ffn_dwconv                                  cosine: 0.9900, scale: 0.0499, zero_point: 188
patch_embed_proj                                   cosine: 0.9907, scale: 0.0133, zero_point: 125
body_1_ffn_project_out                             cosine: 0.9928, scale: 0.0131, zero_point: 127
float_functional_simple_2                          cosine: 0.9939, scale: 0.0142, zero_point: 139
body_0_ffn_project_out                             cosine: 0.9941, scale: 0.0134, zero_point: 128
body_1_ffn_act.f_mul_alpha                         cosine: 0.9949, scale: 0.0001, zero_point: 0
body_2_ffn_project_out                             cosine: 0.9991, scale: 0.0070, zero_point: 122
float_functional_simple_8                          cosine: 0.9993, scale: 0.0016, zero_point: 133
float_functional_simple_4                          cosine: 0.9993, scale: 0.0064, zero_point: 135
body_2_ffn_act.f_add                               cosine: 0.9994, scale: 0.0067, zero_point: 49
body_3_ffn_project_out                             cosine: 0.9995, scale: 0.0051, zero_point: 148
body_3_ffn_act.f_add                               cosine: 0.9995, scale: 0.0048, zero_point: 0
body_2_ffn_project_in                              cosine: 0.9995, scale: 0.0125, zero_point: 130
float_functional_simple_6                          cosine: 0.9996, scale: 0.0061, zero_point: 122

QMLBNR_Video_Images_CleanedImgOut(
  (fake_quant_0): QuantStub(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
    )
  )
  (patch_embed_proj): Conv2d(
    34, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0056], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.7156392335891724, max_val=0.6727595925331116)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([125], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.7740585803985596, max_val=2.808042287826538)
    )
  )
  (body_0_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0118], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.50102961063385, max_val=1.4929399490356445)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0177], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-8.30683708190918, max_val=9.019135475158691)
    )
  )
  (body_0_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0132], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.6797765493392944, max_val=1.2519268989562988)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0127], device='cuda:0'), zero_point=tensor([124], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-8.81155776977539, max_val=8.465405464172363)
    )
  )
  (body_0_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0008], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=1.1990134716033936)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=1.9476141185914564e-11, max_val=8.480807304382324)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0005], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.1360728144645691, max_val=-0.1360728144645691)
      )
    )
  )
  (body_0_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.656369149684906, max_val=0.5680031180381775)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0134], device='cuda:0'), zero_point=tensor([128], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.129014492034912, max_val=2.9563019275665283)
    )
  )
  (body_1_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0104], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.1867985725402832, max_val=1.3303239345550537)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0285], device='cuda:0'), zero_point=tensor([123], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-10.604576110839844, max_val=11.774846076965332)
    )
  )
  (body_1_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0114], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.218837857246399, max_val=1.4485722780227661)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0499], device='cuda:0'), zero_point=tensor([188], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-26.407068252563477, max_val=9.284073829650879)
    )
  )
  (body_1_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.1041e-05], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=0.04620003327727318)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0133], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=8.125244184073455e-13, max_val=9.228955268859863)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([6.8477e-06], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.0017461760435253382, max_val=-0.0017461760435253382)
      )
    )
  )
  (body_1_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0058], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.6410632133483887, max_val=0.7345181703567505)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0131], device='cuda:0'), zero_point=tensor([127], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.9626619815826416, max_val=3.1261534690856934)
    )
  )
  (body_2_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0094], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.9506298303604126, max_val=1.1930553913116455)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0125], device='cuda:0'), zero_point=tensor([130], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.432260036468506, max_val=4.595865249633789)
    )
  )
  (body_2_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0110], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.1248000860214233, max_val=1.408692479133606)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0170], device='cuda:0'), zero_point=tensor([177], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-12.326212882995605, max_val=4.741044044494629)
    )
  )
  (body_2_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0003], device='cuda:0'), zero_point=tensor([238], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=0.005777135491371155)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0067], device='cuda:0'), zero_point=tensor([49], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.32712748646736145, max_val=4.730205535888672)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.02653917297720909, max_val=0.02653917297720909)
      )
    )
  )
  (body_2_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.6111413240432739, max_val=0.6124895811080933)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0070], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.243239164352417, max_val=2.098046064376831)
    )
  )
  (body_3_ffn_project_in): Conv2d(
    32, 64, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.9145596027374268, max_val=1.082226276397705)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0085], device='cuda:0'), zero_point=tensor([126], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.47300910949707, max_val=3.173271656036377)
    )
  )
  (body_3_ffn_dwconv): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0113], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-1.4376935958862305, max_val=1.413638710975647)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0102], device='cuda:0'), zero_point=tensor([147], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-4.438266754150391, max_val=4.107792377471924)
    )
  )
  (body_3_ffn_act): QPReLU(
    (relu1): ReLU()
    (relu2): ReLU()
    (f_mul_neg_one1): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_neg_one2): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
      )
    )
    (f_mul_alpha): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=0.0, max_val=0.12268836051225662)
      )
    )
    (f_add): FloatFunctional(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0048], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=2.958098876959525e-09, max_val=4.043459415435791)
      )
    )
    (quant): QuantStub(
      (activation_post_process): PTQFakeQuantize(
        fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0001], device='cuda:0'), zero_point=tensor([255], device='cuda:0', dtype=torch.int32)
        (activation_post_process): HistogramObserver(min_val=-0.02744886465370655, max_val=-0.02744886465370655)
      )
    )
  )
  (body_3_ffn_project_out): Conv2d(
    64, 32, kernel_size=(1, 1), stride=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0052], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.553494393825531, max_val=0.6569843292236328)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0051], device='cuda:0'), zero_point=tensor([148], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.0227982997894287, max_val=2.459625244140625)
    )
  )
  (output): Conv2d(
    32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (weight_fake_quant): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, ch_axis=-1, scale=tensor([0.0029], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): MinMaxObserver(min_val=-0.23968760669231415, max_val=0.3721259534358978)
    )
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0004], device='cuda:0'), zero_point=tensor([154], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.38361242413520813, max_val=0.10394330322742462)
    )
  )
  (fake_dequant_0): DeQuantStub()
  (float_functional_simple_0): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0043], device='cuda:0'), zero_point=tensor([5], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.060858067125082016, max_val=1.065263271331787)
    )
  )
  (float_functional_simple_1): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_2): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0142], device='cuda:0'), zero_point=tensor([139], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-5.174468040466309, max_val=2.951368808746338)
    )
  )
  (float_functional_simple_3): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_4): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0064], device='cuda:0'), zero_point=tensor([135], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.5720856189727783, max_val=2.035804271697998)
    )
  )
  (float_functional_simple_5): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_6): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0061], device='cuda:0'), zero_point=tensor([122], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-2.3314638137817383, max_val=1.8853096961975098)
    )
  )
  (float_functional_simple_7): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
    )
  )
  (float_functional_simple_8): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0016], device='cuda:0'), zero_point=tensor([133], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.6245776414871216, max_val=0.5727238655090332)
    )
  )
  (float_functional_simple_9): FloatFunctional(
    (activation_post_process): PTQFakeQuantize(
      fake_quant_enabled=tensor([1], device='cuda:0', dtype=torch.uint8), observer_enabled=tensor([0], device='cuda:0', dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0040], device='cuda:0'), zero_point=tensor([2], device='cuda:0', dtype=torch.int32)
      (activation_post_process): HistogramObserver(min_val=-0.06914715468883514, max_val=1.1036951541900635)
    )
  )
)
@zk1998
Collaborator

zk1998 commented Nov 27, 2024 via email

@peterjc123 added the question (Further information is requested) label on Nov 28, 2024
@zk1998
Collaborator

zk1998 commented Nov 28, 2024

Hi @taestaes, you can add a fake-quantization switch after this example code to analyze where the quantization error mainly comes from.
The usage looks like:

    # Disable observer and enable fake quantization to validate model with quantization error
    ptq_model.apply(torch.quantization.disable_observer)
    ptq_model.apply(torch.quantization.enable_fake_quant)

    if torch.cuda.device_count() > 1:
        ptq_model = ptq_model.module
    # Disable the activation or weight quantization to check whether the quantization error comes from activation or weight.
    from tinynn.graph.quantization.fake_quantize import PTQFakeQuantize
    for name, module in ptq_model.named_modules():
        if isinstance(module, PTQFakeQuantize):
            # disable all weight quantization
            # if name.endswith('weight_fake_quant'):
            #     module.apply(torch.quantization.disable_fake_quant)
            # disable all activation quantization
            if name.endswith('activation_post_process'):
                module.apply(torch.quantization.disable_fake_quant)

    # Or, you could profile to identify which operator is causing the quantization to fail.
    # First, disable all quantization, then gradually enable quantization for certain layers/Op, and pinpoint the layer with the highest quantization loss.
    # The following is just an example, you should comment or modify it before profiling.
    # ptq_model.apply(torch.quantization.disable_fake_quant)
    # for name, child in ptq_model.named_modules():
    #     # if name in ['body_0_ffn_act']:
    #     if 'ffn_act' in name:
    #         child.apply(torch.quantization.enable_fake_quant)

    dummy_input = dummy_input.to(device)
    ptq_model(dummy_input)
    print(ptq_model)

All quantized value ranges look normal. I think the accuracy loss is caused by the quantization of PReLU's activation values. You can first try to quantize only the PReLU layers and check the quantization error.

@taestaes
Author

    # Get the first batch from the context.train_loader
    first_batch = next(iter(context.val_loader))

    # Set the history frame.
    model._set_history_frame(first_batch, 0)

    # Update the info from the cfg.
    model._update_cfg(first_batch['cfg'])

    # Send the input data to the buffer.
    model.feed_data(first_batch)

    dummy_input_real = model.lq[:1]

    ptq_model.apply(torch.quantization.disable_observer)
    ptq_model.apply(torch.quantization.enable_fake_quant)
    output = ptq_model(dummy_input_real)
    torch.save(output[:,:,:500,:500].clone(), 'output_ptq.pt')

    ptq_model.apply(torch.quantization.disable_fake_quant)
    ptq_model.apply(torch.quantization.enable_observer)
    output_real = ptq_model(dummy_input_real)
    torch.save(output_real[:,:,:500,:500].clone(), 'output_real.pt')

The result files are in output_ptq.zip.
You can see the values change as follows:

output
tensor([[[[ 0.0040,  0.0000, -0.0080,  ...,  0.0040,  0.0000,  0.0040],
          [-0.0080,  0.0040,  0.0040,  ..., -0.0040,  0.0040,  0.0040],
          [ 0.0040,  0.0040,  0.0000,  ...,  0.0040,  0.0000,  0.0000],
          ...,
          [-0.0080, -0.0040,  0.0040,  ...,  0.0040,  0.0080,  0.0000],
          [ 0.0080,  0.0080,  0.0040,  ...,  0.0119,  0.0000,  0.0040],
          [-0.0040,  0.0040, -0.0040,  ...,  0.0040,  0.0040, -0.0040]],
         [[-0.0080, -0.0080,  0.0000,  ...,  0.0000,  0.0000, -0.0040],
          [-0.0040, -0.0080,  0.0040,  ..., -0.0040,  0.0040,  0.0040],
          [ 0.0040, -0.0040,  0.0040,  ..., -0.0040,  0.0199,  0.0199],
          ...,
          [-0.0080,  0.0040, -0.0040,  ..., -0.0080, -0.0080,  0.0040],
          [ 0.0119, -0.0080,  0.0040,  ..., -0.0080,  0.0040,  0.0159],
          [-0.0040, -0.0080,  0.0000,  ...,  0.0040, -0.0080, -0.0080]],
         [[-0.0080, -0.0080, -0.0040,  ...,  0.0040,  0.0000,  0.0000],
          [ 0.0080,  0.0080, -0.0080,  ...,  0.0040,  0.0080, -0.0080],
          [-0.0040,  0.0000,  0.0000,  ...,  0.0000,  0.0119,  0.0040],
          ...,
          [ 0.0080, -0.0080,  0.0040,  ..., -0.0040,  0.0000, -0.0080],
          [-0.0040,  0.0000,  0.0000,  ..., -0.0080,  0.0040,  0.0040],
          [-0.0080,  0.0000,  0.0040,  ..., -0.0040, -0.0040,  0.0040]],
         [[-0.0040, -0.0040, -0.0040,  ...,  0.0040, -0.0040,  0.0040],
          [-0.0040, -0.0080, -0.0080,  ...,  0.0000, -0.0040, -0.0040],
          [-0.0080, -0.0080, -0.0040,  ...,  0.0040,  0.0000,  0.0040],
          ...,
          [ 0.0000,  0.0040,  0.0040,  ...,  0.0000, -0.0040,  0.0040],
          [ 0.0000,  0.0040, -0.0040,  ...,  0.0000,  0.0000,  0.0040],
          [ 0.0000, -0.0080,  0.0000,  ..., -0.0080,  0.0000, -0.0080]]]],
       device='cuda:0', grad_fn=<PixelShuffleBackward0>)
output_real
tensor([[[[ 5.3058e-04,  2.8980e-04, -6.9127e-06,  ...,  1.7132e-03,
            2.6455e-03,  1.7680e-03],
          [-4.6593e-05, -6.8595e-05,  1.1221e-05,  ...,  1.9627e-03,
            2.2377e-03,  1.9130e-03],
          [-3.1052e-04, -3.8708e-04, -7.7196e-05,  ...,  1.2334e-03,
            2.0632e-03,  1.3827e-03],
          ...,
          [-6.5564e-04, -4.1493e-04,  6.6315e-06,  ..., -1.3240e-04,
            1.8151e-04,  7.3130e-05],
          [-2.3227e-04, -6.4236e-04, -3.9087e-04,  ..., -4.1661e-05,
            8.5274e-04,  5.5308e-04],
          [-2.3724e-04, -8.0742e-04, -4.4858e-04,  ..., -9.9482e-05,
            3.8351e-04,  2.2529e-04]],
         [[ 4.0678e-04,  5.6869e-04,  2.8066e-04,  ...,  3.4742e-03,
            3.5885e-03,  3.3987e-03],
          [ 2.0502e-04,  7.0215e-04, -1.2013e-04,  ...,  3.2012e-03,
            3.5111e-03,  3.1280e-03],
          [ 4.3886e-05, -2.1096e-04, -2.7194e-05,  ...,  2.8814e-03,
            3.1086e-03,  3.0355e-03],
          ...,
          [ 1.5980e-04,  1.1029e-04,  1.8060e-04,  ..., -3.0157e-05,
           -6.5934e-05, -1.6346e-04],
          [-2.2554e-04,  1.7096e-04, -1.5797e-04,  ...,  6.5495e-05,
           -3.4225e-04, -1.5740e-04],
          [-2.4068e-04,  2.9705e-04,  5.6365e-05,  ..., -2.3401e-04,
           -1.4961e-04, -6.9319e-04]],
         [[ 5.5293e-04,  4.7936e-04,  4.6354e-04,  ...,  3.4175e-03,
            3.9984e-03,  3.8437e-03],
          [ 5.9630e-04,  3.6567e-04,  3.6413e-04,  ...,  3.5018e-03,
            3.4483e-03,  3.5330e-03],
          [ 1.8462e-05, -8.4457e-05,  1.7958e-04,  ...,  3.0389e-03,
            3.4476e-03,  2.9877e-03],
          ...,
          [ 5.8573e-06,  1.9718e-04,  1.9586e-04,  ...,  3.8674e-05,
           -1.9411e-04,  1.3282e-04],
          [-1.6424e-04, -3.5088e-04,  1.9186e-04,  ..., -1.2239e-05,
            1.2320e-04, -9.2608e-05],
          [-9.5273e-05,  4.7107e-05, -9.0065e-05,  ...,  1.3077e-04,
           -8.2172e-04,  1.7154e-04]],
         [[ 1.1913e-04,  5.1337e-05,  9.3932e-05,  ...,  1.2537e-03,
            1.6869e-03,  1.1538e-03],
          [ 2.3396e-05, -4.5896e-04, -1.7053e-05,  ...,  1.3006e-03,
            1.2846e-03,  1.0711e-03],
          [-1.0827e-04, -1.2818e-04,  4.2531e-05,  ...,  1.2620e-03,
            1.6253e-03,  1.2912e-03],
          ...,
          [ 2.0148e-04, -2.4638e-04,  4.6862e-05,  ..., -2.4427e-05,
           -1.3010e-04, -3.9609e-06],
          [-1.6371e-04, -2.9308e-04,  3.4205e-05,  ..., -1.4886e-05,
           -1.6856e-04, -3.4492e-04],
          [ 6.9447e-05, -2.2164e-04,  4.1239e-04,  ..., -3.4211e-05,
            2.2587e-04, -1.5262e-04]]]], device='cuda:0',
       grad_fn=<PixelShuffleBackward0>)
output.mean()
tensor(0.0041, device='cuda:0', grad_fn=<MeanBackward0>)
output_real.mean()
tensor(0.0036, device='cuda:0', grad_fn=<MeanBackward0>)
torch.mean((output - output_real) ** 2)
tensor(3.5770e-05, device='cuda:0', grad_fn=<MeanBackward0>)


Isn't this severe degradation? I don't know whether PTQ worked well or not. How can I solve this issue?
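
As a side note on the step size: with 8-bit per-tensor affine fake quantization, every output value is snapped to an integer multiple of the activation scale of the last fake-quant node (float_functional_simple_9 has scale 0.0040 here), which is presumably why the PTQ output only contains multiples of about 0.004. A minimal sketch of that rounding:

import torch

# Per-tensor affine fake quantization: values land on a grid of integer
# multiples of `scale`, so the dequantized output can only take such values.
def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

x = torch.tensor([0.0036, 0.0051, -0.0079])
print(fake_quantize(x, scale=0.0040, zero_point=2))  # -> multiples of 0.0040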

@taestaes
Author

When I only apply quantization to modules with 'ffn_act' in the name:

ptq_model.apply(torch.quantization.disable_fake_quant)
for name, child in ptq_model.named_modules():
    # if name in ['body_0_ffn_act']:
    if 'ffn_act' in name:
        # if 'ffn_act' not in name:
        # if True:
        child.apply(torch.quantization.enable_fake_quant)
output = ptq_model(dummy_input_real)
print(output)
print(torch.mean((output - output_real) ** 2))
tensor([[[[ 2.0368e-03,  1.2448e-03,  7.7955e-04,  ...,  9.3544e-04,
            4.0698e-03,  3.5494e-03],
          [ 2.9961e-03,  3.0125e-04, -2.0483e-04,  ...,  3.1312e-03,
            2.2275e-03,  3.7057e-03],
          [-3.7076e-04,  5.1835e-04, -3.0740e-03,  ...,  1.3497e-03,
           -1.0703e-04, -3.2131e-04],
          ...,
          [ 1.3461e-03, -2.3721e-04,  2.3214e-03,  ..., -1.2453e-03,
            1.6108e-03,  4.3571e-05],
          [ 1.7997e-03,  1.3568e-04,  1.8453e-04,  ...,  9.4718e-04,
            4.1021e-03,  2.7804e-03],
          [ 1.6654e-03, -3.8672e-04, -1.2939e-04,  ..., -8.9052e-04,
            3.0044e-03,  1.0265e-03]],
         [[ 9.2924e-04, -2.5432e-04,  4.8849e-04,  ...,  2.8125e-03,
            4.1685e-03,  4.0453e-03],
          [-5.0195e-04,  1.5800e-03, -6.3357e-04,  ...,  3.6965e-03,
            1.8738e-03,  4.5212e-03],
          [ 4.2118e-04,  4.2148e-04,  8.4925e-04,  ...,  1.5781e-03,
            2.5055e-03,  2.4197e-03],
          ...,
          [-1.0149e-03, -5.6088e-04, -1.0164e-03,  ...,  4.2484e-04,
            8.1021e-04, -1.1831e-03],
          [-6.1880e-04, -1.8684e-03, -1.1293e-04,  ...,  9.5189e-04,
           -1.8500e-03, -3.0048e-03],
          [-1.5383e-03, -4.5454e-04, -1.2841e-04,  ..., -7.9780e-05,
           -3.5130e-03, -5.7345e-04]],
         [[ 1.7464e-03, -5.9591e-04,  1.9699e-03,  ...,  2.0564e-03,
            3.8366e-03,  4.4443e-03],
          [ 1.3778e-03, -2.1180e-03,  4.7921e-05,  ...,  3.7347e-03,
            3.6731e-03,  2.1038e-03],
          [ 1.4942e-03,  7.0955e-04,  1.7510e-03,  ...,  4.1659e-03,
            5.1922e-03,  1.6140e-03],
          ...,
          [-1.2614e-03, -4.5123e-03, -3.2840e-03,  ...,  2.0923e-03,
            9.8494e-05,  1.2904e-03],
          [ 2.5750e-04, -1.1238e-03,  3.8382e-04,  ..., -1.5174e-03,
            1.3054e-03, -1.4098e-03],
          [ 1.1181e-03, -3.0475e-04,  1.1111e-03,  ..., -4.7602e-04,
           -2.2319e-03, -1.6639e-03]],
         [[-2.1641e-04,  7.5085e-04, -8.1187e-04,  ...,  7.1958e-04,
            1.5348e-03,  1.6091e-03],
          [ 1.1027e-04,  4.4316e-04,  6.8363e-04,  ...,  3.2131e-03,
            1.0030e-03,  5.6958e-05],
          [-1.2995e-03,  3.3173e-04,  1.1150e-03,  ...,  2.8133e-03,
            1.6314e-05, -8.4240e-05],
          ...,
          [ 9.0627e-04, -7.3723e-04, -1.5959e-03,  ...,  7.7767e-04,
            2.6287e-04, -1.0412e-03],
          [-8.7875e-04, -7.2003e-05, -8.8811e-04,  ..., -5.6203e-04,
           -5.9063e-04, -7.2781e-04],
          [ 9.3326e-04, -7.9842e-06,  9.7631e-04,  ..., -9.0815e-04,
            6.0385e-04,  9.3084e-04]]]], device='cuda:0',
       grad_fn=<PixelShuffleBackward0>)
tensor(5.0668e-05, device='cuda:0', grad_fn=<MeanBackward0>)

When I only apply quantization to modules without 'ffn_act' in the name:

ptq_model.apply(torch.quantization.disable_fake_quant)
for name, child in ptq_model.named_modules():
    # if name in ['body_0_ffn_act']:
    # if 'ffn_act' in name:
    if 'ffn_act' not in name:
        # if True:
        child.apply(torch.quantization.enable_fake_quant)
output = ptq_model(dummy_input_real)
print(output)
print(torch.mean((output - output_real) ** 2))
tensor([[[[-0.0085,  0.0042,  0.0042,  ...,  0.0000,  0.0000, -0.0042],
          [-0.0085, -0.0085,  0.0000,  ...,  0.0000,  0.0000, -0.0042],
          [-0.0042,  0.0000,  0.0042,  ...,  0.0000,  0.0127,  0.0127],
          ...,
          [-0.0085, -0.0169,  0.0000,  ...,  0.0042, -0.0042, -0.0085],
          [ 0.0000,  0.0000, -0.0042,  ...,  0.0127,  0.0000,  0.0127],
          [ 0.0042,  0.0000, -0.0042,  ..., -0.0085, -0.0042, -0.0085]],
         [[ 0.0000, -0.0085,  0.0127,  ...,  0.0085,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0042,  ...,  0.0085, -0.0042,  0.0000],
          [ 0.0042,  0.0085,  0.0085,  ...,  0.0042,  0.0042,  0.0085],
          ...,
          [-0.0042,  0.0042, -0.0085,  ..., -0.0085, -0.0085, -0.0085],
          [ 0.0042,  0.0000,  0.0169,  ...,  0.0042,  0.0169,  0.0000],
          [ 0.0042,  0.0042, -0.0042,  ...,  0.0127, -0.0127, -0.0042]],
         [[-0.0042,  0.0085,  0.0000,  ..., -0.0042, -0.0085,  0.0042],
          [-0.0042, -0.0042,  0.0085,  ..., -0.0042,  0.0085,  0.0042],
          [ 0.0000,  0.0042,  0.0000,  ..., -0.0085,  0.0085,  0.0085],
          ...,
          [ 0.0085,  0.0085,  0.0042,  ..., -0.0042, -0.0085,  0.0000],
          [-0.0042,  0.0042,  0.0000,  ...,  0.0085,  0.0042,  0.0000],
          [ 0.0000, -0.0042,  0.0000,  ...,  0.0000,  0.0042,  0.0042]],
         [[ 0.0042, -0.0042, -0.0042,  ..., -0.0042,  0.0042,  0.0042],
          [ 0.0042, -0.0042, -0.0042,  ...,  0.0042,  0.0000,  0.0042],
          [ 0.0042,  0.0000,  0.0085,  ...,  0.0127, -0.0042, -0.0085],
          ...,
          [ 0.0042, -0.0085, -0.0042,  ...,  0.0000,  0.0000,  0.0042],
          [ 0.0042,  0.0000,  0.0042,  ...,  0.0000,  0.0042,  0.0042],
          [ 0.0000, -0.0042,  0.0000,  ...,  0.0085,  0.0042,  0.0000]]]],
       device='cuda:0', grad_fn=<PixelShuffleBackward0>)
tensor(4.7243e-05, device='cuda:0', grad_fn=<MeanBackward0>)

It seems the two errors are similar, but the result values are quantized to multiples of 0.0042 when 'ffn_act' is not in the name.
How can I make the result better?
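
One thing I am considering trying is per-channel weight quantization. This is only a sketch: the PostQuantizer class and the 'asymmetric'/'per_tensor' config keys are assumptions taken from the TinyNeuralNetwork examples and may differ in my setup.

from tinynn.graph.quantization.quantizer import PostQuantizer

# Assumed config keys: 'asymmetric' for activation ranges, and
# 'per_tensor': False to switch the weight observers to per-channel mode.
quantizer = PostQuantizer(
    model, dummy_input, work_dir='out',
    config={'asymmetric': True, 'per_tensor': False},
)
ptq_model = quantizer.quantize()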

@zk1998
Collaborator

zk1998 commented Nov 28, 2024

The result looks bad whether you quantize only PReLU or skip quantizing PReLU. You can determine the quantization error by checking the result of graph_error_analysis and seeing whether the cosine of the corresponding output is close to 1.

BTW, if allowed, you could provide the xx_q.py and xx_q.pth generated in the out directory together with several example input tensors, and I can help you track down the source of the quantization error.
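
For reference, a minimal sketch of how the error-analysis utilities from the linked example are usually invoked (the exact signatures are assumed from that example and may differ):

import torch
from tinynn.util.quantization_analysis_util import graph_error_analysis, layer_error_analysis

# With observers frozen and fake quantization enabled, these utilities print the
# cosine-similarity reports shown above (cumulative graph error and per-layer error).
ptq_model.apply(torch.quantization.disable_observer)
ptq_model.apply(torch.quantization.enable_fake_quant)
graph_error_analysis(ptq_model, dummy_input, metric='cosine')
layer_error_analysis(ptq_model, dummy_input, metric='cosine')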

@taestaes
Author

taestaes commented Nov 28, 2024

@zk1998 What is your email address? I can send it to you; the dummy input files are quite large.

@zk1998
Collaborator

zk1998 commented Nov 28, 2024

@taestaes
Author

[email protected]

I have sent the input files and the output folder. Can you find them? Thank you for your help; I really need your assistance to resolve this issue.

@zk1998
Collaborator

zk1998 commented Nov 28, 2024

I have received it and will help you analyze it ASAP.

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

Hi @taestaes, could you please provide a simple post-processing function that converts the output to pixel values? I found that your model output contains a large number of values close to 0, which makes it difficult for me to compare numerical accuracy. A comparison of pixel values would be more meaningful.

@taestaes
Author

Hi @taestaes, could you please provide a simple post-processing function that converts the output to pixel values? I found that your model output contains a large number of values close to 0, which makes it difficult for me to compare numerical accuracy. A comparison of pixel values would be more meaningful.

What do you mean by a post-processing function? My model's inputs are image values, and its outputs are also image values.

@taestaes
Author

I also want to mention that without calibration, the output values are all 0 or 1.
After calibration with the real validation images (sent to you), the output is as shown above. So if you find that all values are 0 or 1, it should produce proper values after calibration.

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

The input and output value range of your model is float [0, 1], but the pixel value range of an image is int [0, 255]. It seems that your model takes an image as input and produces an image as output, so your dataloader should have preprocessing and postprocessing steps for the images.

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

@taestaes Maybe your work relies on an open-source project; that would make it easier for me to reproduce your problem. I currently find it difficult to evaluate the quality of your model by only observing numerical accuracy; it needs to be evaluated in the context of the actual task.

@taestaes
Author

The input and output value range of your model is float [0, 1], but the pixel value range of an image is int [0, 255]. It seems that your model takes an image as input and produces an image as output, so your dataloader should have preprocessing and postprocessing steps for the images.

That's right. My original raw input is a 12-bit image (max value 4096), so I normalize it to (0, 1). For example, a real input has the range (-0.0483, 0.9995), and the model output has the range (-0.0289, 1.0635). I recover the output as an 8-bit image (multiplying by 255).
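
Roughly, the pre/post-processing looks like the following sketch (the loading step and exact divisor are placeholders; the idea is simply normalizing the 12-bit raw to [0, 1] and scaling the output back to 8 bits):

import numpy as np
import torch

# Placeholder: load a 12-bit raw frame as a numpy array (values up to 4096).
raw = np.load('raw_frame.npy')

x = torch.from_numpy(raw.astype(np.float32)) / 4096.0   # normalize to roughly [0, 1]
with torch.no_grad():
    y = ptq_model(x.unsqueeze(0))

img8 = (y.clamp(0, 1) * 255.0).round().to(torch.uint8)  # recover an 8-bit image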

@taestaes
Author

@taestaes Maybe your work relies on an open-source project; that would make it easier for me to reproduce your problem. I currently find it difficult to evaluate the quality of your model by only observing numerical accuracy; it needs to be evaluated in the context of the actual task.

Yes, that's right. Our code is based on https://github.com/XPixelGroup/BasicSR, but we changed it to a custom network for a denoising task.

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

@taestaes
Author

@taestaes which base model do you use in https://github.com/XPixelGroup/BasicSR/blob/master/docs/ModelZoo.md

I'm using a custom model, so I don't think our model is based on any of those.

@taestaes
Author

@taestaes which base model do you use in https://github.com/XPixelGroup/BasicSR/blob/master/docs/ModelZoo.md

https://github.com/swz30/Restormer is a more accurate base; our model is derived from it.

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

It seems I am unable to reproduce your work from the open-source repository. Could you provide the processing function that turns the model outputs into images? This would allow me to visually assess the quality of the model's results.

Alternatively, could you provide some direct metrics that can be used to evaluate the quality of the outputs? Directly comparing the mean squared error (MSE) between the floating-point and quantized outputs is not meaningful, because most of your output values are close to 0, so errors in the significant values are not reflected in the MSE.
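One metric along those lines would be PSNR computed on the postprocessed (clamped, brightened) outputs, which emphasizes the visually significant values rather than the near-zero background. A sketch, with output_float and output_quant as hypothetical placeholders:

import torch

def psnr(a: torch.Tensor, b: torch.Tensor, max_val: float = 1.0) -> float:
    # peak signal-to-noise ratio between two outputs clamped to [0, max_val]
    mse = torch.mean((a.clamp(0, max_val) - b.clamp(0, max_val)) ** 2)
    return float(10.0 * torch.log10((max_val ** 2) / mse))

# e.g. psnr(output_float, output_quant) computed on the brightened outputs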

@taestaes
Author

It seems I am unable to reproduce your work from the open-source repository. Could you provide the processing function that turns the model outputs into images? This would allow me to visually assess the quality of the model's results.

Alternatively, could you provide some direct metrics that can be used to evaluate the quality of the outputs? Directly comparing the mean squared error (MSE) between the floating-point and quantized outputs is not meaningful, because most of your output values are close to 0, so errors in the significant values are not reflected in the MSE.

    def _tensor2rgb(self, img: torch.Tensor) -> np.ndarray:
        # Convert a (bs, 4, h, w) RGGB-packed tensor into a demosaiced uint8 BGR image.
        out_type = np.uint8

        bs, chan, h, w = img.shape
        H, W = h * 2, w * 2
        img2 = torch.zeros((bs, H, W)) if torch.is_tensor(img) else np.zeros((bs, H, W)).astype(img.dtype)

        # Reorder the channels, then scatter the 4 planes back onto the full-resolution Bayer grid
        img = img[:, [1, 3, 0, 2], :, :]

        img2[:, 0:H:2, 0:W:2] = img[:, 0, :, :]
        img2[:, 0:H:2, 1:W:2] = img[:, 1, :, :]
        img2[:, 1:H:2, 0:W:2] = img[:, 2, :, :]
        img2[:, 1:H:2, 1:W:2] = img[:, 3, :, :]

        img2 = img2.unsqueeze(1) if torch.is_tensor(img2) else np.expand_dims(img2, axis=1)
        img = img2.squeeze()
        min_max = (0, 1)

        # Brighten by x100 before converting to 8-bit (the raw output is very dark)
        img = self._apply_val_data_gain(img.clone(), 100)
        _tensor = img

        _tensor = _tensor.squeeze(0).float().detach().cpu().clamp_(*min_max)
        _tensor = (_tensor - min_max[0]) / (min_max[1] - min_max[0])

        img_np = _tensor.numpy()
        if out_type == np.uint8:
            img_np = (img_np * 255.0).round()
        img_np = img_np.astype(out_type)

        # 49 is an OpenCV Bayer demosaicing code (cv2.COLOR_BayerGR2BGR)
        img_np = cv2.demosaicing(img_np, 49)

        return img_np

    def _apply_val_data_gain(self, img: torch.Tensor, gain: float) -> torch.Tensor:
        # Apply a brightness gain and clamp back into [0, 1]
        return (img * gain).clamp(0, 1)

Can you test this function? It converts the output to a uint8 image, which you can write out with cv2.imwrite to inspect. I also used _apply_val_data_gain to brighten the image x100, since the output images are very dark.
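For reference, a possible way to use it (a sketch only; output stands for the 4-channel model output tensor and model for the object the method lives on):

import cv2

rgb = model._tensor2rgb(output)       # uint8 BGR image after demosaicing and x100 gain
cv2.imwrite('output_check.png', rgb)  # write it out for visual inspection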

@zk1998
Collaborator

zk1998 commented Nov 29, 2024

I did some ablation experiments using your evaluation method:

  1. Quantize only the weights, per-channel. The output image is not much different from the original image, aside from some color difference. To reproduce, pass the following config when initializing QATQuantizer:
quantizer = QATQuantizer(
    model, dummy_input, work_dir='out', config={
        'override_qconfig_func': set_ptq_fake_quantize,
        # 'force_overwrite': False,
        # 'rewrite_graph': False,
        'per_tensor': False,  # per-channel weight quantization
    }
)

and then enable only the weight fake-quant by adding the following code afterwards:

ptq_model.apply(torch.quantization.disable_fake_quant)
for name, module in ptq_model.named_modules():
    if isinstance(module, torch.quantization.FakeQuantize):
        # re-enable fake-quant only for weights; activations stay in float
        if name.endswith('weight_fake_quant'):
            module.apply(torch.quantization.enable_fake_quant)
            print(f"enable fake_quant {name}")
  2. Activation quantization has a great impact on the denoising task: quantizing just the activation of the first conv already creates a lot of noise. Reproduce it by adding the following code afterwards:
ptq_model.apply(torch.quantization.disable_fake_quant)
for name, module in ptq_model.named_modules():
    if isinstance(module, torch.quantization.FakeQuantize):
        # re-enable fake-quant only for weights; activations stay in float
        if name.endswith('weight_fake_quant'):
            module.apply(torch.quantization.enable_fake_quant)
            print(f"enable fake_quant {name}")

        # additionally enable activation fake-quant for the first conv only
        if name.endswith('activation_post_process') and 'patch_embed_proj' in name:
            module.apply(torch.quantization.enable_fake_quant)

All in all, the denoising model is difficult to quantize: applying PTQ alone adds visible noise. From a numerical perspective, quantization introduces errors, and after the picture is brightened those errors are amplified further, which is why the picture above shows more noise.
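As a rough back-of-the-envelope illustration of that amplification (illustrative numbers only, not measured on the model):

# why a small 8-bit quantization error becomes visible after the x100 brightening
output_range = 1.0                   # the model output roughly spans [0, 1]
scale = output_range / 255.0         # 8-bit quantization step for that range
worst_case_error = scale / 2.0       # rounding error per value
gain = 100.0                         # brightening applied in _apply_val_data_gain
pixel_error = worst_case_error * gain * 255.0
print(f"~{pixel_error:.0f} levels of 8-bit error after the gain")  # ~50 levels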

@taestaes
Author

Thanks. Following your ablation 1, the output image looks good: there is some color tone difference, but the noise is not severe. Does 'per_tensor': False make PTQ more accurate?

@taestaes
Author

taestaes commented Nov 29, 2024

I compared the ablation images, and the first conv's activation quantization indeed induces a lot of noise.

    # 1) Float reference: observers on, fake-quant off
    ptq_model.apply(torch.quantization.disable_fake_quant)
    ptq_model.apply(torch.quantization.enable_observer)
    output_real = ptq_model(dummy_input_real)
    torch.save(output_real[:, :, :500, :500].clone(), 'output_real.pt')
    output_real_img = model._tensor2img(output_real, rgb2bgr=True, min_max=(0, 1))
    imwrite(output_real_img, "output_real.png")

    # 2) Full PTQ: observers off, fake-quant on for weights and activations
    ptq_model.apply(torch.quantization.disable_observer)
    ptq_model.apply(torch.quantization.enable_fake_quant)
    output_ptq = ptq_model(dummy_input_real)
    torch.save(output_ptq[:, :, :500, :500].clone(), 'output_ptq.pt')
    output_ptq_img = model._tensor2img(output_ptq, rgb2bgr=True, min_max=(0, 1))
    imwrite(output_ptq_img, "output_ptq.png")

    # 3) Weight-only quantization: re-enable fake-quant for weights only
    ptq_model.apply(torch.quantization.disable_fake_quant)
    for name, module in ptq_model.named_modules():
        if isinstance(module, torch.quantization.FakeQuantize):
            if name.endswith('weight_fake_quant'):
                module.apply(torch.quantization.enable_fake_quant)
                print(f"enable fake_quant {name}")
    output = ptq_model(dummy_input_real)
    output_img = model._tensor2img(output, rgb2bgr=True, min_max=(0, 1))
    imwrite(output_img, "output_weight_quantize.png")

    # 4) Weights plus the first conv's activation fake-quant
    ptq_model.apply(torch.quantization.disable_fake_quant)
    for name, module in ptq_model.named_modules():
        if isinstance(module, torch.quantization.FakeQuantize):
            if name.endswith('weight_fake_quant'):
                module.apply(torch.quantization.enable_fake_quant)
                print(f"enable fake_quant {name}")

            if name.endswith('activation_post_process') and 'patch_embed_proj' in name:
                module.apply(torch.quantization.enable_fake_quant)
    output = ptq_model(dummy_input_real)
    output_img = model._tensor2img(output, rgb2bgr=True, min_max=(0, 1))
    imwrite(output_img, "output_act_first_conv.png")

[image: comparison of the float, full PTQ, weight-only, and first-conv-activation outputs]

@taestaes
Author

How can I solve this issue? If I try QAT, can it produce good results?

Or is there an option for higher activation accuracy, such as 16-bit quantization for activations?

@zk1998
Collaborator

zk1998 commented Dec 2, 2024

  1. When converted to pixel values, the activation quantization error is usually only about 1 pixel level (the error is small). However, the direct *100 operation in _apply_val_data_gain amplifies that error severely, which shows up as serious noise. You may want to reconsider how you brighten the image and check whether the model output still meets your final needs.
  2. 'per_tensor': False means per-channel weight quantization, which gives higher accuracy.
  3. I think QAT can improve the quantization accuracy of your model; you can try it.
  4. Higher bit widths, such as INT16 activation quantization, depend on whether your deployment platform supports them. TinyNN supports INT16 activation quantization, example.

@taestaes
Author

taestaes commented Dec 2, 2024

  1. When converted to pixel values, the activation quantization error is usually only about 1 pixel level (the error is small). However, the direct *100 operation in _apply_val_data_gain amplifies that error severely, which shows up as serious noise. You may want to reconsider how you brighten the image and check whether the model output still meets your final needs.
  2. 'per_tensor': False means per-channel weight quantization, which gives higher accuracy.
  3. I think QAT can improve the quantization accuracy of your model; you can try it.
  4. Higher bit widths, such as INT16 activation quantization, depend on whether your deployment platform supports them. TinyNN supports INT16 activation quantization, example.

Thanks. The problem is that the picture is captured in a very low-light scenario, and we need to denoise it under that condition.

I have tried QAT, but the validation accuracy did not increase, so I'm not sure yet.

I checked your INT16 example, but it seems to be in TFLite format. Can I check the result of INT16 quantization with the Python code above? I don't know how to inspect TFLite results yet.

@zk1998
Collaborator

zk1998 commented Dec 2, 2024

I tried INT16, and it worked very well. You only need to add one line after:

ptq_model = quantizer.quantize()
quantizer.rescale_activations_with_quant_min_max(0, 65535)

And to make the fake-quant pass work with INT16 activation quantization, modify the referenced line to:

if (self.scale == 1 and self.zero_point == 0) or self.scale == 257:
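Putting the pieces together, the INT16 flow would look roughly like this (a sketch only; calib_loader and dummy_input_real are placeholders for your calibration dataloader and a representative input):

import torch

ptq_model = quantizer.quantize()
# rescale the activation observers to the 16-bit range, as suggested above
quantizer.rescale_activations_with_quant_min_max(0, 65535)

# calibrate: observers on, fake-quant off
ptq_model.apply(torch.quantization.disable_fake_quant)
ptq_model.apply(torch.quantization.enable_observer)
for data in calib_loader:
    ptq_model(data)

# evaluate the INT16 PTQ result: observers off, fake-quant on
ptq_model.apply(torch.quantization.disable_observer)
ptq_model.apply(torch.quantization.enable_fake_quant)
output_int16 = ptq_model(dummy_input_real)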

@taestaes
Author

taestaes commented Dec 2, 2024

I tried INT16, and it worked very well. You only need to add one line after:

ptq_model = quantizer.quantize()
quantizer.rescale_activations_with_quant_min_max(0, 65535)

And to make the fake-quant pass work with INT16 activation quantization, modify the referenced line to:

if (self.scale == 1 and self.zero_point == 0) or self.scale == 257:

Thanks for your answer. I found that INT16 quantization gives good results, as shown below (8-bit / 16-bit / float). But do you know why there is some color tone difference after quantization? The input image is originally greenish, so I'm not sure why quantization changes the color tone.
[image: 8-bit / 16-bit / float output comparison]

@zk1998
Collaborator

zk1998 commented Dec 2, 2024

I have no idea offhand. Maybe you can compare the pixel values of the RGB channels of the floating-point and quantized outputs individually to find the cause, then apply some normalization to fix the color tone change.
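A sketch of that comparison, plus a crude channel-wise correction, using the tensors saved earlier in this thread (a sketch only, not a proper color-correction method):

import torch

output_real = torch.load('output_real.pt')  # float output, shape (1, 4, H, W)
output_ptq = torch.load('output_ptq.pt')    # fake-quantized output, same shape

# per-channel (R, G, G, B) means of both outputs
mean_real = output_real.mean(dim=(0, 2, 3))
mean_ptq = output_ptq.mean(dim=(0, 2, 3))
print(mean_real, mean_ptq)

# crude correction: rescale each quantized channel to match the float mean
output_corrected = output_ptq * (mean_real / mean_ptq).view(1, -1, 1, 1)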

@taestaes
Author

taestaes commented Dec 3, 2024

I have no idea offhand. Maybe you can compare the pixel values of the RGB channels of the floating-point and quantized outputs individually to find the cause, then apply some normalization to fix the color tone change.

I have checked the R, G, G, B values of the input/output tensors and found that without quantization the channel-wise means stay similar, but with quantization the R and B means increase while the G means decrease. Isn't the quantization performed per-channel? I can't understand why some channel means increase while others decrease.

Input Mean tensor([[0.0027, 0.0044, 0.0044, 0.0026, 0.0026, 0.0045, 0.0047, 0.0027, 0.1000,
         0.6500]], device='cuda:0'), STD tensor([[0.0346, 0.0349, 0.0349, 0.0274, 0.0343, 0.0346, 0.0345, 0.0269, 0.0000,
         0.0000]], device='cuda:0')
Output_real Mean tensor([[0.0026, 0.0045, 0.0047, 0.0027]], device='cuda:0',
       grad_fn=<MeanBackward1>), STD tensor([[0.0343, 0.0346, 0.0345, 0.0269]], device='cuda:0',
       grad_fn=<StdBackward0>)
Output_PTQ Mean tensor([[0.0033, 0.0041, 0.0040, 0.0030]], device='cuda:0',
       grad_fn=<MeanBackward1>), STD tensor([[0.0342, 0.0345, 0.0345, 0.0269]], device='cuda:0',
       grad_fn=<StdBackward0>)

@zk1998
Collaborator

zk1998 commented Dec 3, 2024

Per-channel quantization refers to the granularity of weight quantization in conv layers and has nothing to do with the RGGB channels. Quantization inevitably introduces errors, especially for a generative task like yours, which is more sensitive to them. I have no idea how to solve this color shift.
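To make the distinction concrete, a toy illustration of per-channel versus per-tensor weight quantization granularity for a conv weight (nothing to do with the image's RGGB channels):

import torch

w = torch.randn(16, 4, 3, 3)  # conv weight: (out_channels, in_channels, kH, kW)

# per-tensor: a single scale for the whole weight tensor
scale_per_tensor = w.abs().max() / 127.0

# per-channel ('per_tensor': False): one scale per output channel
scale_per_channel = w.abs().amax(dim=(1, 2, 3)) / 127.0  # shape (16,)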
