[TF FE]: Fix accuracy issue for Prod on complex tensors on ARM #26601
Comments
@hub-bla, please take a look.
Hi @rkazants, the current formula used by the translator requires many steps to produce the output, which is the root cause of the inaccuracies. I believe there is no other way to do this with the current OpenVINO ops. I think the best idea is to create a new version of the ReduceProd operation that applies the product of two complex numbers based on the following formula: (a + ib)(c + id) = (ac - bd) + i(ad + bc). (That is also how other frameworks like TensorFlow handle this kind of operation.) Since there are already reduce operations in OpenVINO, extending them with this one would not be hard and would be genuinely beneficial, because it might also be used elsewhere in the future.
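Independent of OpenVINO, the cartesian formula above can be sketched with NumPy (the function name and shapes are illustrative, not the proposed OpenVINO API):

```python
import numpy as np

def reduce_prod_complex(real, imag, axis):
    """Reduce a product of complex numbers stored as separate real/imag parts,
    using (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    # Accumulators start at the multiplicative identity 1 + 0i.
    acc_re = np.ones(np.delete(real.shape, axis))
    acc_im = np.zeros_like(acc_re)
    for re_k, im_k in zip(np.moveaxis(real, axis, 0), np.moveaxis(imag, axis, 0)):
        acc_re, acc_im = acc_re * re_k - acc_im * im_k, acc_re * im_k + acc_im * re_k
    return acc_re, acc_im

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5)) + 1j * rng.standard_normal((2, 3, 5))
re, im = reduce_prod_complex(x.real, x.imag, axis=1)
expected = np.prod(x, axis=1)
assert np.allclose(re, expected.real) and np.allclose(im, expected.imag)
```

One fold of the cartesian formula per slice along the reduction axis; no trigonometric round trip is involved.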
@rkazants I'm new to Open Source and would like to contribute. Can I work on this? |
@cmagapu, hi! I do not recommend this one for newcomers. Instead, please take a look at the new JAX good first issues if you wish; they are much easier. Best regards,
Don't we have a problem with overflow of the angle in the current implementation, for example?
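For context, if the current decomposition works in polar form (a hypothetical sketch, since the translator code is not quoted here), the product multiplies the magnitudes and sums the angles, so the accumulated angle grows without bound; wrapping it modulo 2π is mathematically a no-op:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50) + 1j * rng.standard_normal(50)

# Hypothetical polar-form product: |prod| = prod(|x_k|), arg = sum(arg(x_k)).
r = np.abs(x)
theta = np.angle(x)
total_angle = np.sum(theta)  # can be many multiples of 2*pi

prod_polar = np.prod(r) * np.exp(1j * total_angle)
prod_wrapped = np.prod(r) * np.exp(1j * (total_angle % (2 * np.pi)))

# In float64 both agree with the direct product: the modulo changes nothing
# mathematically, so on its own it cannot repair low-precision inaccuracy.
assert np.allclose(prod_polar, np.prod(x))
assert np.allclose(prod_wrapped, np.prod(x))
```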
I did try applying a modulo operation on the angle, but it did not change the result.
I do not have an ARM processor, but I did get to the point where the translator throws an AssertionError on my CPU. As far as I remember, we also had an accuracy issue on GPU.
I think the Prod operation is quite a special case, because even slight errors compound during multiplication. I have not found another translator with a similar problem, other than …
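The compounding effect is easy to reproduce outside OpenVINO; this illustrative snippet (not the translator code) rounds to float16 after every multiply, the way an FP16 kernel would:

```python
import numpy as np

rng = np.random.default_rng(1)
factors = rng.uniform(0.9, 1.1, size=200)

exact = np.prod(factors)                   # float64 reference
acc = np.float16(1.0)
for f in factors:
    acc = np.float16(acc * np.float16(f))  # one rounding step per multiply

rel_err = abs(float(acc) - exact) / abs(exact)
# Each multiply contributes up to ~eps/2 relative error, and in a product
# these errors accumulate multiplicatively instead of partially cancelling
# the way they can in a sum.
print(f"relative error after 200 rounded multiplies: {rel_err:.2e}")
```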
Hi @hub-bla, I propose you implement the decomposition using the Loop operation. What do you think? Best regards,
Oops, I guess I omitted the Loop operation when I was looking for other solutions. Thank you for pointing that out. ;) I'm currently trying to implement this. However, I could use a hint.

```cpp
default_op_checks(node, 2, {"Prod"}, true);
auto input = node.get_input(0);
auto axis = node.get_input(1);
auto keep_dims = node.get_attribute<bool>("keep_dims", false);
auto complex_type_mark = as_type_ptr<ComplexTypeMark>(input.get_node_shared_ptr());
if (complex_type_mark) {
    element::Type complex_part_type = complex_type_mark->get_complex_part_type();
    input = complex_type_mark->input_value(0);

    // split the packed [..., 2] tensor into real and imaginary parts
    auto gather_index_real = make_shared<v0::Constant>(element::i64, Shape{}, 0);
    auto gather_index_imag = make_shared<v0::Constant>(element::i64, Shape{}, 1);
    auto minus_one = make_shared<v0::Constant>(element::i32, Shape{1}, -1);
    Output<Node> real_part = make_shared<v8::Gather>(input, gather_index_real, minus_one);
    Output<Node> imag_part = make_shared<v8::Gather>(input, gather_index_imag, minus_one);

    // reduce the real part only to derive the output shape
    Output<Node> example_real_part = std::make_shared<v1::ReduceProd>(real_part, axis, keep_dims);
    Output<Node> real_part_shape = std::make_shared<v0::ShapeOf>(example_real_part);

    // accumulators start at the multiplicative identity 1 + 0i
    auto const_one = create_same_type_const_scalar<float>(real_part, 1);
    auto init_reduced_real = std::make_shared<v1::Broadcast>(const_one, real_part_shape);
    auto const_zero = create_same_type_const_scalar<float>(imag_part, 0);
    auto init_reduced_imag = std::make_shared<v1::Broadcast>(const_zero, real_part_shape);

    // loop body parameters: running real/imag products and the gather index
    auto reduced_real_input = std::make_shared<v0::Parameter>(real_part.get_element_type(), example_real_part.get_partial_shape());
    auto reduced_imag_input = std::make_shared<v0::Parameter>(imag_part.get_element_type(), example_real_part.get_partial_shape());
    auto gather_idx = std::make_shared<v0::Parameter>(element::i64, Shape{});
    auto gather_init = std::make_shared<v0::Constant>(element::i64, Shape{}, 0);
    auto gather_increment = std::make_shared<v0::Constant>(element::i64, Shape{}, 1);
    auto real_to_compute = std::make_shared<v8::Gather>(real_part, gather_idx, axis);
    auto imag_to_compute = std::make_shared<v8::Gather>(imag_part, gather_idx, axis);

    // (a + ib)(c + id) = (ac - bd) + i(ad + bc)
    auto ac = make_shared<v1::Multiply>(reduced_real_input, real_to_compute);
    auto bd = make_shared<v1::Multiply>(reduced_imag_input, imag_to_compute);
    auto ad = make_shared<v1::Multiply>(reduced_real_input, imag_to_compute);
    auto bc = make_shared<v1::Multiply>(reduced_imag_input, real_to_compute);
    auto new_real = make_shared<v1::Subtract>(ac, bd);
    auto new_imag = make_shared<v1::Add>(ad, bc);
    auto new_gather_idx = make_shared<v1::Add>(gather_idx, gather_increment);

    auto trip_count = std::make_shared<v0::Constant>(element::i32, Shape{}, 11);  // hardcoded trip count for now
    auto exec_condition = std::make_shared<v0::Constant>(element::boolean, Shape{}, true);
    auto loop = std::make_shared<v5::Loop>(trip_count, exec_condition);
    auto body = std::make_shared<Model>(OutputVector{new_real, new_imag, new_gather_idx},
                                        ParameterVector{reduced_real_input, reduced_imag_input, gather_idx});
    loop->set_function(body);
    loop->set_merged_input(gather_idx, gather_init, new_gather_idx);
    loop->set_merged_input(reduced_real_input, init_reduced_real, new_real);
    loop->set_merged_input(reduced_imag_input, init_reduced_imag, new_imag);

    // auto real_output = loop->get_iter_value(new_real, -1);
    // auto imag_output = loop->get_iter_value(new_imag, -1);
    auto real_output = example_real_part;
    auto imag_output = example_real_part;

    auto real_unsqueeze = make_shared<v0::Unsqueeze>(real_output, minus_one);
    auto imag_unsqueeze = make_shared<v0::Unsqueeze>(imag_output, minus_one);
    auto concat_result = make_shared<v0::Concat>(OutputVector{real_unsqueeze, imag_unsqueeze}, -1);
    set_node_name(node.get_name(), concat_result);
    auto complex_result = make_shared<ComplexTypeMark>(concat_result, complex_part_type);
    return {complex_result};
}
```

I'm currently getting this error at the body part of the code, even though it is not used as an output for now:

```cpp
auto body = std::make_shared<Model>(OutputVector{new_real, new_imag, new_gather_idx},
                                    ParameterVector{reduced_real_input, reduced_imag_input, gather_idx});
```

I don't quite understand why it throws the error about the complex type mark when it has been unwrapped at the top of the if statement. Do I need to mark the parameters of the body as complex type? Thank you in advance!
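To sanity-check the Loop body's math independently of the graph plumbing, here is a hypothetical NumPy mirror of one iteration (names are illustrative): gather the slice at the current index along the reduction axis, fold it into the real/imag accumulators with the complex-product formula, and increment the index:

```python
import numpy as np

def loop_body(acc_re, acc_im, idx, real, imag, axis):
    # gather the slice at `idx` along the reduction axis
    re_k = np.take(real, idx, axis=axis)
    im_k = np.take(imag, idx, axis=axis)
    # (a + ib)(c + id) = (ac - bd) + i(ad + bc)
    new_re = acc_re * re_k - acc_im * im_k
    new_im = acc_re * im_k + acc_im * re_k
    return new_re, new_im, idx + 1

x = np.arange(12).reshape(3, 4) + 1j * np.arange(12)[::-1].reshape(3, 4)
acc_re = np.ones(4)    # 1 + 0i is the multiplicative identity
acc_im = np.zeros(4)
idx = 0
for _ in range(x.shape[0]):  # trip count = dimension size along the axis
    acc_re, acc_im, idx = loop_body(acc_re, acc_im, idx, x.real, x.imag, axis=0)

assert np.allclose(acc_re + 1j * acc_im, np.prod(x, axis=0))
```

Note that the trip count here is the dimension size along the reduction axis, not a fixed constant.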
Context
```
FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:True - params:{'shape': [2, 3, 5], 'axis': 1} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[[ 219.75 , 480. , 372.25 , -128. , -414.25 ]],
FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:False - params:{'shape': [2, 3, 5], 'axis': 1} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[-550. , 522.5 , -83.75 , -44.65625, 70.0625 ],
[-236.125 , -34.6875 , 27.6875 , -134.25 , -246.875 ]],
dtype=float32), 'Real:0': array([[ -24.453125, -353.75 , -340.5 , -129.125 , -140. ],
[-302.25 , 149.125 , 1089. , -555. , 13.15625 ]],
dtype=float32)}; framework_res={'Real:0': array([[ -25., -354., -340., -129., -140.],
[-302., 149., 1090., -555., 13.]], dtype=float32), 'Imag:0': array([[-550., 522., -85., -45., 70.],
[-236., -35., 30., -135., -247.]], dtype=float32)}.
FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:False - params:{'shape': [3, 1, 2, 4], 'axis': -2} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[[ 42. , 7. , -74. , -21. ]],
= 3 failed, 3507 passed, 83 skipped, 68 xfailed, 481 xpassed, 50 warnings in 46.50s =
```
What needs to be done?
The accuracy issue in the Prod translator for complex tensors needs to be fixed so that the tests above pass.
Example Pull Requests
No response
Resources
Contact points
@openvinotoolkit/openvino-tf-frontend-maintainers
@rkazants
Ticket
No response