Bug in ins_tag_weight handling in paddle.static.auc #68640

welsonzhang · 2024-10-11T10:39:27Z

Describe the Bug

Problem 1: ins_tag_weight is never passed to the auc kernel, so it has no effect.
https://github.com/PaddlePaddle/Paddle/blob/release/3.0-beta/python/paddle/static/nn/metric.py

    # "InsTagWeight": [ins_tag_weight]
    # Batch AUC
    helper.append_op(
        type="auc",
        inputs={
            "Predict": [input],
            "Label": [label],
            "StatPos": [batch_stat_pos],
            "StatNeg": [batch_stat_neg],
            "InsTagWeight": [ins_tag_weight],
        },
        attrs={
            "curve": curve,
            "num_thresholds": num_thresholds,
            "slide_steps": slide_steps,
        },
        outputs={
            "AUC": [batch_auc_out],
            "StatPosOut": [batch_stat_pos],
            "StatNegOut": [batch_stat_neg],
        },
    )

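Why is the commented-out `# "InsTagWeight": [ins_tag_weight]` entry not passed into the op? Without it, how can the kernel detect fake data?

Problem 2: The fake-data check in auc_kernel is too coarse. Why does it look only at the first element of the batch?
https://github.com/PaddlePaddle/Paddle/blob/release/3.0-beta/paddle/phi/kernels/cpu/auc_kernel.cc
In principle, shouldn't every entry of ins_tag_weight_data be checked, so the kernel can decide which masked instances to drop and which to keep?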

  bool is_fake_data = false;
  if (ins_tag_weight.get_ptr() != nullptr) {
    const auto *ins_tag_weight_data = ins_tag_weight->data<float>();
    VLOG(4) << "auc ins_tag_weight = " << ins_tag_weight_data[0];
    if (ins_tag_weight_data[0] == 0) {
      is_fake_data = true;
    }
  }
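To make the concern in problem 2 concrete, here is a minimal pure-Python sketch (not Paddle code) contrasting the two checks. The helper names `first_element_check` and `per_instance_mask` and the sample weights are hypothetical, chosen only for illustration. It assumes a weight of 0 marks a masked (fake) instance, as the kernel snippet above suggests.

```python
def first_element_check(ins_tag_weight):
    """Mimics the kernel snippet: the batch is flagged as fake
    only when the first weight equals 0."""
    return len(ins_tag_weight) > 0 and ins_tag_weight[0] == 0


def per_instance_mask(ins_tag_weight):
    """Per-instance alternative: one keep/drop decision per sample."""
    return [w != 0 for w in ins_tag_weight]


# Hypothetical batch in which the 2nd and 3rd instances are masked.
weights = [1.0, 0.0, 0.0, 1.0]

batch_is_fake = first_element_check(weights)  # masked instances go unnoticed
keep = per_instance_mask(weights)             # masked instances are identified
```

With a mixed batch like this one, the first-element check reports the batch as real, so the two masked instances would still be counted toward the AUC statistics.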

Additional Supplementary Information

No response

welsonzhang · 2024-10-12T02:00:57Z

Proposed fix for problem 1: pass ins_tag_weight into the auc kernel.
https://github.com/PaddlePaddle/Paddle/blob/release/3.0-beta/python/paddle/static/nn/metric.py

    helper.append_op(
        type="auc",
        inputs={
            "Predict": [input],
            "Label": [label],
            "StatPos": [batch_stat_pos],
            "StatNeg": [batch_stat_neg],
            "InsTagWeight": [ins_tag_weight],
        },
        attrs={
            "curve": curve,
            "num_thresholds": num_thresholds,
            "slide_steps": slide_steps,
        },
        outputs={
            "AUC": [batch_auc_out],
            "StatPosOut": [batch_stat_pos],
            "StatNegOut": [batch_stat_neg],
        },
    )
    # Global AUC
    helper.append_op(
        type="auc",
        inputs={
            "Predict": [input],
            "Label": [label],
            "StatPos": [stat_pos],
            "StatNeg": [stat_neg],
            "InsTagWeight": [ins_tag_weight],
        },
        attrs={
            "curve": curve,
            "num_thresholds": num_thresholds,
            "slide_steps": 0,
        },
        outputs={
            "AUC": [auc_out],
            "StatPosOut": [stat_pos],
            "StatNegOut": [stat_neg],
        },
    )

welsonzhang · 2024-10-12T02:14:42Z

Proposed fix for problem 2: iterate over the ins_tag_weight data and check whether each mask value is 0 or 1.
Only the slide_steps == 0 case is handled here:

  // when calculate global_auc && is fake data, just do nothing
  //if (slide_steps == 0 && is_fake_data) {
  //  return;
  //}

  statAuc<T>(label,
             input,
             num_thresholds,
             slide_steps,
             origin_stat_pos,
             origin_stat_neg,
             is_fake_data,
             ins_tag_weight);


template <typename T>
void statAuc(const DenseTensor &label,
             const DenseTensor &predict,
             const int num_thresholds,
             const int slide_steps,
             int64_t *origin_stat_pos,
             int64_t *origin_stat_neg,
             const bool is_fake_data,
             const paddle::optional<DenseTensor> &ins_tag_weight) {
  size_t batch_size = predict.dims()[0];
  size_t inference_width = predict.dims()[1];
  const T *inference_data = predict.data<T>();
  const auto *label_data = label.data<int64_t>();
  const int bucket_length = num_thresholds + 1;
  if (slide_steps == 0) {
    for (size_t i = 0; i < batch_size; i++) {
      // if predict_data[i] has dim of 2, then predict_data[i][1] is pos prob
      // if predict_data[i] has dim of 1, then predict_data[i][0] is pos prob
      // Skip instances whose ins_tag_weight is 0 (masked / fake data).
      if (ins_tag_weight.get_ptr() != nullptr &&
          ins_tag_weight->data<float>()[i] == 0) {
        continue;
      }
      auto predict_data =
          inference_data[i * inference_width + (inference_width - 1)];
      PADDLE_ENFORCE_LE(predict_data,
                        1,
                        phi::errors::PreconditionNotMet(
                            "The predict data must less or equal 1."));
      PADDLE_ENFORCE_GE(predict_data,
                        0,
                        phi::errors::PreconditionNotMet(
                            "The predict data must gather or equal 0."));

      // Bucket the prediction and update the positive/negative histograms.
      uint32_t binIdx = static_cast<uint32_t>(predict_data * num_thresholds);
      if (label_data[i] > 0) {
        origin_stat_pos[binIdx] += 1;
      } else if (label_data[i] == 0) {
        origin_stat_neg[binIdx] += 1;
      }
    }
    return;
  }
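The per-instance masking proposed above can be illustrated with a small pure-Python sketch of the slide_steps == 0 path. This is an illustration, not the Paddle kernel: `accumulate_stats` is a hypothetical name, predictions are assumed to already be the positive-class probabilities, and the sample data is made up.

```python
def accumulate_stats(preds, labels, num_thresholds, stat_pos, stat_neg,
                     ins_tag_weight=None):
    """Update the positive/negative bucket counts in place,
    skipping instances whose ins_tag_weight is 0."""
    for i, (p, y) in enumerate(zip(preds, labels)):
        # Per-instance mask check, mirroring the proposed C++ change.
        if ins_tag_weight is not None and ins_tag_weight[i] == 0:
            continue
        if not 0.0 <= p <= 1.0:
            raise ValueError("prediction must lie in [0, 1]")
        bin_idx = int(p * num_thresholds)
        if y > 0:
            stat_pos[bin_idx] += 1
        elif y == 0:
            stat_neg[bin_idx] += 1


num_thresholds = 4
stat_pos = [0] * (num_thresholds + 1)
stat_neg = [0] * (num_thresholds + 1)

# Hypothetical batch: the last two instances are masked out.
accumulate_stats(
    preds=[0.9, 0.1, 0.6, 0.3],
    labels=[1, 0, 1, 0],
    num_thresholds=num_thresholds,
    stat_pos=stat_pos,
    stat_neg=stat_neg,
    ins_tag_weight=[1.0, 1.0, 0.0, 0.0],
)
```

Only the first two instances reach the histograms here; the masked ones leave both arrays untouched, which is the behavior the fix aims for.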

DesmonDay · 2024-10-16T03:19:55Z

If you are confident the fix is OK, consider submitting a PR to the Paddle repository. I will then find the relevant colleagues to review it.

welsonzhang added status/new-issue 新建 type/bug-report 报bug labels Oct 11, 2024

paddle-bot bot assigned DesmonDay Oct 11, 2024
