Unable to compile paddle 3.0.0.beta0, Assertion `idx < size()' failed. #68763

Open
3 tasks done
xuesu opened this issue Oct 16, 2024 · 7 comments
@xuesu

xuesu commented Oct 16, 2024

Issue Description

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug

The build fails with the following error:

eager_generator: /home/iris/CDeepFuzz/Paddle/paddle/utils/small_vector.h:343: T& paddle::small_vector_template_common<T, <template-parameter-1-2> >::at(paddle::small_vector_template_common<T, <template-parameter-1-2> >::size_type) [with T = phi::TensorArgDef; <template-parameter-1-2> = void; paddle::small_vector_template_common<T, <template-parameter-1-2> >::reference = phi::TensorArgDef&; paddle::small_vector_template_common<T, <template-parameter-1-2> >::size_type = long unsigned int]: Assertion `idx < size()' failed.
Subprocess aborted
gmake[2]: *** [paddle/fluid/eager/auto_code_generator/CMakeFiles/legacy_eager_codegen.dir/build.make:70: paddle/fluid/eager/auto_code_generator/CMakeFiles/legacy_eager_codegen] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:66694: paddle/fluid/eager/auto_code_generator/CMakeFiles/legacy_eager_codegen.dir/all] Error 2

I added some print logging to at() in paddle/utils/small_vector.h:

  reference at(size_type idx) {
    std::string prompt = "LALALA: ";
    prompt += std::to_string(idx) + "," + std::to_string(size()); 
    std::cout << prompt << std::endl;
    assert(idx < size());
    return begin()[idx];
  }

I got:

LALALA: 1, 1

So idx (1) is equal to size() (1), which is one past the last valid index.
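To make the failing condition concrete, here is a minimal sketch of a checked accessor. It is not Paddle's small_vector: `ArgDef`, `checked_at`, and `index_one_is_rejected` are hypothetical names, and an exception is swapped in for the original assert() so that the message can carry both values.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for phi::TensorArgDef; only a label is modeled.
struct ArgDef {
  std::string name;
};

// Checked accessor: valid indices are 0 .. size()-1, so idx == size() fails.
// Unlike assert(), the exception carries both values in its message.
ArgDef& checked_at(std::vector<ArgDef>& defs, std::size_t idx) {
  if (idx >= defs.size()) {
    throw std::out_of_range("idx " + std::to_string(idx) + " >= size " +
                            std::to_string(defs.size()));
  }
  return defs[idx];
}

// Usage sketch: one element, the same situation as the log line above.
inline bool index_one_is_rejected() {
  std::vector<ArgDef> defs;
  defs.push_back(ArgDef{"x"});
  assert(checked_at(defs, 0).name == "x");  // index 0 is valid
  try {
    checked_at(defs, 1);  // index 1 == size(), out of range
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

With a single registered input, only index 0 is valid, so any caller asking for index 1 trips the check.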

I then added a backtrace to the same function:

@@ -333,6 +341,12 @@ class small_vector_template_common
   }
 
   reference at(size_type idx) {
+    if(idx == size()){
+      void *buffer[100];
+      int nptrs = backtrace(buffer, 100);  // Capture up to 100 frames
+      std::cerr << "Stack trace:\n";
+      backtrace_symbols_fd(buffer, nptrs, STDERR_FILENO);  // Print the stack trace
+    }
     assert(idx < size());
     return begin()

I got:

Stack trace:
~/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator(_ZN6paddle28small_vector_template_commonIN3phi12TensorArgDefEvE2atEm+0x5d)[0x5aaefee7d309]
~/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator(_ZN3phi6Kernel7InputAtEm+0x36)[0x5aaefee75700]
~/Paddle/build/paddle/phi/libphi_kernel_gpu.so(+0x4249c8c)[0x74579bc49c8c]
~/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator(_ZN3phi15KernelRegistrar15ConstructKernelENS_7RegTypeEPKcS3_N6common10DataLayoutENS_8DataTypeEPFvRKNS_9KernelKeyEPNS_13KernelArgsDefEEPFvS9_PNS_6KernelEESt8functionIFvPNS_13KernelContextEEEPv+0x1ca)[0x5aaefee3da7a]
~/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator(_ZN3phi15KernelRegistrarC1ENS_7RegTypeEPKcS3_N6common10DataLayoutENS_8DataTypeEPFvRKNS_9KernelKeyEPNS_13KernelArgsDefEEPFvS9_PNS_6KernelEESt8functionIFvPNS_13KernelContextEEEPv+0x9f)[0x5aaefee75839]
~/Paddle/build/paddle/phi/libphi_kernel_gpu.so(+0x424ab15)[0x74579bc4ab15]
~/Paddle/build/paddle/phi/libphi_kernel_gpu.so(+0x424addd)[0x74579bc4addd]
/lib64/ld-linux-x86-64.so.2(+0x647e)[0x7457cee2947e]
/lib64/ld-linux-x86-64.so.2(+0x6568)[0x7457cee29568]
/lib64/ld-linux-x86-64.so.2(+0x202ca)[0x7457cee432ca]
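The symbol names in the trace are mangled. As a small sketch, assuming a GCC or Clang toolchain that provides `<cxxabi.h>`, they can be decoded with `abi::__cxa_demangle`; the helper names `demangle` and `demo_demangle` are made up for this example.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);  // __cxa_demangle allocates the buffer with malloc
  return result;
}

// Usage sketch: the second frame of the stack trace above.
inline void demo_demangle() {
  assert(demangle("_ZN3phi6Kernel7InputAtEm") ==
         "phi::Kernel::InputAt(unsigned long)");
}
```

Read this way, the trace shows phi::Kernel::InputAt being called from code in libphi_kernel_gpu.so while phi::KernelRegistrar is constructing a kernel during shared-library loading.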

I wonder if this is because all source files under the folder paddle/fluid/eager/api/generated/fluid_generated/forwards/ (e.g. dygraph_forward_functions3.cc) are empty. However, this function (https://github.com/jiaoxuewu/PaddleBox/blob/7552ba29f6b729f3192b4747283770b254433c8b/paddle/fluid/eager/auto_code_generator/generate_file_structures.py#L98) suggests that those files are supposed to be empty: GenerateFileStructureForIntermediateDygraph....

Sorry for writing in English...

🌰 Minimal Reproducible Example

cmake ..   -DWITH_GPU=ON   -DWITH_TESTING=ON   -DWITH_DISTRIBUTE=ON   -DCMAKE_BUILD_TYPE=Debug   -DWITH_MKL=ON   -DWITH_PYTHON=ON -DCMAKE_C_COMPILER=clang  -DCMAKE_CXX_COMPILER=clang++
cmake --build . -j 1

or, to run only the failing generator step:

cd ~/Paddle/build/paddle/fluid/eager/auto_code_generator &&  ~/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator ~/Paddle/paddle/fluid/eager/api/generated/fluid_generated 8

Version & Environment Information

🏃‍♂️ Environment

OS: ubuntu 22.04
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 17.0.6 (https://github.com/llvm/llvm-project.git 6009708b4367171ccdbf4b5905cb6a803753fe18)
CMake version: version 3.22.1
Libc version: glibc 2.35
Python version: 3.10.15

CUDA version: 12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
cuDNN version: 9.4.0
Nvidia driver version: 560.35.03
Nvidia driver List:
GPU 0: NVIDIA GeForce RTX 4090
GCC: gcc 11
Clang: 17.0.6 (tried both GCC and Clang)
Memory: 64GB
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-14900KF
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 1
CPU max MHz: 6000.0000
CPU min MHz: 800.0000
BogoMIPS: 6374.40

xuesu added the status/new-issue 新建 and type/build 编译/安装问题 labels on Oct 16, 2024
@risemeup1
Contributor

Single-threaded Paddle builds (make -j1) have always had problems and fail to compile.

@risemeup1
Contributor

We are also trying to reproduce this locally.

@xuesu
Author

xuesu commented Oct 17, 2024

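Thank you very much! Of all the similar libraries I have seen, your team is the quickest to respond! In fact, -j50 reports the same error.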

@xuesu
Author

xuesu commented Oct 17, 2024

Sorry to ask, but is TensorArgDef OutputAt(size_t idx) { return args_def().input_defs()[idx]; } written this way on purpose? The other header files I looked at do not do this... https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/capi/include/wrapper_base.h line 553
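To show why that line looks suspicious, here is a toy model, not the real wrapper_base.h. `ArgsDef`, `KernelView`, and both accessor names are hypothetical; the input and output names are taken from the eigvalsh signature quoted later in this thread. Whether the quoted line is intentional, or related to the crash at all, is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: argument definitions reduced to their names.
struct ArgsDef {
  std::vector<std::string> input_defs;
  std::vector<std::string> output_defs;
};

struct KernelView {
  ArgsDef args;

  // Mirrors the quoted line: an "output" accessor that indexes input_defs.
  const std::string& OutputAtAsQuoted(std::size_t idx) const {
    return args.input_defs.at(idx);  // .at() throws if idx is out of range
  }

  // What the accessor name suggests: index output_defs instead.
  const std::string& OutputAtFromOutputs(std::size_t idx) const {
    return args.output_defs.at(idx);
  }
};

// Usage sketch: one input (x) and two outputs, as in the eigvalsh signature.
inline KernelView make_eigvalsh_view() {
  KernelView view;
  view.args.input_defs = {"x"};
  view.args.output_defs = {"eigenvalues", "eigenvectors"};
  return view;
}
```

In this model the quoted form returns the input definition for index 0 and fails for index 1, even though a second output exists.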

@xuesu
Author

xuesu commented Oct 17, 2024

The kernel that triggers the problem is:

PD_REGISTER_KERNEL(eigvalsh,  // cuda_only
                   GPU,
                   ALL_LAYOUT,
                   phi::EigvalshKernel,
                   float,
                   double,
                   phi::dtype::complex<float>,
                   phi::dtype::complex<double>) {
  kernel->InputAt(1).SetDataType(phi::dtype::ToReal(kernel_key.dtype()));

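There is only one input here, yet the code sets the data type of input 1 (that is, the second input) to REAL. So which of the following is it?

  1. The kernel is expected to have at least two inputs.
  2. Input 0 should have the REAL data type (my guess).
    • If this matches np.linalg.eigvalsh, there should be only one input. It is also possible that the intent is to convert an integer matrix into a floating-point matrix.
    • forward : eigvalsh (Tensor x, str uplo = "L", bool is_test = false) -> Tensor(eigenvalues), Tensor(eigenvectors)
    • However, many CPU and GPU kernels seem to take only x as input and still call kernel->InputAt(1).SetDataType(phi::dtype::ToReal(kernel_key.dtype()));
  3. Under some special semantics, the corresponding input is forced to REAL.
  4. The semantics of at() have changed.
  5. Is it because CUDA 12.4 is incompatible? But the failure still happens during kernel registration?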

@xuesu
Author

xuesu commented Oct 17, 2024

>python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
>python -m unittest test_eigvalsh_op.py
Illegal instruction (core dumped)

@xuesu
Author

xuesu commented Oct 17, 2024

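I turned off -DWITH_TESTING and the error is unchanged. Separately, I cannot compile the library with -DWITH_TESTING enabled.
After changing the line to kernel->InputAt(0).SetDataType(phi::dtype::ToReal(kernel_key.dtype())); the build succeeded.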
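The workaround can be modeled in a few lines. This is a simplified sketch, not Paddle's implementation: the types below are hypothetical stand-ins, the `ToReal` mapping (complex to matching real type) is an assumption about what phi::dtype::ToReal does, and `ConfigureEigvalsh` is an invented name for the body of the registration block. Whether index 0 is the intended target is not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins for the phi types named in the thread.
enum class DataType { FLOAT32, FLOAT64, COMPLEX64, COMPLEX128 };

// Assumed behavior: complex types map to their real counterpart,
// real types are returned unchanged.
DataType ToReal(DataType t) {
  switch (t) {
    case DataType::COMPLEX64:
      return DataType::FLOAT32;
    case DataType::COMPLEX128:
      return DataType::FLOAT64;
    default:
      return t;
  }
}

struct TensorArgDef {
  DataType dtype;
  void SetDataType(DataType t) { dtype = t; }
};

struct Kernel {
  std::vector<TensorArgDef> input_defs;

  // Throws instead of asserting, so the failure is observable in a test.
  TensorArgDef& InputAt(std::size_t idx) {
    if (idx >= input_defs.size()) {
      throw std::out_of_range("InputAt: idx out of range");
    }
    return input_defs[idx];
  }
};

// Models the edited registration body: index 0 instead of index 1.
void ConfigureEigvalsh(Kernel* kernel, DataType key_dtype) {
  kernel->InputAt(0).SetDataType(ToReal(key_dtype));
}
```

With one registered input, the index-0 form succeeds while InputAt(1) would still fail, which matches the build behavior reported above.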
