NVCC error. Compilation of custom CUDA ops on Windows 10 for Tensorflow 2.x #104

uladzislau-varabei · 2021-08-23T23:29:18Z

Hi everyone,

I'm trying to compile custom CUDA ops on Windows 10 for Tensorflow 2.x, but I've run into a problem.
Below is the output from compiling the fused_bias_act op.

...project/dnnlib/ops/fused_bias_act.cu(204): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(204): error: no instance of constructor "tensorflow::register_op::OpDefBuilderWrapper::OpDefBuilderWrapper" matches the argument list
            argument types are: (const char [13], __nv_bool)

...project/dnnlib/ops/fused_bias_act.cu(217): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(217): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(217): error: expected a type specifier

...project/dnnlib/ops/fused_bias_act.cu(217): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(218): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(218): error: expected an expression

...project/dnnlib/ops/fused_bias_act.cu(218): error: expected a type specifier

...project/dnnlib/ops/fused_bias_act.cu(218): error: expected an expression

10 errors detected in the compilation of "...project/dnnlib/ops/fused_bias_act.cu".
_pywrap_tensorflow_internal.lib
fused_bias_act.cu

This is how these lines look in the script (taken from the repo):

(204) REGISTER_OP("FusedBiasAct")
   ...
    .Attr       ("clamp: float = -1.0");
(217) REGISTER_KERNEL_BUILDER(Name("FusedBiasAct").Device(DEVICE_GPU).TypeConstraint<float>("T"), FusedBiasActOp<float>);
(218) REGISTER_KERNEL_BUILDER(Name("FusedBiasAct").Device(DEVICE_GPU).TypeConstraint<Eigen::half>("T"), FusedBiasActOp<Eigen::half>);
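To check that every diagnostic points at one of the registration macros, the nvcc output can be grouped by source line. This is a minimal sketch, not part of the repo: the helper name `errors_by_line` is hypothetical, the regular expression assumes the `<path>(<line>): error: <message>` layout seen in the log above, and the sample is an abbreviated copy of that log.

```python
import re
from collections import Counter

# Matches nvcc diagnostics of the form "<path>(<line>): error: <message>".
NVCC_ERROR = re.compile(r"^(?P<path>.+)\((?P<line>\d+)\): error: (?P<msg>.+)$")


def errors_by_line(log_text):
    """Count nvcc errors per source line number (hypothetical helper)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = NVCC_ERROR.match(raw.strip())
        if match:  # continuation lines such as "argument types are: ..." are skipped
            counts[int(match.group("line"))] += 1
    return dict(counts)


# Abbreviated copy of the compiler output quoted above.
sample_log = """\
...project/dnnlib/ops/fused_bias_act.cu(204): error: expected an expression
...project/dnnlib/ops/fused_bias_act.cu(204): error: no instance of constructor "tensorflow::register_op::OpDefBuilderWrapper::OpDefBuilderWrapper" matches the argument list
            argument types are: (const char [13], __nv_bool)
...project/dnnlib/ops/fused_bias_act.cu(217): error: expected an expression
...project/dnnlib/ops/fused_bias_act.cu(218): error: expected a type specifier
"""

print(errors_by_line(sample_log))
```

On the full log this reports only lines 204, 217 and 218, i.e. the `REGISTER_OP` and `REGISTER_KERNEL_BUILDER` statements.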

It's the same as described by mavanmanen here.

I tried several conda environments and here are the results:

Successfully compiled with Tf 1.14 (pip) + cuda 10.0 (conda) + cudnn 7.6.5 (conda) + MSVC 14.16 (VS17)
Didn't compile with Tf 2.6 (pip) + cuda 10.2 (conda) + cudnn 7.6.5 (conda) + MSVC 14.16 (VS17) / 14.29 (VS19)
Didn't compile with Tf 2.5 (pip) + cuda 11.2 (conda) + cudnn 8.1.0 (conda) + MSVC 14.16 (VS17) / 14.29 (VS19)
Didn't compile with Tf 2.5 (pip) + cuda 11.2 (system) + cudnn 8.1.0 (system) + MSVC 14.16 (VS17) / 14.29 (VS19)

As you can see, the problem seems to be related to Tf 2.x (only option 1, with Tf 1.14, compiles). I thought it might have something to do with cuda/cudnn, but trying different versions didn't help (see options 2 and 3). I also thought that something might not get installed when using conda channels, but the result is the same with a system install (see options 3 and 4). I also tried Tf v1 mode, but it made no difference. Code for this:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

I noticed that most Tf 2.x ports of StyleGAN2/StyleGAN2-ADA (both projects have these custom ops) use different compilation flags on Linux (Example 1 and 2).
The changes are in line with the official Tf 2.x guide for custom ops. However, none of the ports I found changes anything for Windows (except for the path to MSVC), and the guide itself only provides details for Linux.
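The official guide derives compiler and linker flags from the installed TensorFlow package instead of hard-coding them. A minimal sketch for inspecting what a given environment reports, assuming `tensorflow` is importable: the helper name `query_tf_build_flags` is hypothetical, while the `tf.sysconfig` calls are part of the public API. Whether the reported flags can be passed unchanged to nvcc with MSVC on Windows is not something this thread confirms.

```python
def query_tf_build_flags():
    """Return the build flags reported by the installed TensorFlow.

    Hypothetical helper; returns None when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # Flags the guide passes to the C++ compiler (include paths, ABI macro).
        "compile_flags": tf.sysconfig.get_compile_flags(),
        # Flags the guide passes to the linker (library path and name).
        "link_flags": tf.sysconfig.get_link_flags(),
        # Directory containing the TensorFlow C++ headers.
        "include_dir": tf.sysconfig.get_include(),
    }


flags = query_tf_build_flags()
if flags is None:
    print("TensorFlow is not installed in this environment")
else:
    for key, value in flags.items():
        print(f"{key}: {value}")
```

Running this in each of the conda environments listed above would show whether the Tf 1.14 and Tf 2.x installs report different include paths or macros.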

I have found an official repo with details for both Linux and Windows, but it didn't help much. It has a potentially very useful Bazel build file that provides flags/options for Windows, but unfortunately I still couldn't compile the op. I tried to explicitly add those flags to the build script from this repo (StyleGAN2-ADA), but I only got errors saying that some of them are not recognized. Note: I was still compiling with MSVC, not with Bazel.
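For experimenting with flags, it can help to assemble the nvcc command in one place. The sketch below is an illustration under stated assumptions, not the repo's build script: the function name `build_nvcc_command`, its parameters, and all paths in the usage example are hypothetical. The nvcc options themselves (`--include-path`, `--compiler-bindir`, `--shared`, `-Xcompiler` and so on) are real. Host-compiler options normally have to be forwarded through `-Xcompiler`, which is one possible reason flags copied from a Bazel file are rejected when given to nvcc directly; that is a guess, not a confirmed diagnosis.

```python
def build_nvcc_command(source, output, include_dirs, compiler_bindir,
                       import_lib=None, gpu_arch="sm_86",
                       nvcc_flags=(), host_flags=()):
    """Assemble an nvcc argument list for a shared-library build.

    Hypothetical helper; gpu_arch must match the target GPU.
    """
    cmd = ["nvcc", "--std=c++11", "-DNDEBUG"]
    if import_lib:
        # On Windows the op links against TensorFlow's import library.
        cmd.append(import_lib)
    cmd += [f"--gpu-architecture={gpu_arch}", "--use_fast_math", "--disable-warnings"]
    for directory in include_dirs:
        cmd += ["--include-path", directory]
    # Directory holding the host compiler (cl.exe for MSVC, or a specific gcc).
    cmd += ["--compiler-bindir", compiler_bindir]
    # Options that nvcc itself understands.
    cmd += list(nvcc_flags)
    # Options meant for the host compiler are forwarded one by one.
    for flag in host_flags:
        cmd += ["-Xcompiler", flag]
    cmd += [source, "--shared", "-o", output]
    return cmd


# Usage with placeholder paths; "/EHsc" is only an illustrative MSVC option.
command = build_nvcc_command(
    source="fused_bias_act.cu",
    output="fused_bias_act.dll",
    include_dirs=["<tf_include>", "<tf_include>/external/eigen_archive"],
    compiler_bindir="<msvc_bin_dir>",
    host_flags=["/EHsc"],
)
print(" ".join(command))
```

The returned list can be passed to `subprocess.run`, which avoids quoting problems with Windows paths that contain spaces.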

So, if anyone could help with compiling these ops on Windows with Tensorflow 2.x, that would be great. Pieces of code (explicit compilation flags, etc.), ideas, explanations and just thoughts are welcome.

Thanks in advance


johndpope · 2021-08-23T23:45:52Z

try https://github.com/johndpope/stylegan2-ada/
or this branch
https://github.com/johndpope/stylegan2-ada/tree/digressions

tensorflow is dead to nvidia labs - gotta move to pytorch.
https://github.com/NVlabs/stylegan2-ada-pytorch

uladzislau-varabei · 2021-08-24T00:59:21Z

@johndpope the links you suggested don't seem to have any changes for Windows compile flags, only for Linux. I tried them anyway, but the ops still don't compile and the error is the same. In case you missed it, I mentioned that I had already tried disabling v2 behaviour (one of the main changes in the shared links), without success.

I know that there is an official PyTorch port and that all upcoming NVlabs projects will use it, but I would still like to compile the ops on Windows with Tf 2.x.

johndpope · 2021-08-24T01:22:03Z

I had some problems with the fused ops on Linux; one of them was the gcc version. When the OS updated and gcc was bumped to 10.3, compilation broke with a similar error (10.2 had been working fine). I had to point nvcc at gcc 9 rather than downgrade the system gcc.

NVIDIA/nccl#494

uladzislau-varabei · 2021-08-24T21:36:57Z

Interesting. On Windows I use Visual Studio and MSVC (suggested by NVlabs, and it works for Tf 1.14). Actually, one of my guesses was that only older versions of MSVC are compatible with Tf 2.x, so I tried both MSVC 14.16 (VS 2017) and MSVC 14.29 (VS 2019), but neither worked for Tf 2.x. I'm not sure whether the cause is the compiler version, some missing VS components (though again, it works for Tf 1.14), the Visual Studio version, or the compilation options (most likely the last one, in my opinion).

whl0070179 · 2022-05-19T16:40:25Z

I have the same problem when building 'upfirdn_2d.cu'.
nvcc --std=c++11 -DNDEBUG "C:\Program Files\Python39\lib\site-packages\tensorflow\python_pywrap_tensorflow_internal.lib" --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "C:\Program Files\Python39\lib\site-packages\tensorflow\include" --include-path "C:\Program Files\Python39\lib\site-packages\tensorflow\include\external\protobuf_archive\src" --include-path "C:\Program Files\Python39\lib\site-packages\tensorflow\include\external\com_google_absl" --include-path "C:\Program Files\Python39\lib\site-packages\tensorflow\include\external\eigen_archive" --compiler-bindir "C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.16.27023/bin/HostX64/x64" 2>&1 "D:\notebook_root\StyleGAN2-TensorFlow-2.x-master\dnnlib\ops\upfirdn_2d.cu" --shared -o "C:\Users\ADMINI~~1\AppData\Local\Temp\tmp5005yz24\upfirdn_2d_tmp.dll" --keep --keep-dir "C:\Users\ADMINI~~1\AppData\Local\Temp\tmp5005yz24"

D:/notebook_root/StyleGAN2-TensorFlow-2.x-master/dnnlib/ops/upfirdn_2d.cu(310): error: expected an expression

D:/notebook_root/StyleGAN2-TensorFlow-2.x-master/dnnlib/ops/upfirdn_2d.cu(310): error: no instance of constructor "tensorflow::register_op::OpDefBuilderWrapper::OpDefBuilderWrapper" matches the argument list
argument types are: (const char [10], __nv_bool)
