
Upscale performance feels really slow? #3024

Open
some9000 opened this issue Sep 15, 2024 · 8 comments
@some9000

Hello!

This app is amazing, but I just want to make sure something isn't being done incorrectly on my part.

Basically, using the upscale functions (mostly with 1x models to improve image quality) seems to take quite a while, several minutes most of the time. Meanwhile, something like Upscayl appears to blast through similar tasks within seconds. Obviously the models can differ, but such a massive difference still makes one wonder.

Here's system information from the app:

{
  "app": {
    "version": "0.24.1",
    "packaged": true,
    "path": "C:\\Users\\XXX\\AppData\\Local\\chaiNNer\\app-0.24.1\\resources\\app"
  },
  "os": {
    "version": "Windows 10 Pro",
    "release": "10.0.22631",
    "arch": "x64",
    "endianness": "LE"
  },
  "cpu": {
    "manufacturer": "AMD",
    "brand": "Ryzen Threadripper 2950X 16-Core Processor",
    "vendor": "AuthenticAMD",
    "family": "23",
    "model": "8",
    "stepping": "2",
    "revision": "2050",
    "voltage": "",
    "speed": 3.5,
    "speedMin": 3.5,
    "speedMax": 3.5,
    "governor": "",
    "cores": 32,
    "physicalCores": 16,
    "performanceCores": 32,
    "efficiencyCores": 0,
    "processors": 1,
    "socket": "SP3r2",
    "flags": "de pse tsc msr sep mtrr mca cmov psn clfsh ds mmx fxsr sse sse2 ss htt tm ia64 pbe",
    "virtualization": false,
    "cache": {
      "l1d": 768,
      "l1i": 768,
      "l2": 8388608,
      "l3": 33554432
    }
  },
  "gpus": [
    {
      "vendor": "NVIDIA",
      "model": "NVIDIA GeForce RTX 2070",
      "bus": "PCI",
      "vram": 8192,
      "vramDynamic": false,
      "subDeviceId": "0x37AD1458",
      "driverVersion": "561.09",
      "name": "NVIDIA GeForce RTX 2070",
      "pciBus": "00000000:41:00.0",
      "fanSpeed": 81,
      "memoryTotal": 8192,
      "memoryUsed": 5146,
      "memoryFree": 2861,
      "utilizationGpu": 100,
      "utilizationMemory": 23,
      "temperatureGpu": 80,
      "powerDraw": 192.07,
      "powerLimit": 215,
      "clockCore": 1860,
      "clockMemory": 6801
    }
  ],
  "settings": {
    "useSystemPython": false,
    "systemPythonLocation": "",
    "theme": "default-dark",
    "checkForUpdatesOnStartup": true,
    "startupTemplate": "",
    "animateChain": true,
    "snapToGrid": false,
    "snapToGridAmount": 16,
    "viewportExportPadding": 20,
    "showMinimap": false,
    "experimentalFeatures": false,
    "hardwareAcceleration": false,
    "allowMultipleInstances": false,
    "lastWindowSize": {
      "maximized": true,
      "width": 1278,
      "height": 680
    },
    "favoriteNodes": [],
    "packageSettings": {
      "chaiNNer_pytorch": {
        "gpu_index": "0",
        "use_cpu": false,
        "use_fp16": true,
        "budget_limit": 0,
        "force_cache_wipe": false
      },
      "chaiNNer_ncnn": {
        "gpu_index": "0",
        "budget_limit": 0
      },
      "chaiNNer_onnx": {
        "gpu_index": "0",
        "execution_provider": "CUDAExecutionProvider",
        "onnx_tensorrt_cache": "",
        "tensorrt_fp16_mode": true
      }
    },
    "storage": {
      "lastDirectories": {
      },
      "nodeSelectorCollapsed": true,
      "recent": [
      ]
    }
  }
}
@joeyballentine
Member

Try using a smallish custom tile size when upscaling and see if that helps. The auto mode might be estimating a tile size that is slightly too large.
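For context on why tile size trades speed against VRAM: tiled upscaling runs the model on one patch at a time and stitches the results, so a larger tile means fewer model invocations but a bigger per-call memory footprint. Below is a minimal sketch of the idea only, not chaiNNer's actual implementation; real tilers also overlap tiles to hide seams, and `nn_upscale` is a nearest-neighbour stand-in for the model:

```python
import numpy as np

def upscale_tiled(img, scale, tile, upscale_fn):
    """Upscale a 2-D image one tile at a time so peak memory stays bounded.
    upscale_fn stands in for a model forward pass (hypothetical name)."""
    h, w = img.shape
    out = np.zeros((h * scale, w * scale), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]   # edge tiles may be smaller
            ph, pw = patch.shape
            out[y * scale:(y + ph) * scale,
                x * scale:(x + pw) * scale] = upscale_fn(patch, scale)
    return out

def nn_upscale(patch, scale):
    """Nearest-neighbour 'model', just for illustration."""
    return np.kron(patch, np.ones((scale, scale), dtype=patch.dtype))

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
full = nn_upscale(img, 2)                      # one shot: whole image at once
tiled = upscale_tiled(img, 2, 2, nn_upscale)   # four 2x2 tiles
assert np.array_equal(full, tiled)
```

With a real network each `upscale_fn` call carries fixed overhead, which is why very small tiles slow things down, while tiles past the VRAM budget force swapping and get dramatically slower.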

@some9000
Author

Thanks for the reply. I tested different tile sizes, and it seems like Auto is estimating just fine; it did not jump into "not enough VRAM" territory. Here are the values just in case:

| Tile size | Time | Note |
| --- | --- | --- |
| 256 | 79 s | |
| 384 | 72 s | |
| 512 | 69 s | |
| 768 | 67 s | Auto estimated this same value |
| 1024 | 66 s | |
| 2048 | 2 m 13 s | Ran out of VRAM |

Guess everything works the way it is supposed to; it's just the user being impatient, heh. Well, I just wanted to make sure everything is being done correctly on this end, and it looks like it is.

@joeyballentine
Member

You're doing this via the pytorch nodes, right?

@some9000
Author

Yes, here is an example. A theoretically simple upscale took 4m 52s.

(screenshot of the chain attached in the original comment)

(These screen capture options are amazing, btw)

@joeyballentine
Member

That's a really large model you're running on a very large image. It's going to be slow no matter what

@some9000
Author

> That's a really large model you're running on a very large image. It's going to be slow no matter what

I see. Guess Upscayl is just doing something differently (sneakily using smaller models or such) and has spoiled my expectations. Thank you for your patience.

Btw, would it be hard to add an option for something like a little beep (or just the regular OS notification sound, whatever it may be) when a chain finishes?

@pokepress

pokepress commented Sep 20, 2024

Since you seem to be on Windows, you should be able to use Task Manager's Performance tab to verify your GPU is being used as expected. It probably is, but it can't hurt to check. You should see the usage show up on the CUDA graph.
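If you prefer the command line to Task Manager, `nvidia-smi` (installed with the NVIDIA driver) can report the same figures. A small sketch below shows the query and a parser for its CSV output; the figures in the example comment are the ones reported earlier in this issue:

```python
import subprocess

# Query to run on a machine with an NVIDIA GPU:
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]

def parse_gpu_line(line):
    """Turn one CSV line such as '100, 5146' into (utilization %, memory MiB)."""
    util, mem = (int(field.strip()) for field in line.split(","))
    return util, mem

# With a GPU present you would fetch a live line with:
#     line = subprocess.check_output(QUERY, text=True).splitlines()[0]
# Here we parse the figures from this issue's system info instead:
print(parse_gpu_line("100, 5146"))  # (100, 5146)
```

A sustained high `utilization.gpu` during an upscale confirms the work is actually landing on the GPU rather than falling back to the CPU.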

@JeremyRand
Contributor

I haven't used Upscayl before, but from glancing at its documentation, it looks like it's using ncnn rather than PyTorch. Can you compare the performance of chaiNNer's ncnn nodes with Upscayl, for a more apples-to-apples comparison?
