Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:01, 1.67it/s]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:01<00:00, 1.20it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:03<00:00, 1.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:03<00:00, 1.00s/it]
INFO 11-08 18:02:50 model_runner.py:1025] Loading model weights took 9.5525 GB
INFO 11-08 18:24:45 gpu_executor.py:122] # GPU blocks: 9151, # CPU blocks: 1170
On GPUs where FP16 performance is slow, I notice a large delay at startup between the "Loading model weights" message and the gpu_executor.py message. In the log above the gap is about 22 minutes (18:02:50 to 18:24:45).
Can someone describe what happens between these two steps? I'm trying to figure out where the slowdown occurs and whether it can be sped up.
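To put a number on the gap, here is a minimal sketch that parses the timestamps from the two log lines quoted above and reports the elapsed time. The helper names (`parse_timestamp`, `gap_seconds`) are made up for illustration, and the year is an assumption because the log lines do not include one.

```python
from datetime import datetime

# The two log lines quoted in the post.
LOG_LINES = [
    "INFO 11-08 18:02:50 model_runner.py:1025] Loading model weights took 9.5525 GB",
    "INFO 11-08 18:24:45 gpu_executor.py:122] # GPU blocks: 9151, # CPU blocks: 1170",
]

# Assumption: the log omits the year, so a placeholder is used.
# Only the difference between timestamps matters here.
ASSUMED_YEAR = 2000


def parse_timestamp(line: str) -> datetime:
    """Extract the 'MM-DD HH:MM:SS' timestamp that follows the log level."""
    _level, date_part, time_part = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR}-{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"
    )


def gap_seconds(first: str, second: str) -> float:
    """Return the number of seconds elapsed between two log lines."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


if __name__ == "__main__":
    seconds = gap_seconds(LOG_LINES[0], LOG_LINES[1])
    minutes, remainder = divmod(int(seconds), 60)
    print(f"Gap between the two messages: {minutes} min {remainder} s")
```

Running the same helper on logs from a faster GPU would make it easier to compare how much of the startup time falls between these two messages.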