What happens after model_runner.py finishes loading model weights? #10175

cduk · 2024-11-09T02:27:31Z

Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:01, 1.67it/s]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:01<00:00, 1.20it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:03<00:00, 1.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:03<00:00, 1.00s/it]

INFO 11-08 18:02:50 model_runner.py:1025] Loading model weights took 9.5525 GB
INFO 11-08 18:24:45 gpu_executor.py:122] # GPU blocks: 9151, # CPU blocks: 1170

On GPUs where FP16 performance is slow, I notice a large delay at startup (about 22 minutes in the log above) between the point where the model weights are loaded and the gpu_executor.py message.

Can someone describe what is happening between these two steps? I'm trying to figure out where the slowdown takes place and whether it can be sped up somehow.
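
To put a number on the gap, here is a minimal sketch that parses the two log lines above and prints the elapsed time between them. The regex is only inferred from the prefix format seen in this log, the helper name `parse_log_line` is made up for illustration, and the year 2024 is an assumption (the log lines carry no year).

```python
import re
from datetime import datetime

# The two log lines copied from the startup output above.
LOG_LINES = [
    "INFO 11-08 18:02:50 model_runner.py:1025] Loading model weights took 9.5525 GB",
    "INFO 11-08 18:24:45 gpu_executor.py:122] # GPU blocks: 9151, # CPU blocks: 1170",
]

# Assumed prefix layout: "LEVEL MM-DD HH:MM:SS file.py:lineno]".
PREFIX = re.compile(r"^\w+ (\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+?):\d+\]")


def parse_log_line(line, year=2024):
    """Return (timestamp, source_file) for one log line (hypothetical helper)."""
    match = PREFIX.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    # The log has no year, so one is prepended; 2024 is an assumption.
    stamp = datetime.strptime(f"{year}-{match.group(1)}", "%Y-%m-%d %H:%M:%S")
    return stamp, match.group(2)


(start, start_src), (end, end_src) = (parse_log_line(line) for line in LOG_LINES)
gap = end - start
print(f"{start_src} -> {end_src}: {gap}")
# prints: model_runner.py -> gpu_executor.py: 0:21:55
```

Running the same parser over a full startup log, rather than just these two lines, would show whether any other messages fall inside that window.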

Replies: 0 comments
