b3463

Latest
@github-actions github-actions released this 14 Aug 04:41
llama : do not request buffer type if we don't need it anyway

Since we use ngl=0 with the Kompute backend to load models on CPU on
Linux and Windows, we need to make sure not to call
ggml_backend_kompute_buffer_type, which initializes the Vulkan driver.

Initializing the Vulkan driver in this case could cause a failure for no
good reason (e.g. if the driver is not available on the system).

Also, when we do not create any Kompute buffers, the Vulkan instance
currently has no opportunity to be freed until exit-time destructors
run, at which point the necessary libraries may have already been
unloaded from memory. This causes an observable segfault at exit when
loading the model on CPU via the Python bindings.