This repository contains Docker containers for deploying ggml-based inference endpoints at:
https://ui.endpoints.huggingface.co
When creating a dedicated endpoint, select the custom container type like this:
Note: the `LLAMACPP_ARGS` environment variable is a temporary mechanism for passing custom arguments to `llama-server`.
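As a sketch of how such arguments might be supplied when testing a container locally (the image name below is a placeholder; `-c` and `-ngl` are standard `llama-server` flags for the context size and the number of GPU-offloaded layers):

```shell
# Hypothetical local run; replace <your-image-name> with the actual container image.
# LLAMACPP_ARGS forwards extra flags to llama-server inside the container:
#   -c 4096   context size of 4096 tokens
#   -ngl 99   offload up to 99 layers to the GPU
docker run --rm -p 8080:8080 \
  -e LLAMACPP_ARGS="-c 4096 -ngl 99" \
  <your-image-name>
```

Since this mechanism is described as temporary, prefer any first-class configuration option once the endpoint UI exposes one.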