This repository contains Docker containers for deploying ggml-based inference endpoints at:
https://ui.endpoints.huggingface.co
When creating a dedicated endpoint, select the custom container type like this:
Note: the `LLAMACPP_ARGS` environment variable is a temporary mechanism for passing custom arguments to `llama-server`.
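As a sketch of how such arguments might be supplied when testing a container locally (the image name below is a placeholder; `-c` and `-ngl` are standard `llama-server` flags for the context size and the number of GPU-offloaded layers):

```shell
# Hypothetical local run; replace <your-image-name> with the actual container image.
# LLAMACPP_ARGS forwards extra flags to llama-server inside the container:
#   -c 4096   context size of 4096 tokens
#   -ngl 99   offload up to 99 layers to the GPU
docker run --rm -p 8080:8080 \
  -e LLAMACPP_ARGS="-c 4096 -ngl 99" \
  <your-image-name>
```

Since this mechanism is described as temporary, prefer any first-class configuration option once the endpoint UI exposes one.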