- Generate a completion
- Create a Model
- List Local Models
- Show Model Information
- Copy a Model
- Delete a Model
- Pull a Model
- Push a Model
- Generate Embeddings
Model names follow a model:tag
format. Some examples are orca-mini:3b-q4_1
and llama2:70b
. The tag is optional and, if not provided, will default to latest
. The tag is used to identify a specific version.
All durations are returned in nanoseconds.
Certain endpoints stream responses as JSON objects delineated with the newline (\n
) character.
POST /api/generate
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so will be a series of responses. The final response object will include statistics and additional data from the request.
model
: (required) the model nameprompt
: the prompt to generate a response for
Advanced parameters (optional):
options
: additional model parameters listed in the documentation for the Modelfile such astemperature
system
: system prompt to (overrides what is defined in theModelfile
)template
: the full prompt or prompt template (overrides what is defined in theModelfile
)context
: the context parameter returned from a previous request to/generate
, this can be used to keep a short conversational memorystream
: iffalse
the response will be be returned as a single response object, rather than a stream of objects
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
A stream of JSON objects:
{
"model": "llama2:7b",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
The final response in the stream also includes additional data about the generation:
total_duration
: time spent generating the responseload_duration
: time spent in nanoseconds loading the modelsample_count
: number of samples generatedsample_duration
: time spent generating samplesprompt_eval_count
: number of tokens in the promptprompt_eval_duration
: time spent in nanoseconds evaluating the prompteval_count
: number of tokens the responseeval_duration
: time in nanoseconds spent generating the responsecontext
: an encoding of the conversation used in this response, this can be sent in the next request to keep a conversational memoryresponse
: empty if the response was streamed, if not streamed, this will contain the full response
To calculate how fast the response is generated in tokens per second (token/s), divide eval_count
/ eval_duration
.
{
"model": "llama2:7b",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"context": [1, 2, 3],
"done": true,
"total_duration": 5589157167,
"load_duration": 3013701500,
"sample_count": 114,
"sample_duration": 81442000,
"prompt_eval_count": 46,
"prompt_eval_duration": 1160282000,
"eval_count": 113,
"eval_duration": 1325948000
}
POST /api/create
Create a model from a Modelfile
name
: name of the model to createpath
: path to the Modelfilestream
: (optional) iffalse
the response will be be returned as a single response object, rather than a stream of objects
curl -X POST http://localhost:11434/api/create -d '{
"name": "mario",
"path": "~/Modelfile"
}'
A stream of JSON objects. When finished, status
is success
.
{
"status": "parsing modelfile"
}
GET /api/tags
List models that are available locally.
curl http://localhost:11434/api/tags
{
"models": [
{
"name": "llama2:7b",
"modified_at": "2023-08-02T17:02:23.713454393-07:00",
"size": 3791730596
},
{
"name": "llama2:13b",
"modified_at": "2023-08-08T12:08:38.093596297-07:00",
"size": 7323310500
}
]
}
POST /api/show
Show details about a model including modelfile, template, parameters, license, and system prompt.
name
: name of the model to show
curl http://localhost:11434/api/show -d '{
"name": "llama2:7b"
}'
{
"license": "<contents of license block>",
"modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
"parameters": "stop [INST]\nstop [/INST]\nstop <<SYS>>\nstop <</SYS>>",
"template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}
POST /api/copy
Copy a model. Creates a model with another name from an existing model.
curl http://localhost:11434/api/copy -d '{
"source": "llama2:7b",
"destination": "llama2-backup"
}'
DELETE /api/delete
Delete a model and its data.
model
: model name to delete
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
}'
POST /api/pull
Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
name
: name of the model to pullinsecure
: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.stream
: (optional) iffalse
the response will be be returned as a single response object, rather than a stream of objects
curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:7b"
}'
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208
}
POST /api/push
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.
name
: name of the model to push in the form of<namespace>/<model>:<tag>
insecure
: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.stream
: (optional) iffalse
the response will be be returned as a single response object, rather than a stream of objects
curl -X POST http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'
Streaming response that starts with:
{ "status": "retrieving manifest" }
and then:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
Then there is a series of uploading responses:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
Finally, when the upload is complete:
{"status":"pushing manifest"}
{"status":"success"}
POST /api/embeddings
Generate embeddings from a model
model
: name of model to generate embeddings fromprompt
: text to generate embeddings for
Advanced parameters:
options
: additional model parameters listed in the documentation for the Modelfile such astemperature
curl -X POST http://localhost:11434/api/embeddings -d '{
"model": "llama2:7b",
"prompt": "Here is an article about llamas..."
}'
{
"embeddings": [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]
}