
API

Endpoints

Conventions

Model names

Model names follow a model:tag format. Some examples are orca-mini:3b-q4_1 and llama2:70b. The tag is optional and, if not provided, will default to latest. The tag is used to identify a specific version.
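As an illustrative sketch of this convention, a client might normalize names by appending the default tag when none is given (`normalize_model_name` is a hypothetical helper, not part of the API):

```python
def normalize_model_name(name: str) -> str:
    # Append the default "latest" tag when no tag is given.
    return name if ":" in name else name + ":latest"

normalize_model_name("llama2")             # 'llama2:latest'
normalize_model_name("orca-mini:3b-q4_1")  # 'orca-mini:3b-q4_1'
```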

Durations

All durations are returned in nanoseconds.

Streaming responses

Certain endpoints stream responses as JSON objects, delimited by the newline (\n) character.
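Because each line of the body is a complete JSON object, a client can parse the stream incrementally. A minimal Python sketch (the raw payload shown is illustrative):

```python
import json

def parse_stream(raw: str):
    # Each non-empty line of a streamed body is a complete JSON object.
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

raw = '{"response": "The", "done": false}\n{"response": " sky", "done": true}\n'
chunks = parse_stream(raw)
full_text = "".join(c["response"] for c in chunks)  # 'The sky'
```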

Generate a completion

POST /api/generate

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so the reply is a series of response objects. The final response object includes statistics and additional data from the request.

Parameters

  • model: (required) the model name
  • prompt: the prompt to generate a response for

Advanced parameters (optional):

  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • system: system prompt to use (overrides what is defined in the Modelfile)
  • template: the full prompt or prompt template (overrides what is defined in the Modelfile)
  • context: the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory
  • stream: if false, the response will be returned as a single response object rather than a stream of objects
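The parameters above combine into a single JSON body. A sketch of a non-streaming request payload (the field values are illustrative):

```python
import json

payload = {
    "model": "llama2:7b",
    "prompt": "Why is the sky blue?",
    "options": {"temperature": 0.8},  # passed through to the model
    "stream": False,                  # request one final object instead of a stream
}
body = json.dumps(payload)  # send as the POST body to /api/generate
```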

Request

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'

Response

A stream of JSON objects:

{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}

The final response in the stream also includes additional data about the generation:

  • total_duration: time spent generating the response
  • load_duration: time spent in nanoseconds loading the model
  • sample_count: number of samples generated
  • sample_duration: time spent generating samples
  • prompt_eval_count: number of tokens in the prompt
  • prompt_eval_duration: time spent in nanoseconds evaluating the prompt
  • eval_count: number of tokens in the response
  • eval_duration: time in nanoseconds spent generating the response
  • context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
  • response: empty if the response was streamed; if not streamed, this contains the full response

To calculate how fast the response is generated in tokens per second (token/s), divide eval_count by eval_duration and multiply by 10^9 (durations are reported in nanoseconds).

{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
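Applying that calculation to the final response above (the ratio is scaled by 10^9 because durations are in nanoseconds):

```python
eval_count = 113            # tokens in the response
eval_duration = 1325948000  # nanoseconds spent generating it

tokens_per_second = eval_count / eval_duration * 1e9  # ≈ 85.2 token/s
```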

Create a Model

POST /api/create

Create a model from a Modelfile

Parameters

  • name: name of the model to create
  • path: path to the Modelfile
  • stream: (optional) if false, the response will be returned as a single response object rather than a stream of objects

Request

curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'

Response

A stream of JSON objects. When finished, status is success.

{
  "status": "parsing modelfile"
}

List Local Models

GET /api/tags

List models that are available locally.

Request

curl http://localhost:11434/api/tags

Response

{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
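The size field is in bytes; a client might summarize the listing for display like this (values taken from the example response above):

```python
models = [
    {"name": "llama2:7b", "size": 3791730596},
    {"name": "llama2:13b", "size": 7323310500},
]

# Convert bytes to gigabytes for display
sizes_gb = {m["name"]: round(m["size"] / 1e9, 2) for m in models}
# {'llama2:7b': 3.79, 'llama2:13b': 7.32}
```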

Show Model Information

POST /api/show

Show details about a model including modelfile, template, parameters, license, and system prompt.

Parameters

  • name: name of the model to show

Request

curl http://localhost:11434/api/show -d '{
  "name": "llama2:7b"
}'

Response

{
  "license": "<contents of license block>",
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
  "parameters": "stop                           [INST]\nstop                           [/INST]\nstop                           <<SYS>>\nstop                           <</SYS>>",
  "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
}

Copy a Model

POST /api/copy

Copy a model. Creates a model with another name from an existing model.

Request

curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'

Delete a Model

DELETE /api/delete

Delete a model and its data.

Parameters

  • name: name of the model to delete

Request

curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'

Pull a Model

POST /api/pull

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

Parameters

  • name: name of the model to pull
  • insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
  • stream: (optional) if false, the response will be returned as a single response object rather than a stream of objects

Request

curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'

Response

{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}

Push a Model

POST /api/push

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

Parameters

  • name: name of the model to push in the form of <namespace>/<model>:<tag>
  • insecure: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
  • stream: (optional) if false, the response will be returned as a single response object rather than a stream of objects
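The required name format can be validated client-side before pushing. An illustrative parser (`parse_push_name` is hypothetical, not part of the API):

```python
def parse_push_name(name: str):
    # Split <namespace>/<model>:<tag> into its three parts.
    namespace, sep, rest = name.partition("/")
    model, sep2, tag = rest.partition(":")
    if not (sep and sep2 and namespace and model and tag):
        raise ValueError("expected <namespace>/<model>:<tag>")
    return namespace, model, tag

parse_push_name("mattw/pygmalion:latest")  # ('mattw', 'pygmalion', 'latest')
```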

Request

curl -X POST http://localhost:11434/api/push -d '{
  "name": "mattw/pygmalion:latest"
}'

Response

Streaming response that starts with:

{ "status": "retrieving manifest" }

and then:

{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}

Then there is a series of uploading responses:

{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}

Finally, when the upload is complete:

{"status":"pushing manifest"}
{"status":"success"}

Generate Embeddings

POST /api/embeddings

Generate embeddings from a model

Parameters

  • model: name of model to generate embeddings from
  • prompt: text to generate embeddings for

Advanced parameters:

  • options: additional model parameters listed in the documentation for the Modelfile such as temperature

Request

curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'

Response

{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
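Embedding vectors are typically compared with cosine similarity. A self-contained sketch, using a truncated vector of the kind shown above:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [0.567, 0.009, 0.232, -0.292, -0.892]
cosine_similarity(v, v)  # ≈ 1.0: a vector is maximally similar to itself
```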