OpenAI compatibility

Ollama provides experimental compatibility with parts of the OpenAI API to help connect existing applications to Ollama.

Note: OpenAI compatibility is experimental and subject to major adjustments, including breaking changes. For fully featured access to the Ollama API, see the Ollama Python library, JavaScript library, and REST API.

Usage

OpenAI Python library

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',

    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',
)

OpenAI JavaScript library

import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1/',

  // required but ignored
  apiKey: 'ollama',
})

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'llama2',
})

curl

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Endpoints

/v1/chat/completions

Supported features

  • Chat completions
  • Streaming
  • JSON mode
  • Reproducible outputs
  • Vision
  • Function calling
  • Logprobs
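As an illustration of the streaming feature, the sketch below reassembles a reply from OpenAI-style server-sent event lines. It assumes the endpoint emits `data: {...}` chunks with the text under `choices[0].delta.content` and ends with a `data: [DONE]` line, as the OpenAI streaming format does. The helper name `iter_stream_content` and the sample lines are hypothetical and written by hand, not captured from a real server.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # The stream is assumed to end with a literal [DONE] sentinel.
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines for illustration only (an assumption about the wire format).
SAMPLE_LINES = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "This"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " is a test"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

reply = "".join(iter_stream_content(SAMPLE_LINES))
print(reply)
```

With a live server, the same function could be fed the decoded lines of an HTTP response made with `"stream": true` in the request body.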

Supported request fields

  • model
  • messages
    • Text content
    • Array of content parts
  • frequency_penalty
  • presence_penalty
  • response_format
  • seed
  • stop
  • stream
  • temperature
  • top_p
  • max_tokens
  • logit_bias
  • tools
  • tool_choice
  • user
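A minimal sketch of how several of these request fields might be combined into one request body, using only the Python standard library. The helper names `build_chat_request` and `post_chat` are hypothetical. The URL is the one from the curl example, and the `{"type": "json_object"}` value for `response_format` follows the OpenAI API convention for JSON mode. `post_chat` is defined but not called here, because it needs a running Ollama server.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this document.
CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(prompt, model="llama2", *, json_mode=False, seed=None,
                       max_tokens=None, stop=None):
    """Build a chat completion request body; optional fields are added only when set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # OpenAI-style JSON mode.
        payload["response_format"] = {"type": "json_object"}
    if seed is not None:
        payload["seed"] = seed
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stop is not None:
        payload["stop"] = stop
    return payload


def post_chat(payload, url=CHAT_URL):
    """POST the payload and return the decoded JSON response (requires a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_request(
    "List three colors as JSON.", json_mode=True, seed=42, max_tokens=100
)
print(json.dumps(payload, indent=2))
```

Against a running server with the model pulled, `post_chat(payload)` should return the parsed response, with the reply text under `choices[0].message.content`.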

Notes

  • Setting seed will always set temperature to 0
  • finish_reason will always be stop
  • usage.prompt_tokens will be 0 for completions where prompt evaluation is cached
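Client code may want to account for these notes. The sketch below is one possible way to do so: it treats a `prompt_tokens` value of 0 as "unknown, possibly cached" rather than as an empty prompt, and flags requests whose non-zero `temperature` would be overridden because `seed` is set. The helper names and the sample response are hypothetical and written by hand for illustration.

```python
def describe_usage(response: dict) -> dict:
    """Summarize token usage, treating prompt_tokens == 0 as unknown."""
    # A missing usage object is handled the same way as a prompt_tokens value of 0.
    usage = response.get("usage") or {}
    prompt_tokens = usage.get("prompt_tokens", 0)
    return {
        "prompt_tokens": prompt_tokens or None,
        "completion_tokens": usage.get("completion_tokens", 0),
        "prompt_may_be_cached": prompt_tokens == 0,
    }


def temperature_is_overridden(payload: dict) -> bool:
    """Return True when seed is set alongside a non-zero temperature."""
    return "seed" in payload and payload.get("temperature", 0) != 0


# Hand-written sample response for illustration only.
SAMPLE_RESPONSE = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 0, "completion_tokens": 3, "total_tokens": 3},
}

summary = describe_usage(SAMPLE_RESPONSE)
print(summary)
print(temperature_is_overridden({"seed": 7, "temperature": 0.8}))
```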

Models

Before using a model, pull it locally with ollama pull:

ollama pull llama2

Default model names

For tooling that relies on default OpenAI model names such as gpt-3.5-turbo, use ollama cp to copy an existing model name to a temporary name:

ollama cp llama2 gpt-3.5-turbo

Afterwards, this new model name can be specified in the model field:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'