Commit graph

164 commits

Author SHA1 Message Date
Daniel Hiltgen ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
Fix use_mmap for modelfiles
2024-07-03 13:59:42 -07:00
royjhan 996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
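For context on the OpenAI-compatibility entry above: the /v1/models route essentially re-maps Ollama's local model listing into OpenAI's list-models response shape. Below is a minimal sketch of that translation; the struct fields and the "owned_by" convention here are assumptions for illustration, not the actual middleware from #5007.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// localModel mirrors the kind of data a tags listing returns (field names are assumptions).
type localModel struct {
	Name       string
	ModifiedAt time.Time
}

// openAIModel and openAIList follow OpenAI's /v1/models response shape.
type openAIModel struct {
	ID      string `json:"id"`
	Object  string `json:"object"`
	Created int64  `json:"created"`
	OwnedBy string `json:"owned_by"`
}

type openAIList struct {
	Object string        `json:"object"`
	Data   []openAIModel `json:"data"`
}

func toOpenAI(models []localModel) openAIList {
	list := openAIList{Object: "list", Data: []openAIModel{}}
	for _, m := range models {
		list.Data = append(list.Data, openAIModel{
			ID:      m.Name,
			Object:  "model",
			Created: m.ModifiedAt.Unix(),
			OwnedBy: "library", // the real change derives ownership from the model namespace
		})
	}
	return list
}

func main() {
	out, _ := json.MarshalIndent(toOpenAI([]localModel{
		{Name: "llama3:latest", ModifiedAt: time.Now()},
	}), "", "  ")
	fmt.Println(string(out))
}
```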
Daniel Hiltgen 97c9e11768 Switch use_mmap to a pointer type
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
Daniel Hiltgen 7e7749224c Fix use_mmap parsing for modelfiles
Add the new tristate parsing logic for the code path for modelfiles,
as well as a unit test.
2024-06-21 12:27:19 -07:00
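The two use_mmap commits above rely on a pointer-backed tristate: a *bool is nil when the user never set the option, and points at true or false otherwise. A rough sketch of that idea (not the actual Ollama parsing code) follows.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTristate turns a raw option value into a *bool:
// nil means "unspecified", otherwise it points at the parsed value.
func parseTristate(raw string) (*bool, error) {
	if raw == "" {
		return nil, nil // user never set the option
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid boolean %q: %w", raw, err)
	}
	return &v, nil
}

func main() {
	for _, raw := range []string{"", "true", "false"} {
		v, err := parseTristate(raw)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case v == nil:
			fmt.Printf("%q -> unspecified (use platform default)\n", raw)
		default:
			fmt.Printf("%q -> %v\n", raw, *v)
		}
	}
}
```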
royjhan fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
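The api/show extension above adds richer model metadata to the show endpoint, including a verbose mode. As a hedged illustration of how a client might call it, the sketch below posts a show request and decodes a generic response; the request and response field names are assumptions here, not a copy of the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Field names are illustrative; consult the /api/show docs for the real schema.
	body, _ := json.Marshal(map[string]any{
		"name":    "llama3",
		"verbose": true, // the PR adds a verbose mode with additional detail
	})

	resp, err := http.Post("http://localhost:11434/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var info map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for k := range info {
		fmt.Println("field:", k) // e.g. parameters, template, details, model info
	}
}
```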
Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off.  This also implements a tri-state for
use_mmap so we can detect the difference between a user provided
value of true/false, or unspecified.
2024-06-17 16:54:30 -07:00
royjhan 89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show

* Error Handling
2024-06-15 20:53:56 -07:00
Patrick Devine c69bc19e46
move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
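Centralizing OLLAMA_HOST in envconfig means one place decides how the variable is read and defaulted. A simplified sketch of that lookup is below; the fallback address shown is the usual Ollama default, but treat the details as assumptions rather than the envconfig implementation.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// host returns the server address, falling back to the usual default
// when OLLAMA_HOST is unset or empty.
func host() string {
	h := strings.TrimSpace(os.Getenv("OLLAMA_HOST"))
	if h == "" {
		return "127.0.0.1:11434"
	}
	return h
}

func main() {
	fmt.Println("serving on", host())
}
```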
royjhan 4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
* Remove false time fields

* Struct Separation for List and Process

* Remove Marshaler
2024-06-06 10:11:45 -07:00
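Splitting the api/tags and api/ps responses lets each endpoint carry only the fields that make sense for it; for example, an expiry time only matters for a currently loaded model. A rough sketch of the separation, with assumed field names rather than the exact api package types:

```go
package main

import (
	"fmt"
	"time"
)

// ListModel describes an installed model as a tags listing might report it.
type ListModel struct {
	Name       string
	Size       int64
	ModifiedAt time.Time
}

// ProcessModel describes a loaded model as a ps listing might report it;
// it carries runtime-only fields such as ExpiresAt.
type ProcessModel struct {
	Name      string
	SizeVRAM  int64
	ExpiresAt time.Time
}

func main() {
	fmt.Printf("%+v\n", ListModel{Name: "llama3", Size: 4_700_000_000, ModifiedAt: time.Now()})
	fmt.Printf("%+v\n", ProcessModel{Name: "llama3", SizeVRAM: 4_200_000_000, ExpiresAt: time.Now().Add(5 * time.Minute)})
}
```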
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang c895a7d13f some gocritic 2024-06-04 11:13:30 -07:00
Patrick Devine 6845988807
Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Jeffrey Morgan 6602e793c0
Use --quantize flag and quantize api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Bruce MacDonald c02db93243 omit empty done reason 2024-05-09 16:45:29 -07:00
Bruce MacDonald cfa84b8470
add done_reason to the api (#4235) 2024-05-09 13:30:14 -07:00
Jeffrey Morgan d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale (#1983) 2024-05-09 09:06:13 -07:00
Eli Bendersky d77c1c5f9d
api: fill up API documentation (#3596)
* api: fill up API documentation

Followup for #2878

Now that the documentation is more complete, mention it in the README.

Updates #2840

* fix typo/lint

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 16:27:46 -07:00
Jackie Li af47413dba
Add MarshalJSON to Duration (#3284)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-06 15:59:18 -07:00
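Adding MarshalJSON to a Duration wrapper lets the API emit durations in a friendlier form than Go's default integer nanoseconds. The sketch below marshals the value as a duration string; the encoding actually chosen in #3284 may differ, so treat this purely as an illustration of the technique.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration wraps time.Duration so we can control its JSON encoding.
type Duration struct {
	time.Duration
}

// MarshalJSON encodes the duration as a human-readable string such as "5m0s".
// (Illustrative only; the real api.Duration may encode differently.)
func (d Duration) MarshalJSON() ([]byte, error) {
	return json.Marshal(d.Duration.String())
}

func main() {
	out, _ := json.Marshal(struct {
		KeepAlive Duration `json:"keep_alive"`
	}{KeepAlive: Duration{5 * time.Minute}})
	fmt.Println(string(out)) // {"keep_alive":"5m0s"}
}
```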
Patrick Devine 9009bedf13
better checking for OLLAMA_HOST variable (#3661) 2024-04-29 19:14:07 -04:00
Jeffrey Morgan 993cf8bf55
llm: limit generation to 10x context size to avoid run-on generations (#3918)
* llm: limit generation to 10x context size to avoid run-on generations

* add comment

* simplify condition statement
2024-04-25 19:02:30 -04:00
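The run-on-generation guard above is essentially a bound on how many tokens the server will emit relative to the context size. A simplified version of that loop condition (names and the token source are illustrative) is shown below.

```go
package main

import "fmt"

// generate pretends to stream tokens and stops once output reaches
// ten times the context size, mirroring the guard described above.
func generate(numCtx int, nextToken func() (string, bool)) []string {
	limit := 10 * numCtx
	var out []string
	for len(out) < limit {
		tok, done := nextToken()
		if done {
			break
		}
		out = append(out, tok)
	}
	return out
}

func main() {
	i := 0
	tokens := generate(4, func() (string, bool) {
		i++
		return fmt.Sprintf("tok%d", i), false // never signals completion
	})
	fmt.Println("stopped after", len(tokens), "tokens") // stopped after 40 tokens
}
```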
Daniel Hiltgen 34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
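The concurrency change above is tuned through two environment variables. A minimal sketch of reading them with the defaults mentioned in the commit (1 concurrent request per model, 1 loaded model) follows; the real envconfig package does more validation than this.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads an integer environment variable, falling back to def
// when the variable is unset, malformed, or non-positive.
func envInt(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil && v > 0 {
		return v
	}
	return def
}

func main() {
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Printf("parallel requests per model: %d, loaded models: %d\n", numParallel, maxLoaded)
}
```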
Cheng 62be2050dd
chore: use errors.New to replace fmt.Errorf, which is much better (#3789) 2024-04-20 22:11:06 -04:00
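The gist of that cleanup: when an error message has no formatting verbs, errors.New is the simpler choice, and fmt.Errorf is reserved for messages that interpolate values or wrap other errors. For example:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// No formatting needed: errors.New is enough.
	errMissing := errors.New("model name is required")

	// Formatting or wrapping needed: fmt.Errorf earns its keep.
	errWrapped := fmt.Errorf("loading %q: %w", "llama3", errMissing)

	fmt.Println(errMissing)
	fmt.Println(errWrapped)
	fmt.Println(errors.Is(errWrapped, errMissing)) // true
}
```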
Eli Bendersky ad90b9ab3d
api: start adding documentation to package api (#2878)
* api: start adding documentation to package api

Updates #2840

* Fix lint typo report
2024-04-10 13:31:55 -04:00
Michael Yang 01114b4526 fix: rope 2024-04-09 16:15:24 -07:00
Michael Yang 9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Michael Yang e1c9a2a00f no blob create if already exists 2024-04-08 15:09:48 -07:00
Michael Yang be517e491c no rope parameters 2024-04-05 18:05:27 -07:00
Patrick Devine 1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Patrick Devine 47cfe58af5
Default Keep Alive environment variable (#3094)
---------

Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Jeffrey Morgan 3b4bab3dc5
Fix embeddings load model behavior (#2848) 2024-02-29 17:40:56 -08:00
Ikko Eltociear Ashimine e95b896790
Update types.go (#2744)
specfied -> specified
2024-02-25 13:41:25 -05:00
Michael Yang 897b213468
use http.DefaultClient (#2530)
default client already handles proxy
2024-02-20 18:34:47 -05:00
bnorick caf2b13c10
Fix infinite keep_alive (#2480) 2024-02-13 15:40:32 -08:00
Patrick Devine b5cf31b460
add keep_alive to generate/chat/embedding api endpoints (#2146) 2024-01-26 14:28:02 -08:00
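Taken together, the keep_alive entries above control how long a model stays resident after a request: a per-request keep_alive value, an OLLAMA_KEEP_ALIVE default, and a fix so that a negative value really does keep the model loaded indefinitely. A hedged request sketch (field values illustrative):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// keep_alive takes a duration; per the infinite keep_alive fix above,
	// a negative value asks the server to keep the model loaded indefinitely.
	body, _ := json.Marshal(map[string]any{
		"model":      "llama3",
		"prompt":     "Why is the sky blue?",
		"keep_alive": "10m", // or -1 for "never unload"
		"stream":     false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```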
Patrick Devine 7c40a67841
Save and load sessions (#2063) 2024-01-25 12:12:36 -08:00
Michael Yang 745b5934fa add model to ModelResponse 2024-01-18 14:32:55 -08:00
Michael Yang a38d88d828 api: add model for all requests
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Michael Yang 5ffbbea1d7 remove client.py 2024-01-11 15:53:10 -08:00
Patrick Devine 22e93efa41 add show info command and fix the modelfile 2024-01-05 12:20:05 -08:00
Brian Murray 0d6e3565ae
Add embeddings to API (#1773) 2024-01-04 15:00:52 -05:00
Jeffrey Morgan 55978c1dc9 clean up cache api option 2023-12-27 14:27:45 -05:00
Jeffrey Morgan d4ebdadbe7 enable cache_prompt by default 2023-12-27 14:23:42 -05:00
K0IN 10da41d677
Add Cache flag to api (#1642) 2023-12-22 17:16:20 -05:00
Bruce MacDonald d99fa6ce0a
send empty messages on last chat response (#1530) 2023-12-18 14:23:38 -05:00
Patrick Devine d9e60f634b
add image support to the chat api (#1490) 2023-12-12 13:28:58 -08:00
Patrick Devine 910e9401d0
Multimodal support (#1216)
---------

Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Jeffrey Morgan 9e1406e4ed Don't expose model information in /api/generate 2023-12-09 02:05:43 -08:00
Michael Yang c3ff36088b
Merge pull request #774 from jmorganca/mxyng/server-version
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Michael Yang 5d75505ebd return model configuration in generate 2023-12-05 14:39:02 -08:00
Bruce MacDonald 195e3d9dbd
chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
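To close the loop on the chat endpoint and the image/multimodal and done_reason entries above, here is a hedged sketch of a chat request: a model plus role-tagged messages, with base64-encoded images attached for multimodal models. The shape follows the documented API, but check the current docs before relying on specifics.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// A chat request is a model plus a list of role-tagged messages;
	// multimodal models additionally accept base64-encoded images per message.
	body, _ := json.Marshal(map[string]any{
		"model": "llava",
		"messages": []map[string]any{
			{
				"role":    "user",
				"content": "What is in this picture?",
				"images":  []string{"<base64-encoded image>"},
			},
		},
		"stream": false,
	})

	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var out struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
		DoneReason string `json:"done_reason"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(out.Message.Content, "(done_reason:", out.DoneReason+")")
}
```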