Commit graph

234 commits

Author SHA1 Message Date
Jeffrey Morgan 9b12a511ca check other request fields before load short circuit in /api/generate 2023-09-22 23:50:55 -04:00
Bruce MacDonald 5d71bda478 close llm on interrupt (#577) 2023-09-22 19:41:52 +01:00
Michael Yang 82f5b66c01 register HEAD /api/tags 2023-09-21 16:38:03 -07:00
Michael Yang c986694367 fix HEAD / request
HEAD requests should respond like their GET counterparts, except without a
response body.
2023-09-21 16:35:58 -07:00
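A minimal sketch of the behavior described in c986694367, using only the Go standard library rather than the project's actual router. The handler name `statusHandler`, the helper `probe`, and the response text are hypothetical and chosen for illustration; the point is only that HEAD returns the same status and headers as GET but writes no body.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusHandler (hypothetical) answers GET and HEAD with the same status and
// headers; only GET writes a response body.
func statusHandler(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet, http.MethodHead:
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		w.WriteHeader(http.StatusOK)
		if r.Method == http.MethodGet {
			fmt.Fprint(w, "server is running")
		}
	default:
		w.Header().Set("Allow", "GET, HEAD")
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// probe calls the handler in-process and returns the status code and body.
func probe(method string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/", nil)
	statusHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodHead} {
		code, body := probe(m)
		fmt.Printf("%s -> %d %q\n", m, code, body)
	}
}
```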
Bruce MacDonald 4cba75efc5 remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang 1fabba474b refactor default allow origins
this should be less error prone
2023-09-21 09:42:25 -07:00
Michael Yang ee4fd16f2c Merge pull request #556 from jmorganca/pack-cuda
pack in cuda libs
2023-09-20 15:02:36 -07:00
Bruce MacDonald 1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00
Michael Yang 499e9007a5 pick chunksize based on location 2023-09-20 11:10:24 -07:00
Michael Yang aa45d7c1df draft: explicitly follow upload redirects 2023-09-19 13:36:58 -07:00
Michael Yang a5520bfb42 fix build 2023-09-19 10:42:24 -07:00
Michael Yang b58d5d16b0 fix mkdir on windows 2023-09-19 09:41:13 -07:00
Patrick Devine 24580df958 only add a layer if there is actual data (#535) 2023-09-18 13:47:45 -07:00
Patrick Devine 80dd44e80a Cmd changes (#541) 2023-09-18 12:26:56 -07:00
Michael Yang 08d7c2a944 fix error on upload chunk 2023-09-15 15:59:30 -07:00
Michael Yang e53bc57d4d split uploadBlobChunked 2023-09-14 17:22:05 -07:00
Michael Yang f0b398d17f implement ProgressWriter 2023-09-14 17:22:04 -07:00
Michael Yang daa4f096f9 set request.ContentLength
This informs the HTTP client the content length is known and disables
chunked Transfer-Encoding
2023-09-14 13:32:44 -07:00
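The effect described in daa4f096f9 can be observed without a network by serializing a request with `Request.Write`. This is a sketch under assumptions: the wrapper type `opaqueReader`, the helper `describe`, and the URL are hypothetical. The wrapper hides the body's concrete type so `http.NewRequest` cannot infer the length on its own, which is the situation where setting `ContentLength` by hand matters.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// opaqueReader hides the concrete reader type, so http.NewRequest cannot
// infer the body length the way it does for *strings.Reader or *bytes.Buffer.
type opaqueReader struct{ io.Reader }

// describe (hypothetical) serializes a PUT request and reports whether the
// wire headers use chunked encoding and whether they carry Content-Length.
func describe(payload string, setLength bool) (chunked, hasLength bool, err error) {
	body := opaqueReader{strings.NewReader(payload)}
	req, err := http.NewRequest(http.MethodPut, "http://example.invalid/upload", body)
	if err != nil {
		return false, false, err
	}
	if setLength {
		// Declares the length up front; the request is no longer sent chunked.
		req.ContentLength = int64(len(payload))
	}

	var wire bytes.Buffer
	if err := req.Write(&wire); err != nil {
		return false, false, err
	}
	head, _, _ := strings.Cut(wire.String(), "\r\n\r\n")
	chunked = strings.Contains(head, "Transfer-Encoding: chunked")
	hasLength = strings.Contains(head, fmt.Sprintf("Content-Length: %d", len(payload)))
	return chunked, hasLength, nil
}

func main() {
	for _, set := range []bool{false, true} {
		chunked, hasLength, err := describe("hello", set)
		if err != nil {
			panic(err)
		}
		fmt.Printf("ContentLength set=%v chunked=%v content-length header=%v\n", set, chunked, hasLength)
	}
}
```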
Michael Yang e6881cabd0 remove unused 2023-09-13 14:48:33 -07:00
Michael Yang 0c5a454361 fix model type for 70b 2023-09-12 15:12:59 -07:00
Michael Yang 7dee25a07f fix falcon decode
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald f221637053 first pass at linux gpu support (#454)
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine 45ac07cd02 create the blobs directory correctly (#508) 2023-09-11 14:54:52 -07:00
Patrick Devine e7e91cd71c add autoprune to remove unused layers (#491) 2023-09-11 11:46:35 -07:00
Jeffrey Morgan 3920e15386 add model format to config layer (#497) 2023-09-09 17:53:44 -04:00
Michael Yang de227b620f fix nil pointer dereference 2023-09-07 17:24:31 -07:00
Michael Yang 738fe9c4aa Merge pull request #486 from jmorganca/mxyng/fix-push
fix: retry push on expired token
2023-09-07 13:58:34 -07:00
Michael Yang bf146fb072 fix retry on unauthorized chunk 2023-09-07 12:02:04 -07:00
Michael Yang f0f4943577 fix get auth token 2023-09-07 12:01:56 -07:00
Bruce MacDonald 09dd2aeff9 GGUF support (#441) 2023-09-07 13:55:37 -04:00
Michael Yang 83c6be1666 fix model manifests (#477) 2023-09-06 17:30:08 -04:00
Patrick Devine 790d24eb7b add show command (#474) 2023-09-06 11:04:17 -07:00
Michael Yang a1ecdd36d5 create manifests directory 2023-09-05 17:10:40 -07:00
Michael Yang d1c2558f7e Merge pull request #461 from jmorganca/mxyng/fix-inherit-params
fix inherit params
2023-09-05 12:30:23 -07:00
Michael Yang 06ef90c051 fix parameter inheritance
parameters are not inherited because they are processed differently from
other layers. fix this by explicitly merging the inherited params into
the new params. parameter values defined in the new modelfile will
override those defined in the inherited modelfile. array lists are
replaced instead of appended
2023-09-05 11:40:20 -07:00
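The merge rule stated in 06ef90c051 can be sketched as a plain map overlay. This is an illustration under assumptions, not the project's code: the function `mergeParams`, the `map[string]any` representation, and the parameter names are hypothetical. Values from the new modelfile win, and a list value replaces the inherited list wholesale instead of being appended to it.

```go
package main

import "fmt"

// mergeParams (hypothetical) overlays the new modelfile's parameters on top
// of the inherited ones. Every key is replaced as a whole value, so a list in
// overrides replaces the inherited list rather than extending it.
func mergeParams(inherited, overrides map[string]any) map[string]any {
	merged := make(map[string]any, len(inherited)+len(overrides))
	for k, v := range inherited {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v
	}
	return merged
}

func main() {
	// Parameter names below are illustrative only.
	inherited := map[string]any{
		"temperature": 0.8,
		"num_ctx":     2048,
		"stop":        []string{"User:", "Assistant:"},
	}
	overrides := map[string]any{
		"temperature": 0.2,
		"stop":        []string{"</s>"},
	}

	merged := mergeParams(inherited, overrides)
	fmt.Println(merged["temperature"], merged["num_ctx"], merged["stop"])
}
```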
Michael Yang e9f6df7dca use slices.DeleteFunc 2023-09-05 09:56:59 -07:00
Michael Yang 681f3c4c42 fix num_keep 2023-09-03 17:47:49 -04:00
Quinn Slack 62d29b2157 do not HTML-escape prompt
The `html/template` package automatically HTML-escapes interpolated strings in templates. This behavior is undesirable because it causes prompts like `<h1>hello` to be escaped to `&lt;h1&gt;hello` before being passed to the LLM.

The included test case passes, but before the code change, it failed:

```
--- FAIL: TestModelPrompt
    images_test.go:21: got "a&lt;h1&gt;b", want "a<h1>b"
```
2023-09-01 17:16:38 -05:00
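The escaping described in 62d29b2157 is easy to reproduce. The sketch below renders the same template with `html/template` and with `text/template`; the template string and the helper names `renderHTML` and `renderText` are hypothetical, and using `text/template` is shown here only as one way to avoid the escaping, not as a statement of how the commit itself is implemented.

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template"
	texttemplate "text/template"
)

// promptTemplate is an illustrative template, not the project's own.
const promptTemplate = "a{{ .Prompt }}b"

// renderHTML renders with html/template, which HTML-escapes interpolated values.
func renderHTML(prompt string) (string, error) {
	t, err := htmltemplate.New("prompt").Parse(promptTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, map[string]string{"Prompt": prompt})
	return buf.String(), err
}

// renderText renders with text/template, which inserts values verbatim.
func renderText(prompt string) (string, error) {
	t, err := texttemplate.New("prompt").Parse(promptTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, map[string]string{"Prompt": prompt})
	return buf.String(), err
}

func main() {
	escaped, err := renderHTML("<h1>")
	if err != nil {
		panic(err)
	}
	verbatim, err := renderText("<h1>")
	if err != nil {
		panic(err)
	}
	fmt.Printf("html/template: %q\n", escaped)  // "a&lt;h1&gt;b"
	fmt.Printf("text/template: %q\n", verbatim) // "a<h1>b"
}
```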
Michael Yang 1c8fd627ad windows: fix create modelfile 2023-08-31 09:47:10 -04:00
Michael Yang ae950b00f1 windows: fix delete 2023-08-31 09:47:10 -04:00
Michael Yang eeb40a672c fix list models for windows 2023-08-31 09:47:10 -04:00
Michael Yang 0f541a0367 s/ListResponseModel/ModelResponse/ 2023-08-31 09:47:10 -04:00
Bruce MacDonald 42998d797d subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack f4432e1dba treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
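The distinction drawn in f4432e1dba, matching `stop` against the generated text instead of against individual tokens, can be sketched as follows. The function names `cutAtStop` and `generate` and the token slices are hypothetical; the sketch also assumes that everything from the first stop sequence onward is dropped, which is one reading of "trim" in the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// cutAtStop (hypothetical) looks for the earliest occurrence of any stop
// sequence in text. If one is found, it returns the text before it and true.
func cutAtStop(text string, stop []string) (string, bool) {
	earliest := -1
	for _, s := range stop {
		if s == "" {
			continue // an empty sequence would match everywhere
		}
		if i := strings.Index(text, s); i >= 0 && (earliest < 0 || i < earliest) {
			earliest = i
		}
	}
	if earliest < 0 {
		return text, false
	}
	return text[:earliest], true
}

// generate (hypothetical) appends tokens one at a time and halts as soon as
// the accumulated text contains a stop sequence, even when that sequence is
// only part of a token or spans several tokens.
func generate(tokens, stop []string) string {
	var out strings.Builder
	for _, tok := range tokens {
		out.WriteString(tok)
		if cut, found := cutAtStop(out.String(), stop); found {
			return cut
		}
	}
	return out.String()
}

func main() {
	// "\n" is embedded in a larger token, so exact token matching would miss it.
	fmt.Printf("%q\n", generate([]string{"Hello", " world.\n", "Next line"}, []string{"\n"}))
	// The stop sequence "</s>" spans two tokens.
	fmt.Printf("%q\n", generate([]string{"done", "</", "s>", "extra"}, []string{"</s>"}))
}
```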
Michael Yang 982c535428 Merge pull request #428 from jmorganca/mxyng/upload-chunks
update upload chunks
2023-08-30 07:47:17 -07:00
Patrick Devine 8bbff2df98 add model IDs (#439) 2023-08-28 20:50:24 -07:00
Michael Yang 16b06699fd remove unused parameter 2023-08-28 18:35:18 -04:00
Michael Yang 246dc65417 loosen http status code checks 2023-08-28 18:34:53 -04:00
Michael Yang 865fceb73c chunked pipe 2023-08-28 18:34:53 -04:00
Michael Yang 72266c7684 bump chunk size to 95MB 2023-08-28 18:34:53 -04:00