Commit graph

846 commits

Author SHA1 Message Date
Michael Yang 0f541a0367 s/ListResponseModel/ModelResponse/ 2023-08-31 09:47:10 -04:00
Jeffrey Morgan bc3e21fdc6 update README.md 2023-08-30 17:56:14 -04:00
Jeffrey Morgan a82eb275ff update docs for subprocess 2023-08-30 17:54:02 -04:00
Bruce MacDonald f964aea9a2 remove test not applicate to subprocess 2023-08-30 16:36:11 -04:00
Bruce MacDonald 42998d797d
subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
Michael Yang 982c535428
Merge pull request #428 from jmorganca/mxyng/upload-chunks
update upload chunks
2023-08-30 07:47:17 -07:00
Michael Yang 7df342a6ea
Merge pull request #421 from jmorganca/mxyng/f16-metal
allow F16 to use metal
2023-08-29 06:32:59 -07:00
Patrick Devine 8bbff2df98
add model IDs (#439) 2023-08-28 20:50:24 -07:00
Michael Yang 16b06699fd remove unused parameter 2023-08-28 18:35:18 -04:00
Michael Yang 246dc65417 loosen http status code checks 2023-08-28 18:34:53 -04:00
Michael Yang 865fceb73c chunked pipe 2023-08-28 18:34:53 -04:00
Michael Yang 72266c7684 bump chunk size to 95MB 2023-08-28 18:34:53 -04:00
Jeffrey Morgan d3b838ce60 update orca to orca-mini 2023-08-27 13:26:30 -04:00
Michael Yang e639a12fa1
Merge pull request #412 from jmorganca/mxyng/update-readme
update README.md
2023-08-26 21:26:34 -07:00
Michael Yang e82fcf30c6
Merge pull request #420 from jmorganca/mxyng/34b-mem-check
add 34b to mem check
2023-08-26 14:15:52 -07:00
Michael Yang 495e8b0a6a
Merge pull request #426 from jmorganca/default-template
set default template
2023-08-26 14:15:38 -07:00
Michael Yang 59734ca24d set default template 2023-08-26 12:20:48 -07:00
Jeffrey Morgan 22ab7f5f88 default host to 127.0.0.1, fixes #424 2023-08-26 11:59:28 -07:00
Michael Yang b25dd1795d allow F16 to use metal
warning F16 uses significantly more memory than quantized model so the
standard requires don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang 304f2b6c96 add 34b to mem check 2023-08-26 08:29:21 -07:00
Quinn Slack 2ecc3a33c3
delete all models (not just 1st) in ollama rm (#415)
Previously, `ollama rm model1 model2 modelN` would only delete `model1`. The other model command-line arguments would be silently ignored. Now, all models mentioned are deleted.
2023-08-26 00:47:56 -07:00
Jeffrey Morgan ee6e1df118 add codellama to model list in readme 2023-08-25 20:44:26 -07:00
Jeffrey Morgan 177b69a211 add missing entries for 34B 2023-08-25 18:35:35 -07:00
Michael Yang dad63f0821
Merge pull request #411 from jmorganca/mxyng/34b
patch llama.cpp for 34B
2023-08-25 11:59:05 -07:00
Michael Yang 041f9ad1a1 update README.md 2023-08-25 11:44:25 -07:00
Michael Yang 7a378f8b66 patch llama.cpp for 34B 2023-08-25 10:06:55 -07:00
Michael Yang de0bdd7f29
Merge pull request #405 from jmorganca/mxyng/34b
add 34b model type
2023-08-24 10:37:22 -07:00
Michael Yang b1cececb8e add 34b model type 2023-08-24 10:35:44 -07:00
Michael Yang e0d39fa3bf
Merge pull request #398 from jmorganca/mxyng/cleanup
Mxyng/cleanup
2023-08-22 15:51:41 -07:00
Michael Yang 968ced2e71
Merge pull request #393 from jmorganca/mxyng/net-url
use url.URL
2023-08-22 15:51:33 -07:00
Michael Yang 32d1a00017 remove unused requestContextKey 2023-08-22 10:49:54 -07:00
Michael Yang 04e2128273 move upload funcs to upload.go 2023-08-22 10:49:53 -07:00
Michael Yang 2cc634689b use url.URL 2023-08-22 10:49:07 -07:00
Michael Yang 8f827641b0
Merge pull request #397 from jmorganca/mxyng/release-mode
build release mode
2023-08-22 10:48:44 -07:00
Michael Yang 95187d7e1e build release mode 2023-08-22 09:52:43 -07:00
Michael Yang 9ec7e37534
Merge pull request #392 from jmorganca/mxyng/version
add version
2023-08-22 09:50:25 -07:00
Michael Yang 2c7f956b38 add version 2023-08-22 09:40:58 -07:00
Jeffrey Morgan a9f6c56652 fix FROM instruction erroring when referring to a file 2023-08-22 09:39:42 -07:00
Ryan Baker 0a892419ad
Strip protocol from model path (#377) 2023-08-21 21:56:56 -07:00
Jeffrey Morgan e3054fc74e add .env to .dockerignore 2023-08-21 09:32:02 -07:00
Michael Yang 23c2485044
Merge pull request #381 from jmorganca/mxyng/fix-push-chunks
retry on unauthorized chunk push
2023-08-18 13:49:25 -07:00
Michael Yang 386c66f285
Merge pull request #378 from jmorganca/mxyng/copy-metadata-from-source
copy metadata from source
2023-08-18 13:49:09 -07:00
Michael Yang 3b49315f97 retry on unauthorized chunk push
The token printed for authorized requests has a lifetime of 1h. If an
upload exceeds 1h, a chunk push will fail since the token is created on
a "start upload" request.

This replaces the Pipe with SectionReader which is simpler and
implements Seek, a requirement for makeRequestWithRetry. This is
slightly worse than using a Pipe since the progress update is directly
tied to the chunk size instead of controlled separately.
2023-08-18 11:23:47 -07:00
Michael Yang 5ca05c2e88 fix ModelType() 2023-08-18 11:23:38 -07:00
Michael Yang 7eda70f23b copy metadata from source 2023-08-17 21:55:25 -07:00
Jeffrey Morgan 3d79b414d3 app: package ggml-metal.metal from correct directory 2023-08-17 23:55:45 -04:00
Michael Yang c84bbf1dd6
Merge pull request #376 from jmorganca/mxyng/from-map-ignore-nil
ignore nil map values
2023-08-17 15:57:12 -07:00
Michael Yang f723bf0879 ignore nil map values 2023-08-17 15:50:46 -07:00
Michael Yang cbf725a9ba
Merge pull request #375 from jmorganca/mxyng/fix-push
fix push manifest
2023-08-17 15:33:31 -07:00