Commit graph

730 commits

Author SHA1 Message Date
Bryce Reitano 36a6daccab Restructure loading conditional chain 2024-04-24 17:37:03 -06:00
Bryce Reitano ceb0e26e5e Provide variable ggml for TestLoad 2024-04-24 17:19:55 -06:00
Bryce Reitano 284e02bed0 Move ggml loading to when we attempt fitting 2024-04-24 17:17:24 -06:00
Michael Yang 592dae31c8 update copy to use model.Name 2024-04-24 15:54:54 -07:00
Daniel Hiltgen d8851cb7a0 Harden sched TestLoad
Give the go routine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen 34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Cheng 62be2050dd
chore: use errors.New to replace fmt.Errorf will much better (#3789) 2024-04-20 22:11:06 -04:00
Patrick Devine 9f8691c6c8
Add llama2 / torch models for ollama create (#3607) 2024-04-15 11:26:42 -07:00
Jeffrey Morgan a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading (#3653)
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Blake Mizerany a7b431e743
server: provide helpful workaround hint when stalling on pull (#3584)
This is a quick fix to help users who are stuck on the "pull" step at
99%.

In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736.
2024-04-10 16:24:37 -07:00
Michael Yang 9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Michael Yang e1c9a2a00f no blob create if already exists 2024-04-08 15:09:48 -07:00
Daniel Hiltgen 6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Daniel Hiltgen 58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine 3b6a9154dd
Simplify model conversion (#3422) 2024-04-01 16:14:53 -07:00
Michael Yang 91b3e4d282 update memory calcualtions
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Patrick Devine 5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Michael Yang af8a8a6b59 fix: trim quotes on OLLAMA_ORIGINS 2024-03-27 15:24:29 -07:00
Patrick Devine 1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Daniel Hiltgen 949b6c01e0 Revamp go based integration tests
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00
Blake Mizerany 703684a82a
server: replace blob prefix separator from ':' to '-' (#3146)
This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
Patrick Devine 47cfe58af5
Default Keep Alive environment variable (#3094)
---------

Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen 4a5c9b8035 Finish unwinding idempotent payload logic
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan 5b3fad9636 separate out isLocalIP 2024-03-09 00:22:08 -08:00
Jeffrey Morgan bfec2c6e10 simplify host checks 2024-03-08 23:29:53 -08:00
Jeffrey Morgan 5c143af726 add additional allowed hosts 2024-03-08 23:23:59 -08:00
Jeffrey Morgan fc8c044584
add allowed host middleware and remove workDir middleware (#3018) 2024-03-08 22:23:47 -08:00
Michael Yang 76bdebbadf decode ggla 2024-03-08 15:46:25 -08:00
Bruce MacDonald 0cebc79cba
fix: allow importing a model from name reference (#3005) 2024-03-08 12:27:47 -05:00
Jeffrey Morgan fc06205971
Revert "adjust download and upload concurrency based on available bandwidth" (#2995) 2024-03-07 18:10:16 -08:00
Daniel Hiltgen 6c5ccb11f9 Revamp ROCm support
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed.  It also cleans up after itself.

We now build only a single ROCm version (latest major) on both windows
and linux.  Given the large size of ROCms tensor files, we split the
dependency out.  It's bundled into the installer on windows, and a
separate download on windows.  The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us.  For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
Michael Yang 2e20110e50
Merge pull request #2221 from ollama/mxyng/up-down-ccy
adjust download and upload concurrency based on available bandwidth
2024-03-07 09:27:33 -08:00
Patrick Devine 2c017ca441
Convert Safetensors to an Ollama model (#2824) 2024-03-06 21:01:51 -08:00
Jeffrey Morgan 3b4bab3dc5
Fix embeddings load model behavior (#2848) 2024-02-29 17:40:56 -08:00
Michael Yang 0e19476b56
prepend image tags (#2789)
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
Michael Yang 084d846621 refactor 2024-02-21 13:42:48 -08:00
Michael Yang 6a4b994433 lint 2024-02-21 13:42:48 -08:00
Michael Yang bea007deb7 use LimitGroup for uploads 2024-02-21 13:42:48 -08:00
Michael Yang 074934be03 adjust group limit based on download speed 2024-02-21 13:42:48 -08:00
Michael Yang 0de12368a0 add new LimitGroup for dynamic concurrency 2024-02-21 13:42:48 -08:00
Michael Yang 917bd61084 refactor download run 2024-02-21 13:42:46 -08:00
Jeffrey Morgan 287ba11500 better error message when calling /api/generate or /api/chat with embedding models 2024-02-20 21:53:45 -05:00
Jeffrey Morgan 63861f58cc
Support for bert and nomic-bert embedding models 2024-02-20 21:37:29 -05:00
Michael Yang 210b65268e
replace strings buffer with hasher (#2437)
the buffered value is going into the hasher eventually so write directly
to the hasher instead
2024-02-20 19:07:50 -05:00
Michael Yang 897b213468
use http.DefaultClient (#2530)
default client already handles proxy
2024-02-20 18:34:47 -05:00
Bruce MacDonald 88622847c6
fix: chat system prompting overrides (#2542) 2024-02-16 14:42:43 -05:00
Michael Yang e43648afe5 rerefactor 2024-02-15 05:56:45 +00:00
Daniel Hiltgen f397e0e988 Move hub auth out to new package 2024-02-15 05:56:45 +00:00
Jeffrey Morgan 48a273f80b
Fix issues with templating prompt in chat mode (#2460) 2024-02-12 15:06:57 -08:00
Jeffrey Morgan 1f9078d6ae
Check image filetype in api handlers (#2467) 2024-02-12 11:16:20 -08:00
Jeffrey Morgan a0a199b108
Fix hanging issue when sending empty content (#2399) 2024-02-07 19:30:33 -05:00
Jeffrey Morgan 453f572f83
Initial OpenAI /v1/chat/completions API compatibility (#2376) 2024-02-07 17:24:29 -05:00
Michael Yang e805ac1d59 fix response on token error 2024-02-07 11:05:49 -08:00
Michael Yang bfbf2f7cf7
Merge pull request #2296 from ollama/mxyng/img-tags
append image tags to user content
2024-02-01 13:16:59 -08:00
Michael Yang 3d6f48507a structured debug prompt 2024-02-01 11:56:28 -08:00
Michael Yang f3761405c8 use image id 2024-02-01 11:52:42 -08:00
Michael Yang e49dc9f3d8 fix tests 2024-02-01 11:48:11 -08:00
Michael Yang d125510b4b remove image tags 2024-02-01 11:32:51 -08:00
Michael Yang fb56988014 account for image projection in token count 2024-02-01 09:50:48 -08:00
Michael Yang d046bee790 use llm.ImageData for chat 2024-01-31 19:18:25 -08:00
Jeffrey Morgan f11bf0740b use llm.ImageData 2024-01-31 19:13:48 -08:00
Michael Yang 8450bf66e6 trim images 2024-01-31 19:13:47 -08:00
Michael Yang b4e11be8ef append image tags to user content 2024-01-31 19:13:10 -08:00
Bruce MacDonald a896079705
preserve last system message from modelfile (#2289) 2024-01-31 21:45:01 -05:00
Michael Yang 8ac08a0eec update slog handler options
- consistent format by using text handler for debug and non-debug
- truncate source file to just the file name
2024-01-31 15:15:00 -08:00
Michael Yang c8b1f2369e remove unnecessary parse raw 2024-01-30 17:00:53 -08:00
Bruce MacDonald 0632dff3f8
trim chat prompt based on llm context size (#1963) 2024-01-30 15:59:29 -05:00
Jeffrey Morgan f2245c7c77
print prompt with OLLAMA_DEBUG=1 (#2245) 2024-01-28 15:22:35 -08:00
Jeffrey Morgan e4b9b72f2a
Do not repeat system prompt for chat templating (#2241) 2024-01-28 14:15:56 -08:00
Patrick Devine b5cf31b460
add keep_alive to generate/chat/embedding api endpoints (#2146) 2024-01-26 14:28:02 -08:00
Michael Yang 9d3dcfd0ec fix logging 2024-01-26 11:04:27 -08:00
Michael Yang 6e0ea5ecc8
Merge pull request #1916 from ollama/mxyng/inactivity-monitor
download: add inactivity monitor
2024-01-26 10:56:00 -08:00
Patrick Devine 7c40a67841
Save and load sessions (#2063) 2024-01-25 12:12:36 -08:00
Michael Yang c08dfaa23d fix: remove overwritten model layers
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Michael Yang aac9ab4db7 fix show handler 2024-01-18 15:36:50 -08:00
Michael Yang 745b5934fa add model to ModelResponse 2024-01-18 14:32:55 -08:00
Michael Yang a38d88d828 api: add model for all requests
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Michael Yang 96cfb62641 fix: normalize name path before splitting 2024-01-16 16:48:29 -08:00
Patrick Devine eef50accb4
Fix show parameters (#2017) 2024-01-16 10:34:44 -08:00
Michael Yang 27331ae3a8 download: add inactivity monitor
if a download part is inactive for some time, restart it
2024-01-12 15:23:15 -08:00
Michael Yang cf29bd2d72 fix: request retry with error
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
Michael Yang 2b9892a808 fix(windows): modelpath and list 2024-01-09 09:36:58 -08:00
Michael Yang 2bb2bdd5d4 fix lint 2024-01-09 09:36:58 -08:00
Michael Yang acfc376efd add .golangci.yaml 2024-01-09 09:36:58 -08:00
Bruce MacDonald 7e8f7c8358
remove ggml automatic re-pull (#1856) 2024-01-08 14:41:01 -05:00
Michael Yang 0101e76dbe
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Patrick Devine 22e93efa41 add show info command and fix the modelfile 2024-01-05 12:20:05 -08:00
Nicholas Dudfield 8baaaa39c0 Allow extension origins (still needs explicit listing), fixes #1686 2024-01-05 09:06:47 +07:00
Bruce MacDonald 4ad6c9b11f
fix: pull either original model or from model on create (#1774) 2024-01-04 01:34:38 -05:00
Bruce MacDonald 0b3118e0af
fix: relay request opts to loaded llm prediction (#1761) 2024-01-03 12:01:42 -05:00
Daniel Hiltgen 697bea6939 Guard integration tests with a tag
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
Bruce MacDonald db356c8519
post-response templating (#1427) 2023-12-22 17:07:05 -05:00
Daniel Hiltgen 96fb441abd
Merge pull request #1146 from dhiltgen/ext_server_cgo
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Michael Yang 63aac0edc5 fix(test): use real version string for comparison 2023-12-19 15:03:02 -08:00
Daniel Hiltgen 51082535e1 Add automated test for multimodal
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen 35934b2e05 Adapted rocm support to cgo based llama.cpp 2023-12-19 09:05:46 -08:00
Daniel Hiltgen d4cd695759 Add cgo implementation for llama.cpp
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald 5e7fd6906f Update images.go 2023-12-19 09:05:46 -08:00
Bruce MacDonald 811b1f03c8 deprecate ggml
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Bruce MacDonald d99fa6ce0a
send empty messages on last chat response (#1530) 2023-12-18 14:23:38 -05:00
Patrick Devine 3948c6ea06
add magic header for unit tests (#1558) 2023-12-18 10:41:02 -08:00
Patrick Devine 86b0dd4b16
add API create/copy handlers (#1541) 2023-12-15 11:59:18 -08:00
Patrick Devine 0174665d0e
add API tests for list handler (#1535) 2023-12-14 18:18:25 -08:00
Patrick Devine 630518f0d9
Add unit test of API routes (#1528) 2023-12-14 16:47:40 -08:00
Bruce MacDonald 6ee8c80199
restore model load duration on generate response (#1524)
* restore model load duration on generate response

- set model load duration on generate and chat done response
- calculate createAt time when response created

* remove checkpoints predict opts

* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan 4a1abfe4fa fix tests 2023-12-13 14:42:30 -05:00
Patrick Devine d9e60f634b
add image support to the chat api (#1490) 2023-12-12 13:28:58 -08:00
Jeffrey Morgan 0a9d348023
Fix issues with /set template and /set system (#1486) 2023-12-12 14:43:19 -05:00
Patrick Devine 910e9401d0
Multimodal support (#1216)
---------

Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Jeffrey Morgan 7db5bcf73b fix go-staticcheck warning 2023-12-10 11:44:27 -05:00
Jeffrey Morgan fa2f095bd9 fix model name returned by /api/generate being different than the model name provided 2023-12-10 11:42:15 -05:00
Jeffrey Morgan 045b855db9 fix error on accumulating final chat response 2023-12-10 11:24:39 -05:00
Jeffrey Morgan 32064a0646 fix empty response when receiving runner error 2023-12-10 10:53:38 -05:00
Jeffrey Morgan 9e1406e4ed Don't expose model information in /api/generate 2023-12-09 02:05:43 -08:00
Bruce MacDonald 7e9405fd07
fix: encode full previous prompt in context (#1424) 2023-12-08 16:53:51 -05:00
Bruce MacDonald 3b0b8930d4
fix: only flush template in chat when current role encountered (#1426) 2023-12-08 16:44:24 -05:00
Bruce MacDonald e3f925fc1b
fix: restore modelfile system in prompt template (#1425) 2023-12-08 14:20:19 -05:00
Michael Yang 1f05d77110
Merge pull request #1244 from jmorganca/brucemacd/no-fail-template
do not fail on unsupported template variables
2023-12-06 13:23:04 -08:00
Michael Yang c3ff36088b
Merge pull request #774 from jmorganca/mxyng/server-version
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Bruce MacDonald 47d4e22673 use missingkey in set empty interface when missing 2023-12-05 15:49:05 -08:00
Michael Yang 5d75505ebd return model configuration in generate 2023-12-05 14:39:02 -08:00
Michael Yang b9495ea162 load projectors 2023-12-05 14:36:12 -08:00
Michael Yang 409bb9674e
Merge pull request #1308 from jmorganca/mxyng/split-from
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang d3479c07a1
Merge pull request #1250 from jmorganca/mxyng/create-layer
refactor layer creation
2023-12-05 14:32:52 -08:00
Bruce MacDonald 195e3d9dbd
chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
Michael Yang 1ebdbd9694 server: add version handler 2023-12-05 09:36:01 -08:00
Jeffrey Morgan 00d06619a1 Revert "chat api (#991)" while context variable is fixed
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Michael Yang a3737cbd33 use NewLayer for CreateBlobHandler 2023-12-04 16:59:23 -08:00
Michael Yang 998f1785b6 add modelfamilies 2023-12-04 16:59:23 -08:00
Michael Yang 70a93057cd refactor layer creation
previous layer creation was not ideal because:

1. it required reading the input file multiple times, once to calculate
   the sha256 checksum, another to write it to disk, and potentially one
   more to decode the underlying gguf
2. used io.ReadSeeker which is prone to user error. if the file isn't
   reset correctly or in the right place, it could end up reading an
   empty file

there are also some brittleness when reading existing layers else
writing the inherited layers will error reading an already closed file

this commit aims to fix these issues by restructuring layer creation.

1. it will now write the layer to a temporary file as well as the hash
   function and move it to the final location on Commit
2. layers are read once once when copied to the destination. exception
   is raw model files which still requires a second read to decode the
   model metadata
2023-12-04 16:59:23 -08:00
Michael Yang 2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Bruce MacDonald 7a0899d62d
chat api (#991)
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Joshua Pham bb80a597db Fix adapter loading from SHA hash 2023-12-01 13:50:55 -05:00
Michael Yang 13efd5f218 upload: fix PUT retry 2023-11-29 16:38:35 -08:00
Michael Yang c4bdfffd96 upload: separate progress tracking 2023-11-29 16:38:33 -08:00
Michael Yang 26c63418e0 new hasher 2023-11-29 14:52:41 -08:00
Michael Yang 2799784ac8 revert checksum calculation to calculate-as-you-go 2023-11-29 13:47:58 -08:00
Bruce MacDonald 96122b7271
validate model tags on copy (#1323) 2023-11-29 15:54:29 -05:00
Timothy Jaeryang Baek c2e3b89176
fix: disable ':' in tag names (#1280)
Co-authored-by: rootedbox
2023-11-29 13:33:45 -05:00
Patrick Devine cde31cb220
Allow setting parameters in the REPL (#1294) 2023-11-29 09:56:42 -08:00
Bruce MacDonald 37d95157df
fix relative path on create (#1222) 2023-11-21 15:43:17 -05:00
Jeffrey Morgan 35c4b5ec16 calculate hash separately from http request 2023-11-20 15:45:11 -05:00
Jeffrey Morgan 9d73d3a6b5 add back part.Reset() 2023-11-19 14:32:19 -05:00
Jeffrey Morgan 72cd336410 dont retry on upload complete context cancel 2023-11-19 14:32:19 -05:00
Jeffrey Morgan 1bd594b2fa revert to using one open file for blob uploads 2023-11-19 14:32:19 -05:00
Jeffrey Morgan 9a8c21ac3d use exponential everywhere 2023-11-19 14:32:19 -05:00
Jeffrey Morgan f6b317e8c9 fix sending too little data in chunk upload body 2023-11-19 14:32:19 -05:00
Jeffrey Morgan ac5076ce1e exponential backoff up to 30s 2023-11-19 14:32:19 -05:00
Michael Yang 42c2e3a624 upload: retry complete upload 2023-11-19 14:32:19 -05:00
Michael Yang cb42589792 adjust download/upload parts 2023-11-19 14:32:19 -05:00
Jeffrey Morgan e1d7056496 update progress statuses 2023-11-19 09:21:13 -05:00
Jeffrey Morgan 02524a56ff check retry for authorization error 2023-11-19 00:19:53 -05:00
Jeffrey Morgan 12e046f12a remove unused function 2023-11-18 22:16:51 -05:00
Bruce MacDonald 43a726149d fix potentially inaccurate error message 2023-11-18 21:25:07 -05:00
Jeffrey Morgan bab9494176 add - separator to temp file created on ollama create 2023-11-18 09:39:52 -05:00
Michael Yang c6e6c8ee7e fix cross device rename 2023-11-17 15:22:17 -08:00
Michael Yang c1bbf5ddee
Merge pull request #1134 from jmorganca/mxyng/progress
progress bar
2023-11-17 14:03:35 -08:00
Bruce MacDonald 0b19e24d81
only retry once on auth failure (#1175) 2023-11-17 14:22:35 -05:00
Michael Yang d6ecaa2cbf update progress responses 2023-11-17 10:06:19 -08:00
Bruce MacDonald 4b3f4bc7d9
return failure details when unauthorized to push (#1131)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-16 16:44:18 -05:00
Michael Yang a5ccf742c1 fix cross repo mounts 2023-11-16 16:33:30 -05:00
Michael Yang e33ef391cd fix push scope error for inherited model 2023-11-16 16:33:30 -05:00
Michael Yang 54f92f01cb update docs 2023-11-15 15:28:15 -08:00
Michael Yang 652d90e1c7 Update server/images.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-15 15:16:23 -08:00
Michael Yang bc22d5a38b no blob response 2023-11-15 15:16:23 -08:00
Michael Yang 1901044b07 use checksum reference 2023-11-15 15:16:23 -08:00
Michael Yang a07c935d34 ignore non blobs 2023-11-15 15:16:23 -08:00
Michael Yang 1552cee59f client create modelfile 2023-11-15 15:16:23 -08:00
Michael Yang 3ca56b5ada add create modelfile field 2023-11-15 15:16:23 -08:00
Michael Yang b0d14ed51c refactor create model 2023-11-15 15:16:23 -08:00
Michael Yang d91c103e74
Merge pull request #1055 from dansreis/946-fix-incorrect-base-model-name
Fixed incorrect base model name
2023-11-13 08:42:55 -08:00
Daniel Reis 7c438f2c53 Replaced method 2023-11-10 20:22:03 +00:00
Daniel Reis 6e46338d44 Reverting previous changes 2023-11-10 20:21:35 +00:00
Daniel Hiltgen cc54a416c6 Resume chunk download on UnexpectedEOF errors
If the chunk download is interrupted, resume from where we left off
2023-11-10 08:29:42 -08:00
Jeffrey Morgan 5cba29b9d6
JSON mode: add `"format" as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Daniel Reis d17730356a Removed inline parse model path 2023-11-09 22:44:26 +00:00
Daniel Reis 32d79a6eea Using 'GetShortTagname' method instead 2023-11-09 22:40:37 +00:00
Bruce MacDonald ec2a31e9b3
support raw generation requests (#952)
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
-add raw request to docs
2023-11-08 14:05:02 -08:00
Michael Yang 146072113d
Merge pull request #993 from jmorganca/mxyng/cleanup
cleanup upload and download errors
2023-11-06 11:32:12 -08:00
Jeffrey Morgan e21579a0f1 Restore system prompt on requests 2023-11-03 17:26:45 -07:00
Michael Yang 434a6f9d46 return last error 2023-11-03 16:49:51 -07:00
Michael Yang 84725ec7e3 refactor part reset 2023-11-03 09:20:32 -07:00
Noah Gitsham 8ae8c9fa8c
Remove duplicate "install" in GPU support warning (#984) 2023-11-03 00:45:14 -07:00
Noah Gitsham f39daff461
Add missing "be" to GPU support warning message (#983) 2023-11-02 18:37:12 -07:00
Jeffrey Morgan c50b01bc21 check request.Context for initial system prompt 2023-11-02 18:17:00 -07:00
Bruce MacDonald b9dc875401
remove modelfile context deprecated in v0.0.7 (#974) 2023-11-02 20:52:56 -04:00
Michael Yang 1fd511e661
Merge pull request #975 from jmorganca/mxyng/downloads
update downloads to use retry wrapper
2023-11-02 16:12:48 -07:00
Jeffrey Morgan 1beb5645a9
only use system prompt if context is not provided (#978) 2023-11-02 15:48:02 -07:00
Michael Yang fe5a872444 fix upload 2023-11-02 13:25:58 -07:00
Michael Yang d39709260f download with retry 2023-11-02 13:16:11 -07:00
Michael Yang 60bb3c03a1 use http.Method 2023-11-02 13:12:45 -07:00
Michael Yang c4cc738cbf fix log 2023-11-01 17:18:11 -07:00
Michael Yang 2c6189f4fe
Merge pull request #750 from jmorganca/mxyng/concurrent-uploads
concurrent uploads
2023-11-01 15:00:01 -07:00
Bruce MacDonald f9a4281124
clean up: remove server functions from client (#937) 2023-10-30 11:10:18 -04:00
Michael Yang 115fc56eb7 calculate and verify md5 checksum 2023-10-27 17:07:33 -07:00
Michael Yang 186f685224 retry PUT 2023-10-27 17:07:33 -07:00
Michael Yang 12efcbb057 comments 2023-10-27 17:07:33 -07:00
Michael Yang 4e09aab8b9 concurrent uploads 2023-10-27 17:07:33 -07:00
Bruce MacDonald 5c3491f425
allow for a configurable ollama model storage directory (#897)
* allow for a configurable ollama models directory

- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
Michael Yang 910816a532 fix(download): no retry when out of space 2023-10-26 11:34:07 -07:00
Michael Yang 386169205c
update runtime options (#864) 2023-10-20 21:17:14 -04:00
Michael Yang 75bee074b6 fix: nil pointer dereference 2023-10-20 16:55:24 -07:00
Jeffrey Morgan 7ed5a39bc7 simpler check for model loading compatibility errors 2023-10-19 14:50:49 -04:00
Michael Yang 846f593dbf
Merge pull request #828 from jmorganca/mxyng/template-parameters
image: show parameters
2023-10-19 09:31:31 -07:00
Michael Yang e1c5be24e7 check json eof 2023-10-19 09:21:51 -07:00
Michael Yang 2ad8a074ac generate: set created_at
move the empty response so it's more visible
2023-10-19 09:21:51 -07:00
Michael Yang 7e547c6833 s/message/error/ 2023-10-19 09:21:04 -07:00
Michael Yang 689842b9ff request: bad request when model missing fields 2023-10-19 09:21:04 -07:00
Michael Yang a19d47642e models: rm workDir from CreateModel
unused after removing EMBED
2023-10-19 09:21:04 -07:00
Bruce MacDonald fe6f3b48f7
do not reload the running llm when runtime params change (#840)
- only reload the running llm if the model has changed, or the options for loading the running model have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
Michael Yang 4dcceeffb7 let the template do the work 2023-10-18 13:12:00 -07:00
Michael Yang 019e4a4558 image: show parameters 2023-10-18 13:12:00 -07:00
Michael Yang 627d04d927
Merge pull request #827 from jmorganca/mxyng/template-adapters
model: native gotemplate adapter template
2023-10-18 13:11:25 -07:00
Michael Yang 940e8ebec3
Merge pull request #826 from jmorganca/mxyng/template-system
show: no template system if empty
2023-10-18 13:11:09 -07:00
Yiorgis Gozadinos 8c6c2cbc8c When the .ollama folder is broken or there are no models return an empty list on /api/tags 2023-10-18 08:23:20 +02:00
Michael Yang 8299bf76ed model: native gotemplate adapter template 2023-10-17 15:28:38 -07:00
Michael Yang ee4979e510 show: no template system if empty 2023-10-17 15:25:43 -07:00
Michael Yang 1af493c5a0 server: print version on start 2023-10-16 09:59:14 -07:00
Bruce MacDonald a0c3e989de
deprecate modelfile embed command (#759) 2023-10-16 11:07:37 -04:00
Michael Yang 7a537cdca9
Merge pull request #770 from jmorganca/mxyng/fix-download
fix download
2023-10-12 12:56:43 -07:00
Michael Yang 257ffeb997 fix download 2023-10-12 12:52:43 -07:00
Bruce MacDonald 7804b8fab9
validate api options fields from map (#711) 2023-10-12 11:18:11 -04:00
Michael Yang c413a55093 download: handle inner errors 2023-10-11 14:15:30 -07:00
Michael Yang 630bb75d2a dynamically size download parts based on file size 2023-10-11 14:10:25 -07:00
Michael Yang a2055a1e93 update download 2023-10-11 14:10:25 -07:00
Bruce MacDonald 274d5a5fdf
optional parameter to not stream response (#639)
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Jeffrey Morgan 65dcd0ce35
always cleanup blob download (#747) 2023-10-10 13:12:29 -04:00
Michael Yang f6e98334e4 handle upstream proxies 2023-10-09 11:42:36 -07:00
Bruce MacDonald af4cf55884
not found error before pulling model (#718) 2023-10-06 16:06:20 -04:00
Bruce MacDonald d6786f2945
add feedback for reading model metadata (#722) 2023-10-06 16:05:32 -04:00
Michael Yang 0560b28a8d names 2023-10-06 12:56:56 -07:00
Michael Yang 10199c5987 replace done channel with file check 2023-10-06 12:56:56 -07:00
Michael Yang 288814d3e4 fix ref counts 2023-10-06 12:56:43 -07:00
Michael Yang 04733438da check head request response 2023-10-06 12:56:43 -07:00
Michael Yang 711e891f0f fix resumable downloads
glob returns files in lexical order which is not appropriate when
rebuilding the parts list
2023-10-06 12:56:43 -07:00
Michael Yang 090d08422b handle unexpected eofs 2023-10-06 12:56:43 -07:00
Michael Yang 5b84404c64 handle concurrent requests for the same blobs 2023-10-06 12:56:43 -07:00
Michael Yang 8544edca21 parallel chunked downloads 2023-10-06 12:56:43 -07:00
Bruce MacDonald 2130c0708b
output type parsed from modelfile (#678) 2023-10-05 14:58:04 -04:00
Jay Nakrani 1d0ebe67e8
Document response stream chunk delimiter. (#632)
Document response stream chunk delimiter.
2023-09-29 21:45:52 -07:00
Bruce MacDonald a1b2d95f96
remove unused push/pull params (#650) 2023-09-29 17:27:19 -04:00
Michael Yang 9333b0cc82
Merge pull request #612 from jmorganca/mxyng/prune-empty-directories
prune empty directories
2023-09-29 11:23:39 -07:00
Michael Yang f40b3de758 use int64 consistently 2023-09-28 11:07:24 -07:00
Michael Yang 8608eb4760 prune empty directories 2023-09-27 10:58:09 -07:00
Jeffrey Morgan 9b12a511ca check other request fields before load short circuit in /api/generate 2023-09-22 23:50:55 -04:00
Bruce MacDonald 5d71bda478
close llm on interrupt (#577) 2023-09-22 19:41:52 +01:00
Michael Yang 82f5b66c01 register HEAD /api/tags 2023-09-21 16:38:03 -07:00
Michael Yang c986694367 fix HEAD / request
HEAD request should respond like their GET counterparts except without a
response body.
2023-09-21 16:35:58 -07:00
Bruce MacDonald 4cba75efc5
remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang 1fabba474b refactor default allow origins
this should be less error prone
2023-09-21 09:42:25 -07:00
Michael Yang ee4fd16f2c
Merge pull request #556 from jmorganca/pack-cuda
pack in cuda libs
2023-09-20 15:02:36 -07:00
Bruce MacDonald 1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00
Michael Yang 499e9007a5 pick chunksize based on location 2023-09-20 11:10:24 -07:00
Michael Yang aa45d7c1df draft: explicitly follow upload redirects 2023-09-19 13:36:58 -07:00
Michael Yang a5520bfb42 fix build 2023-09-19 10:42:24 -07:00
Michael Yang b58d5d16b0 fix mkdir on windows 2023-09-19 09:41:13 -07:00
Patrick Devine 24580df958
only add a layer if there is actual data (#535) 2023-09-18 13:47:45 -07:00
Patrick Devine 80dd44e80a
Cmd changes (#541) 2023-09-18 12:26:56 -07:00
Michael Yang 08d7c2a944 fix error on upload chunk 2023-09-15 15:59:30 -07:00
Michael Yang e53bc57d4d split uploadBlobChunked 2023-09-14 17:22:05 -07:00
Michael Yang f0b398d17f implement ProgressWriter 2023-09-14 17:22:04 -07:00
Michael Yang daa4f096f9 set request.ContentLength
This informs the HTTP client the content length is known and disables
chunked Transfer-Encoding
2023-09-14 13:32:44 -07:00
Michael Yang e6881cabd0 remove unused 2023-09-13 14:48:33 -07:00
Michael Yang 0c5a454361 fix model type for 70b 2023-09-12 15:12:59 -07:00
Michael Yang 7dee25a07f fix falcon decode
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald f221637053
first pass at linux gpu support (#454)
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine 45ac07cd02
create the blobs directory correctly (#508) 2023-09-11 14:54:52 -07:00
Patrick Devine e7e91cd71c
add autoprune to remove unused layers (#491) 2023-09-11 11:46:35 -07:00
Jeffrey Morgan 3920e15386
add model format to config layer (#497) 2023-09-09 17:53:44 -04:00
Michael Yang de227b620f fix nil pointer dereference 2023-09-07 17:24:31 -07:00
Michael Yang 738fe9c4aa
Merge pull request #486 from jmorganca/mxyng/fix-push
fix: retry push on expired token
2023-09-07 13:58:34 -07:00
Michael Yang bf146fb072 fix retry on unauthorized chunk 2023-09-07 12:02:04 -07:00
Michael Yang f0f4943577 fix get auth token 2023-09-07 12:01:56 -07:00
Bruce MacDonald 09dd2aeff9
GGUF support (#441) 2023-09-07 13:55:37 -04:00
Michael Yang 83c6be1666
fix model manifests (#477) 2023-09-06 17:30:08 -04:00
Patrick Devine 790d24eb7b
add show command (#474) 2023-09-06 11:04:17 -07:00
Michael Yang a1ecdd36d5 create manifests directory 2023-09-05 17:10:40 -07:00
Michael Yang d1c2558f7e
Merge pull request #461 from jmorganca/mxyng/fix-inherit-params
fix inherit params
2023-09-05 12:30:23 -07:00
Michael Yang 06ef90c051 fix parameter inheritence
parameters are not inherited because they are processed differently from
other layer. fix this by explicitly merging the inherited params into
the new params. parameter values defined in the new modelfile will
override those defined in the inherited modelfile. array lists are
replaced instead of appended
2023-09-05 11:40:20 -07:00
Michael Yang e9f6df7dca use slices.DeleteFunc 2023-09-05 09:56:59 -07:00
Michael Yang 681f3c4c42 fix num_keep 2023-09-03 17:47:49 -04:00
Quinn Slack 62d29b2157 do not HTML-escape prompt
The `html/template` package automatically HTML-escapes interpolated strings in templates. This behavior is undesirable because it causes prompts like `<h1>hello` to be escaped to `&lt;h1&gt;hello` before being passed to the LLM.

The included test case passes, but before the code change, it failed:

```
--- FAIL: TestModelPrompt
    images_test.go:21: got "a&lt;h1&gt;b", want "a<h1>b"
```
2023-09-01 17:16:38 -05:00
Michael Yang 1c8fd627ad windows: fix create modelfile 2023-08-31 09:47:10 -04:00
Michael Yang ae950b00f1 windows: fix delete 2023-08-31 09:47:10 -04:00
Michael Yang eeb40a672c fix list models for windows 2023-08-31 09:47:10 -04:00
Michael Yang 0f541a0367 s/ListResponseModel/ModelResponse/ 2023-08-31 09:47:10 -04:00
Bruce MacDonald 42998d797d
subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
Michael Yang 982c535428
Merge pull request #428 from jmorganca/mxyng/upload-chunks
update upload chunks
2023-08-30 07:47:17 -07:00
Patrick Devine 8bbff2df98
add model IDs (#439) 2023-08-28 20:50:24 -07:00
Michael Yang 16b06699fd remove unused parameter 2023-08-28 18:35:18 -04:00
Michael Yang 246dc65417 loosen http status code checks 2023-08-28 18:34:53 -04:00
Michael Yang 865fceb73c chunked pipe 2023-08-28 18:34:53 -04:00
Michael Yang 72266c7684 bump chunk size to 95MB 2023-08-28 18:34:53 -04:00
Michael Yang 59734ca24d set default template 2023-08-26 12:20:48 -07:00
Michael Yang 32d1a00017 remove unused requestContextKey 2023-08-22 10:49:54 -07:00
Michael Yang 04e2128273 move upload funcs to upload.go 2023-08-22 10:49:53 -07:00
Michael Yang 2cc634689b use url.URL 2023-08-22 10:49:07 -07:00