Commit graph

3090 commits

Author SHA1 Message Date
Daniel Hiltgen 73e2c8f68f Fix context exhaustion integration test for small gpus
On the smaller GPUs, the initial model load of llama2 took over 30s (the
default timeout for the DoGenerate helper)
2024-07-09 16:24:14 -07:00
Daniel Hiltgen 2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
royjhan 4918fae535
OpenAI v1/completions: allow stop token list (#5551)
* stop token parsing fix

* add stop test
2024-07-09 14:01:26 -07:00
royjhan 0aff67877e
separate request tests (#5578) 2024-07-09 13:48:31 -07:00
Daniel Hiltgen 9544a57ee4
Merge pull request #5579 from dhiltgen/win_static_deps
Statically link c++ and thread lib on windows
2024-07-09 12:21:13 -07:00
Daniel Hiltgen b51e3b63ac Statically link c++ and thread lib
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang 6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang 9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`

* remove whitespace change

* undo some changes
2024-07-08 22:32:15 -07:00
Daniel Hiltgen 0bacb30007 Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan 53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535) 2024-07-07 14:32:05 -04:00
Jeffrey Morgan d8def1ff94
llm: allow gemma 2 to context shift (#5534) 2024-07-07 13:41:51 -04:00
Jeffrey Morgan 571dc61955
Update llama.cpp submodule to a8db2a9c (#5530) 2024-07-07 13:03:09 -04:00
Jeffrey Morgan 0e09c380fc
llm: print caching notices in debug only (#5533) 2024-07-07 12:38:04 -04:00
Jeffrey Morgan 0ee87615c7
sched: don't error if paging to disk on Windows and macOS (#5523) 2024-07-06 22:01:52 -04:00
Jeffrey Morgan f8241bfba3
gpu: report system free memory instead of 0 (#5521) 2024-07-06 19:35:04 -04:00
Jeffrey Morgan 4607c70641
llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520) 2024-07-06 18:58:16 -04:00
jmorganca c12f1c5b99 release: move mingw library cleanup to correct job 2024-07-06 16:12:29 -04:00
jmorganca a08f20d910 release: remove unwanted mingw dll.a files 2024-07-06 15:21:15 -04:00
jmorganca 6cea036027 Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc401.
2024-07-06 15:10:48 -04:00
jmorganca 5796bfc401 llm: only statically link libstdc++ 2024-07-06 14:06:20 -04:00
jmorganca f1a379aa56 llm: statically link pthread and stdc++ dependencies in windows build 2024-07-06 12:54:02 -04:00
jmorganca 9ae146993e llm: add GGML_STATIC flag to windows static lib 2024-07-06 03:27:05 -04:00
Jeffrey Morgan e0348d3fe8
llm: add COMMON_DARWIN_DEFS to arm static build (#5513) 2024-07-05 22:42:42 -04:00
Jeffrey Morgan 2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f3526a.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan 5304b765b2
llm: put back old include dir (#5507)
* llm: put back old include dir

* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Michael Yang fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
Jeffrey Morgan 4fd5f3526a
fix cmake build (#5505) 2024-07-05 19:07:01 -04:00
Daniel Hiltgen 842f85f758
Merge pull request #5502 from dhiltgen/ci_fixes
Always go build in CI generate steps
2024-07-05 15:39:11 -07:00
Daniel Hiltgen 9d30f9f8b3 Always go build in CI generate steps
With the recent cgo changes, bugs can sneak through
if we don't make sure to `go build` all the permutations
2024-07-05 15:31:52 -07:00
Blake Mizerany 631cfd9e62
types/model: remove knowledge of digest (#5500)
This was leading to ambiguity and confusion in ollama.com, and is not
used anywhere in ollama at the moment. Once manifests are addressable by
digest, we can add this back in, and in a way that is more tailored to
the concept of addressing a manifest by digest.
2024-07-05 13:42:30 -07:00
Michael Yang 326363b3a7 no funcs 2024-07-05 13:17:25 -07:00
Michael Yang ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang 2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang 269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Jeffrey Morgan 78fb33dd07
fix typo in cgo directives in llm.go (#5501) 2024-07-05 15:18:36 -04:00
Jeffrey Morgan 8f8e736b13
update llama.cpp submodule to d7fd29f (#5475) 2024-07-05 13:25:58 -04:00
Jeffrey Morgan d89454de80
Use slot with cached prompt instead of least recently used (#5492)
* Use common prefix to select slot

* actually report `longest`
2024-07-05 12:32:47 -04:00
Daniel Hiltgen af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Jeffrey Morgan e9188e971a
Fix assert on small embedding inputs (#5491)
* Fix assert on small embedding inputs

* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen 78eddfc068
Merge pull request #4412 from dhiltgen/win_docs
Document older win10 terminal problems
2024-07-05 08:18:22 -07:00
Daniel Hiltgen 02c24d3d01
Merge pull request #5466 from dhiltgen/fix_clip_unicode
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Daniel Hiltgen 52abc8acb7 Document older win10 terminal problems
We haven't found a workaround, so for now recommend updating.
2024-07-03 17:32:14 -07:00
Jeffrey Morgan 4d71c559b2
fix error detection by limiting model loading error parsing (#5472) 2024-07-03 20:04:30 -04:00
Anatoli Babenia 0d16eb310e
fix: use envconfig.ModelsDir directly (#4821)
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>

Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
Daniel Hiltgen 8072e205ff
Merge pull request #5447 from dhiltgen/fix_keepalive
Only set default keep_alive on initial model load
2024-07-03 15:34:38 -07:00
Daniel Hiltgen 955f2a4e03 Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load.  Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen 3c75113e37 Prevent loading models larger than total memory
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory.  Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Daniel Hiltgen ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
Fix use_mmap for modefiles
2024-07-03 13:59:42 -07:00
royjhan 3b5a4a77f3
Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371)
* openai compatibility

* Revert "openai compatibility"

This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c.

* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00