Commit graph

2917 commits

Author SHA1 Message Date
Daniel Hiltgen 434dfe30c5 Reintroduce NVIDIA NVML library for Windows
This library will give us the most reliable free VRAM reporting on Windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
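
A rough sketch of what an NVML-backed free-VRAM query on Windows looks like. nvmlInit_v2, nvmlDeviceGetHandleByIndex_v2, and nvmlDeviceGetMemoryInfo are the public NVML entry points; the Go wrapper below is illustrative, not the code from this commit, and only builds on Windows:

```go
// Windows-only sketch: query free VRAM through nvml.dll.
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// nvmlMemory mirrors NVML's nvmlMemory_t: total, free, used, in bytes.
type nvmlMemory struct {
	Total, Free, Used uint64
}

func main() {
	dll, err := syscall.LoadDLL(`nvml.dll`)
	if err != nil {
		fmt.Println("nvml not available:", err)
		return
	}
	defer dll.Release()

	nvmlInit := dll.MustFindProc("nvmlInit_v2")
	getHandle := dll.MustFindProc("nvmlDeviceGetHandleByIndex_v2")
	getMemInfo := dll.MustFindProc("nvmlDeviceGetMemoryInfo")

	// NVML returns 0 (NVML_SUCCESS) on success.
	if ret, _, _ := nvmlInit.Call(); ret != 0 {
		fmt.Println("nvmlInit failed:", ret)
		return
	}

	var dev uintptr // opaque device handle
	if ret, _, _ := getHandle.Call(0, uintptr(unsafe.Pointer(&dev))); ret != 0 {
		fmt.Println("no device 0:", ret)
		return
	}

	var mem nvmlMemory
	if ret, _, _ := getMemInfo.Call(dev, uintptr(unsafe.Pointer(&mem))); ret != 0 {
		fmt.Println("memory query failed:", ret)
		return
	}
	fmt.Printf("free VRAM: %d MiB\n", mem.Free/(1024*1024))
}
```
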
Daniel Hiltgen 4e2b7e181d Refactor intel gpu discovery 2024-06-14 14:51:40 -07:00
Daniel Hiltgen 48702dd149 Harden unload for empty runners 2024-06-14 14:51:40 -07:00
Daniel Hiltgen 68dfc6236a Refine test timing
Adjust timing on some tests so they don't time out on small/slow GPUs.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 5e8ff556cb Support forced spreading for multi GPU
Our default behavior today is to try to fit the model into a single GPU
if possible. Some users would prefer the old behavior of always spreading
across multiple GPUs even if the model can fit into one. This exposes that
behavior as a tunable.
2024-06-14 14:51:40 -07:00
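
A minimal sketch of the tunable's shape, assuming the env var is OLLAMA_SCHED_SPREAD (the GpuInfo type and pickGPUs helper are simplified stand-ins, not the scheduler's actual code):

```go
package main

import (
	"fmt"
	"os"
)

// GpuInfo is a simplified stand-in for the scheduler's GPU descriptor.
type GpuInfo struct {
	ID         string
	FreeMemory uint64 // bytes
}

// pickGPUs: with the spread flag set, skip the "fits on one GPU" fast
// path and shard across every available GPU.
func pickGPUs(all []GpuInfo, required uint64) []GpuInfo {
	if os.Getenv("OLLAMA_SCHED_SPREAD") == "" {
		for _, g := range all {
			if g.FreeMemory >= required {
				return []GpuInfo{g} // default: prefer a single GPU
			}
		}
	}
	return all // forced (or fallback) spreading
}

func main() {
	gpus := []GpuInfo{{"0", 24 << 30}, {"1", 24 << 30}}
	fmt.Println(len(pickGPUs(gpus, 8<<30)), "GPU(s) selected")
}
```
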
Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit
Still not complete; our prediction needs refinement to understand each
discrete GPU's available space so we can see how many layers fit in each
one. Since we can't split one layer across multiple GPUs, we can't treat
free space as one logical block.
2024-06-14 14:51:40 -07:00
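
A worked sketch of that constraint: because a layer must live entirely on one GPU, capacity is the sum of whole-layer counts per GPU, not pooled free bytes (layersThatFit and the sizes below are illustrative):

```go
package main

import "fmt"

// layersThatFit counts whole layers per GPU; integer division truncates,
// so partial layers don't count.
func layersThatFit(freeBytes []uint64, layerSize uint64) int {
	total := 0
	for _, free := range freeBytes {
		total += int(free / layerSize)
	}
	return total
}

func main() {
	layer := uint64(1 << 30)            // hypothetical 1 GiB per layer
	frees := []uint64{3 << 29, 3 << 29} // 1.5 GiB free on each of two GPUs
	fmt.Println(layersThatFit(frees, layer)) // 2: one whole layer per GPU
	fmt.Println((3<<29 + 3<<29) / layer)     // 3: pooled free space overestimates
}
```
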
Daniel Hiltgen 206797bda4 Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally, which doesn't work.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 43ed358f9a Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen b32ebb4f29 Use DRM driver for VRAM info for amd
The amdgpu driver's free VRAM reporting omits some other apps, so leverage
the upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
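
A sketch of the sysfs read this implies, assuming the amdgpu DRM layout of /sys/class/drm/cardN/device/mem_info_vram_total and mem_info_vram_used (amdFreeVRAM is an illustrative helper, not the commit's code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readUint parses a single decimal value from a sysfs file.
func readUint(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

// amdFreeVRAM reads the DRM driver's VRAM accounting, which covers all
// clients of the GPU, not just our own process.
func amdFreeVRAM(card string) (uint64, error) {
	dev := filepath.Join("/sys/class/drm", card, "device")
	total, err := readUint(filepath.Join(dev, "mem_info_vram_total"))
	if err != nil {
		return 0, err
	}
	used, err := readUint(filepath.Join(dev, "mem_info_vram_used"))
	if err != nil {
		return 0, err
	}
	return total - used, nil
}

func main() {
	if free, err := amdFreeVRAM("card0"); err == nil {
		fmt.Printf("free VRAM: %d MiB\n", free/(1024*1024))
	}
}
```
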
Daniel Hiltgen fb9cdfa723 Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
Daniel Hiltgen efac488675 Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Patrick Devine 4dc7fb9525
update 40xx gpu compat matrix (#5036) 2024-06-13 17:10:33 -07:00
Daniel Hiltgen c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
Actually skip PhysX on Windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen aac367636d Actually skip PhysX on Windows 2024-06-13 13:17:19 -07:00
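
A hedged sketch of the skip: PhysX installs bundle their own CUDA DLLs on Windows, so library discovery has to filter those paths out (the matching rule and filterPhysX helper are assumptions, not the commit's code):

```go
package main

import (
	"fmt"
	"strings"
)

// filterPhysX drops candidate library paths that point into a PhysX
// install, so its bundled CUDA DLLs are never loaded in place of the
// driver's own copies.
func filterPhysX(paths []string) []string {
	var out []string
	for _, p := range paths {
		if strings.Contains(strings.ToLower(p), "physx") {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	paths := []string{
		`C:\Windows\System32`,
		`C:\Program Files\NVIDIA Corporation\PhysX\Common`,
	}
	fmt.Println(filterPhysX(paths)) // PhysX path dropped
}
```
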
Michael Yang 15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang d528e1af75 fix utf16 for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang cd234ce22c parser: add test for multibyte runes 2024-06-13 13:07:42 -07:00
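
A sketch of what the multibyte handling involves: runes outside the Basic Multilingual Plane occupy two UTF-16 code units (a surrogate pair), so decoding one code unit at a time corrupts them. decodeUTF16LE below is illustrative; unicode/utf16.Decode does the pairing:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE turns little-endian UTF-16 bytes into a Go string.
// utf16.Decode rejoins surrogate halves into single runes, which is
// exactly what breaks if code units are handled one at a time.
func decodeUTF16LE(b []byte) string {
	u16 := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		u16 = append(u16, binary.LittleEndian.Uint16(b[i:]))
	}
	return string(utf16.Decode(u16))
}

func main() {
	// U+1F600 (😀) is one rune but two UTF-16 code units: D83D DE00.
	b := []byte{0x3D, 0xD8, 0x00, 0xDE}
	fmt.Println(decodeUTF16LE(b)) // 😀
}
```
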
Patrick Devine 94618b2365
add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
Jeffrey Morgan 1fd236d177
server: remove jwt decoding error (#5027) 2024-06-13 11:21:15 -07:00
Michael Yang e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang 20b9f8e6f4 Revert "proper utf16 support"
This reverts commit 66ab48772f.

This change broke UTF-8 scanning of multi-byte runes.
2024-06-13 10:22:16 -07:00
Patrick Devine c69bc19e46
move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
Michael Yang bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang c16f8af911 fix: multiple templates when creating from model
Multiple templates may appear in a model if it is created from another
model that 1) has an autodetected template and 2) defines a custom
template.
2024-06-12 13:35:49 -07:00
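
A hedged sketch of the fix's shape: keep at most one template layer when merging a parent model's layers, letting an explicit TEMPLATE win (Layer and mergeLayers are simplified stand-ins; the media type string follows ollama's vnd.ollama.image naming but is an assumption here):

```go
package main

import "fmt"

// Layer is a simplified stand-in for a manifest layer.
type Layer struct {
	MediaType string
	Digest    string
}

// mergeLayers walks explicit (child) layers first so a user-defined
// TEMPLATE wins over the parent model's autodetected one; any further
// template layer is dropped.
func mergeLayers(child, parent []Layer) []Layer {
	var out []Layer
	seenTemplate := false
	merged := append(append([]Layer{}, child...), parent...)
	for _, l := range merged {
		if l.MediaType == "application/vnd.ollama.image.template" {
			if seenTemplate {
				continue // drop the duplicate template
			}
			seenTemplate = true
		}
		out = append(out, l)
	}
	return out
}

func main() {
	child := []Layer{{MediaType: "application/vnd.ollama.image.template"}}
	parent := []Layer{{MediaType: "application/vnd.ollama.image.template"}}
	fmt.Println(len(mergeLayers(child, parent))) // 1: parent's duplicate dropped
}
```
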
Michael Yang 217f60c3d9
Merge pull request #4987 from ollama/mxyng/revert-byte-order
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang 7bdcd1da94 Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
This reverts commit f5f245cc15, reversing
changes made to 94d37fdcae.

This change broke GGUF v2, which was incorrectly detected as big endian.
2024-06-11 15:56:17 -07:00
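
A sketch of how version-based byte-order detection can be made unambiguous: GGUF stores a uint32 version after the "GGUF" magic in the file's native order, so a small known version read as little-endian identifies an LE file, and the same test on the byte-swapped value identifies a BE file. The known-version policy below is illustrative, not ollama's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// detectByteOrder checks the version against the small set of known GGUF
// versions (1-3 at the time) under both byte orders.
func detectByteOrder(version uint32) (binary.ByteOrder, error) {
	const maxKnown = 3
	if version >= 1 && version <= maxKnown {
		return binary.LittleEndian, nil
	}
	if v := bits.ReverseBytes32(version); v >= 1 && v <= maxKnown {
		return binary.BigEndian, nil
	}
	return nil, fmt.Errorf("unrecognized gguf version: %#x", version)
}

func main() {
	bo, _ := detectByteOrder(2) // a v2 file read on a little-endian machine
	fmt.Println(bo)             // LittleEndian, not the misdetected BigEndian
}
```
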
Jeffrey Morgan ead259d877
llm: fix seed value not being applied to requests (#4986) 2024-06-11 14:24:41 -07:00
James Montgomery 2ff45d571d
Add Ollama-hpp to Community Libraries in README. (#4983) 2024-06-11 11:15:05 -07:00
Michael Yang 0f3cf1d42e
Merge pull request #4715 from ollama/mxyng/utf16-parser
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang 5bc029c529
Merge pull request #4921 from ollama/mxyng/import-md
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang e9a9c6a8e8
Merge pull request #4965 from ollama/mxyng/skip-layer-remove
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang 515f497e6d fix: skip removing layers that no longer exist 2024-06-10 11:32:19 -07:00
Michael Yang b27268aaef add test 2024-06-10 11:32:15 -07:00
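
A minimal sketch of the fix's shape, with removeLayer as a stand-in for the cleanup path: a blob that is already gone is not an error; only real removal failures propagate:

```go
package main

import (
	"errors"
	"io/fs"
	"os"
)

// removeLayer tolerates layers that no longer exist on disk.
func removeLayer(path string) error {
	if err := os.Remove(path); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err // a real failure, not just "already gone"
	}
	return nil
}

func main() {
	_ = removeLayer("/tmp/nonexistent-layer-blob") // nil: nothing to remove
}
```
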
Michael Yang f5f245cc15
Merge pull request #4938 from ollama/mxyng/fix-byte-order
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis 94d37fdcae
fix: examples/langchain-python-rag-privategpt/requirements.txt (#3382) 2024-06-09 10:58:09 -07:00
Craig Hughes b84aea1685
Critical fix from the llama.cpp JSON grammar to forbid unescaped escape characters inside strings, which break parsing. (#3782) 2024-06-09 10:57:09 -07:00
Napuh 896495de7b
Add instructions for installing specific versions to faq.md (#4084)
* Added instructions to easily install specific versions on faq.md

* Small typo

* Moved instructions on how to install specific version to linux.md

* Update docs/linux.md

* Update docs/linux.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-09 10:49:03 -07:00
dcasota 5528dd9d11
Error handling for load_single_document() in ingest.py (#4852)
load_single_document() now handles:
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan 943172cbf4
Update api.md 2024-06-08 23:04:32 -07:00
Nischal Jain 85169e8d6f
Added headless-ollama (#4612) 2024-06-08 18:51:16 -07:00
Jeffrey Morgan 34f142797a
llm: always add bos token to prompt (#4941)
* fix embedding by adding fixes from llama.cpp upstream

* remove assert

---------

Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
2024-06-08 18:47:10 -07:00
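
A sketch of the invariant, with illustrative token IDs and an assumed ensureBOS helper: the token sequence sent to the model (including embedding inputs) starts with BOS exactly once:

```go
package main

import "fmt"

// ensureBOS prepends the BOS token only if the tokenizer didn't already.
func ensureBOS(tokens []int, bosID int) []int {
	if len(tokens) > 0 && tokens[0] == bosID {
		return tokens
	}
	return append([]int{bosID}, tokens...)
}

func main() {
	const bos = 1
	fmt.Println(ensureBOS([]int{5, 9}, bos))    // [1 5 9]
	fmt.Println(ensureBOS([]int{1, 5, 9}, bos)) // unchanged: no double BOS
}
```
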
Erhan 46a7f1e74a
Update README.md with LangChainRust (#4854) 2024-06-08 17:29:36 -07:00
Michael Yang 620d5c569e fix parsing big endian gguf 2024-06-08 12:35:26 -07:00
Michael Yang b9ce7bf75e update import.md 2024-06-07 16:45:15 -07:00
Daniel Hiltgen cddc63381c
Merge pull request #4909 from dhiltgen/oneapi_disable
Add ability to skip oneapi generate
2024-06-07 14:07:15 -07:00
Michael Yang 385a32ecb5
Merge pull request #4910 from ollama/mxyng/detect-chat-template
fix create model when template detection errors
2024-06-07 11:07:39 -07:00
Michael Yang 030e765e76 fix create model when template detection errors 2024-06-07 10:51:35 -07:00
Daniel Hiltgen ab8c929e20 Add ability to skip oneapi generate
This follows the same pattern as cuda and rocm, allowing the build to be
disabled even when we detect the dependent libraries.
2024-06-07 08:32:49 -07:00
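
A hedged sketch of the gating pattern, assuming the variable is named OLLAMA_SKIP_ONEAPI_GENERATE after the cuda/rocm precedent (the build-step functions are stand-ins, not the actual generate scripts):

```go
package main

import (
	"log"
	"os"
)

// Stand-ins for the real build steps; only the gating pattern matters.
func oneapiLibsPresent() bool { return false }
func generateOneAPI()         { log.Println("building oneapi runner") }

func main() {
	// An explicit opt-out wins even when the dependent libraries are
	// detected, mirroring the cuda/rocm skips.
	if os.Getenv("OLLAMA_SKIP_ONEAPI_GENERATE") != "" {
		log.Println("skipping oneapi generate")
		return
	}
	if oneapiLibsPresent() {
		generateOneAPI()
	}
}
```
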
Jeffrey Morgan ce0dc33cb8
llm: patch to fix qwen 2 temporarily on nvidia (#4897) 2024-06-06 23:14:33 -07:00
Michael Yang 78f81fc0e5
Merge pull request #4800 from ollama/mxyng/detect-chat-template
detect chat template from KV
2024-06-06 16:17:18 -07:00
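
A sketch of what KV-based detection involves: GGUF metadata conventionally carries an embedded chat template under the "tokenizer.chat_template" key (detectChatTemplate is an illustrative helper, not ollama's code):

```go
package main

import "fmt"

// detectChatTemplate reads the embedded template from decoded GGUF
// metadata, reporting absence so callers can fall back gracefully.
func detectChatTemplate(kv map[string]any) (string, bool) {
	s, ok := kv["tokenizer.chat_template"].(string)
	return s, ok && s != ""
}

func main() {
	kv := map[string]any{"tokenizer.chat_template": "{{ .Prompt }}"}
	if tmpl, ok := detectChatTemplate(kv); ok {
		fmt.Println("detected template:", tmpl)
	}
}
```
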