Commit graph

2925 commits

Author SHA1 Message Date
Daniel Hiltgen 26ab67732b Bump ROCm linux to 6.1.1 2024-06-14 15:37:54 -07:00
Daniel Hiltgen 45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen 17df6520c8 Remove mmap related output calc logic 2024-06-14 14:55:50 -07:00
Daniel Hiltgen 6f351bf586 review comments and coverage 2024-06-14 14:55:50 -07:00
Daniel Hiltgen ff4f0cbd1d Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen fc37c192ae Refine CPU load behavior with system memory visibility 2024-06-14 14:51:40 -07:00
Daniel Hiltgen 434dfe30c5 Reintroduce nvidia nvml library for windows
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 4e2b7e181d Refactor intel gpu discovery 2024-06-14 14:51:40 -07:00
Daniel Hiltgen 48702dd149 Harden unload for empty runners 2024-06-14 14:51:40 -07:00
Daniel Hiltgen 68dfc6236a refined test timing
adjust timing on some tests so they don't timeout on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 5e8ff556cb Support forced spreading for multi GPU
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one.  This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 206797bda4 Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally which doesn't work
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 43ed358f9a Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen b32ebb4f29 Use DRM driver for VRAM info for amd
The amdgpu drivers free VRAM reporting omits some other apps, so leverage the
upstream DRM driver which keeps better tabs on things
2024-06-14 14:51:40 -07:00
Daniel Hiltgen fb9cdfa723 Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
Daniel Hiltgen efac488675 Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Jeffrey Morgan 6b800aa7b7
openai: do not set temperature to 0 when setting seed (#5045) 2024-06-14 13:43:56 -07:00
Jeffrey Morgan dd7c9ebeaf
server: longer timeout in TestRequests (#5046) 2024-06-14 09:48:25 -07:00
Patrick Devine 4dc7fb9525
update 40xx gpu compat matrix (#5036) 2024-06-13 17:10:33 -07:00
Daniel Hiltgen c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
Actually skip PhysX on windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen aac367636d Actually skip PhysX on windows 2024-06-13 13:17:19 -07:00
Michael Yang 15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang d528e1af75 fix utf16 for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang cd234ce22c parser: add test for multibyte runes 2024-06-13 13:07:42 -07:00
Patrick Devine 94618b2365
add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
Jeffrey Morgan 1fd236d177
server: remove jwt decoding error (#5027) 2024-06-13 11:21:15 -07:00
Michael Yang e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang 20b9f8e6f4 Revert "proper utf16 support"
This reverts commit 66ab48772f.

this change broke utf-8 scanning of multi-byte runes
2024-06-13 10:22:16 -07:00
Patrick Devine c69bc19e46
move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
Michael Yang bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang c16f8af911 fix: multiple templates when creating from model
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang 217f60c3d9
Merge pull request #4987 from ollama/mxyng/revert-byte-order
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang 7bdcd1da94 Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
This reverts commit f5f245cc15, reversing
changes made to 94d37fdcae.

this change broke gguf v2 which is incorrectly detected as big endian
2024-06-11 15:56:17 -07:00
Jeffrey Morgan ead259d877
llm: fix seed value not being applied to requests (#4986) 2024-06-11 14:24:41 -07:00
James Montgomery 2ff45d571d
Add Ollama-hpp to Community Libraries in README. (#4983) 2024-06-11 11:15:05 -07:00
Michael Yang 0f3cf1d42e
Merge pull request #4715 from ollama/mxyng/utf16-parser
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang 5bc029c529
Merge pull request #4921 from ollama/mxyng/import-md
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang e9a9c6a8e8
Merge pull request #4965 from ollama/mxyng/skip-layer-remove
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang 515f497e6d fix: skip removing layers that no longer exist 2024-06-10 11:32:19 -07:00
Michael Yang b27268aaef add test 2024-06-10 11:32:15 -07:00
Michael Yang f5f245cc15
Merge pull request #4938 from ollama/mxyng/fix-byte-order
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis 94d37fdcae
fix: examples/langchain-python-rag-privategpt/requirements.txt (#3382) 2024-06-09 10:58:09 -07:00
Craig Hughes b84aea1685
Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782) 2024-06-09 10:57:09 -07:00
Napuh 896495de7b
Add instructions to easily install specific versions on faq.md (#4084)
* Added instructions to easily install specific versions on faq.md

* Small typo

* Moved instructions on how to install specific version to linux.md

* Update docs/linux.md

* Update docs/linux.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-09 10:49:03 -07:00
dcasota 5528dd9d11
Error handling load_single_document() in ingest.py (#4852)
load_single_document() handles
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan 943172cbf4
Update api.md 2024-06-08 23:04:32 -07:00
Nischal Jain 85169e8d6f
Added headless-ollama (#4612) 2024-06-08 18:51:16 -07:00
Jeffrey Morgan 34f142797a
llm: always add bos token to prompt (#4941)
* fix embedding by adding fixes from llama.cpp upstream

* remove assert

---------

Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
2024-06-08 18:47:10 -07:00
Erhan 46a7f1e74a
Update README.md with LangChainRust (#4854) 2024-06-08 17:29:36 -07:00