Commit graph

68 commits

Author SHA1 Message Date
Daniel Hiltgen 8711d03df7 Report errors on server lookup instead of path lookup failure 2024-04-22 19:08:47 -07:00
Daniel Hiltgen aa72281eae Trim spaces and quotes from llm lib override 2024-04-22 17:11:14 -07:00
ManniX-ITA c496967e56
Merge branch 'ollama:main' into mannix-server 2024-04-18 18:45:15 +02:00
Michael Yang 3cf483fe48 add stablelm graph calculation 2024-04-17 13:57:19 -07:00
Michael Yang a8b9b930b4 account for all non-repeating layers 2024-04-17 11:21:21 -07:00
ManniX-ITA bd54b08261
Streamlined WaitUntilRunning 2024-04-17 17:39:52 +02:00
Michael Yang 26df674785 scale graph based on gpu count 2024-04-16 14:44:13 -07:00
Michael Yang 41a272de9f darwin: no partial offloading if required memory greater than system 2024-04-16 11:22:38 -07:00
Jeffrey Morgan a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading (#3653)
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Michael Yang 7e33a017c0 partial offloading 2024-04-10 11:37:20 -07:00
Michael Yang 8b2c10061c refactor tensor query 2024-04-10 11:37:20 -07:00
Daniel Hiltgen c5ff443b9f Handle very slow model loads
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
Michael Yang be517e491c no rope parameters 2024-04-05 18:05:27 -07:00
Michael Yang 12e923e158 update graph size estimate 2024-04-03 13:34:12 -07:00
Daniel Hiltgen 464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Daniel Hiltgen 6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang 80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen 58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00