Commit graph

544 commits

Author SHA1 Message Date
Patrick Devine f2cf97d6f1
fix typo in modelfile generation (#4439) 2024-05-14 15:34:29 -07:00
Ryo Machida 798b107f19
Fixed the API endpoint /api/tags when the model list is empty. (#4424)
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Daniel Hiltgen ec231a7923 Remove VRAM convergence check for windows
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine 7ca71a6b0f
don't abort when an invalid model name is used in /save (#4416) 2024-05-13 18:48:28 -07:00
Patrick Devine 6845988807
Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
jmorganca 4ec7445a6f Revert "use post token"
This reverts commit 0fec3525ad.
2024-05-11 22:19:14 -07:00
Michael Yang 0fec3525ad use post token 2024-05-11 19:13:16 -07:00
Daniel Hiltgen 824ee5446f Fix envconfig unit test 2024-05-10 16:49:48 -07:00
Daniel Hiltgen 4142c3ef7c Always use the sorted list of GPUs
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan 6602e793c0
Use --quantize flag and quantize api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Jeffrey Morgan bb6fd02298
Don't clamp ctx size in PredictServerFit (#4317)
* dont clamp ctx size in `PredictServerFit`

* minimum 4 context

* remove context warning
2024-05-10 10:17:12 -07:00
Michael Yang e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00
Jeffrey Morgan 302d7fdbf3
prune partial downloads (#4272) 2024-05-09 16:35:20 -07:00
Daniel Hiltgen 3ae2f441e0 Fix race in shutdown logic
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Daniel Hiltgen 354ad9254e Wait for GPU free memory reporting to converge
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Daniel Hiltgen 8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Bruce MacDonald cfa84b8470
add done_reason to the api (#4235) 2024-05-09 13:30:14 -07:00
Michael Yang a7ee84fc31 routes: skip invalid filepaths 2024-05-09 11:23:22 -07:00
Jeffrey Morgan d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale (#1983) 2024-05-09 09:06:13 -07:00
Bruce MacDonald cef45feaa4
Add preflight OPTIONS handling and update CORS config (#4086)
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang b25976aeb8 routes: fix show llava models 2024-05-08 12:43:36 -07:00
Michael Yang 88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald 8cbd3e7510
skip hidden files in list models handler (#4247) 2024-05-07 19:01:45 -07:00
Michael Yang eeb695261f skip if same quantization 2024-05-07 17:44:19 -07:00
Bruce MacDonald dc9b1111e0 fix invalid destination error message 2024-05-07 17:35:52 -07:00
Michael Yang ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang 1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Michael Yang 548a7df014 update list handler to use model.Name 2024-05-07 09:38:45 -07:00
Jeffrey Morgan 39d9d22ca3
close server on receiving signal (#4213) 2024-05-06 16:01:37 -07:00
Michael Yang b2f00aa977 close zip files 2024-05-06 15:27:19 -07:00
Michael Yang f5e8b207fb s/DisplayLongest/String/ 2024-05-06 15:24:01 -07:00
Michael Yang d245460362 only quantize language models 2024-05-06 15:24:01 -07:00
Michael Yang 4d0d0fa383 no iterator 2024-05-06 15:24:01 -07:00
Michael Yang 7ffe45734d rebase 2024-05-06 15:24:01 -07:00
Michael Yang 01811c176a comments 2024-05-06 15:24:01 -07:00
Michael Yang a7248f6ea8 update tests 2024-05-06 15:24:01 -07:00
Michael Yang 9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen 0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners (#4189) 2024-05-06 14:22:24 -07:00
Daniel Hiltgen 0a954e5066 Fix stale test logic
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Jeffrey Morgan dfa2f32ca0
unload in critical section (#4187) 2024-05-05 17:18:27 -07:00
Daniel Hiltgen f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Jeffrey Morgan 942c979232
allocate a large enough kv cache for all parallel requests (#4162) 2024-05-05 15:59:32 -07:00
Patrick Devine 2a21363bb7
validate the format of the digest when getting the model path (#4175) 2024-05-05 11:46:12 -07:00
Daniel Hiltgen 20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Michael Yang b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Daniel Hiltgen 9a32c514cb Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang 5b806d8d24
Merge pull request #4089 from ollama/mxyng/target-invalid
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang 45b6a12e45 server: target invalid 2024-05-01 12:40:45 -07:00