Commit graph

2917 commits

Author SHA1 Message Date
Josh 7607e6e902
Merge pull request #4379 from WolfTheDeveloper/main
Update `LlamaScript` to point to new link from Legacy link.
2024-05-13 18:08:32 -07:00
Patrick Devine f1548ef62d
update the FAQ to be more clear about Windows env variables (#4415) 2024-05-13 18:01:13 -07:00
Patrick Devine 6845988807
Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Josh 9eed4a90ce
Merge pull request #4411 from joshyan1/main
removed inconsistent punctuation
2024-05-13 15:30:45 -07:00
Josh Yan f8464785a6 removed inconsistencies 2024-05-13 14:50:52 -07:00
Michael Yang 1d359e737e typo 2024-05-13 14:18:34 -07:00
Michael Yang 50b9056e09 count memory up to NumGPU 2024-05-13 14:13:10 -07:00
Josh Yan 91a090a485 removed inconsistent punctuation 2024-05-13 14:08:22 -07:00
睡觉型学渣 9c76b30d72
Correct typos. (#4387)
* Correct typos.
2024-05-12 18:21:11 -07:00
Zander Lewis 93f19910c5
Update LlamaScript to point to new link.
Still used Legacy link.
2024-05-12 11:24:21 -04:00
jmorganca 4ec7445a6f Revert "use post token"
This reverts commit 0fec3525ad.
2024-05-11 22:19:14 -07:00
Michael Yang 0372c51f82
Merge pull request #4369 from ollama/mxyng/post-token
use post token
2024-05-11 19:29:14 -07:00
Michael Yang 0fec3525ad use post token 2024-05-11 19:13:16 -07:00
Jeffrey Morgan 41ba3017fd
Fix OpenAI finish_reason values when empty (#4368) 2024-05-11 15:31:41 -07:00
todashuta 8080fbce35
fix ollama create's usage string (#4362) 2024-05-11 14:47:49 -07:00
Michael Yang ec14f6ceda
case sensitive filepaths (#4366) 2024-05-11 14:12:36 -07:00
Daniel Hiltgen c60a086635
Merge pull request #4331 from dhiltgen/fix_unit
Fix envconfig unit test
2024-05-11 09:16:28 -07:00
jmorganca 92ca2cca95 Revert "only forward some env vars"
This reverts commit ce3b212d12.
2024-05-10 22:53:21 -07:00
Patrick Devine 1e1634daca
update go deps (#4324) 2024-05-10 21:39:27 -07:00
Daniel Hiltgen 824ee5446f Fix envconfig unit test 2024-05-10 16:49:48 -07:00
Daniel Hiltgen 879e2caf8c
Merge pull request #4329 from dhiltgen/zero_layers
Fall back to CPU runner with zero layers
2024-05-10 15:23:16 -07:00
Daniel Hiltgen c4014e73a2 Fall back to CPU runner with zero layers 2024-05-10 15:09:48 -07:00
Daniel Hiltgen be9efdb981
Merge pull request #4326 from dhiltgen/fix_integration
Integration fixes
2024-05-10 14:25:59 -07:00
Daniel Hiltgen 074dc3b9d8 Integration fixes 2024-05-10 14:20:10 -07:00
Daniel Hiltgen 86f9b582d5
Merge pull request #4323 from dhiltgen/sort_by_free
Always use the sorted list of GPUs
2024-05-10 14:12:15 -07:00
Daniel Hiltgen 4142c3ef7c Always use the sorted list of GPUs
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
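The ordering described in the commit above (first GPU has the most free space) can be illustrated with a minimal sketch. This is not the repository's code: the `GpuInfo` record, its field names, and `sort_by_free` are hypothetical and exist only to show a descending sort on free memory.

```python
from dataclasses import dataclass


@dataclass
class GpuInfo:
    # Hypothetical record; the field names are illustrative assumptions,
    # not taken from the repository.
    id: str
    free_bytes: int


def sort_by_free(gpus):
    """Return a new list ordered from most to least free memory."""
    return sorted(gpus, key=lambda g: g.free_bytes, reverse=True)


GIB = 1024 ** 3
gpus = [
    GpuInfo("gpu0", 2 * GIB),
    GpuInfo("gpu1", 6 * GIB),
    GpuInfo("gpu2", 4 * GIB),
]

# Callers that always consume the sorted list can treat index 0
# as the device with the most headroom.
ordered = sort_by_free(gpus)
```

Returning a new list rather than sorting in place keeps the discovery order available to any caller that still needs it.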
Jeffrey Morgan 6602e793c0
Use --quantize flag and quantize api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang ea0fdaed28
Merge pull request #4320 from ollama/mxyng/phi2-mem
add phi2 mem
2024-05-10 12:35:08 -07:00
Michael Yang 1eb382da5a add phi2 mem 2024-05-10 12:13:28 -07:00
Jeffrey Morgan bb6fd02298
Don't clamp ctx size in PredictServerFit (#4317)
* don't clamp ctx size in `PredictServerFit`

* minimum 4 context

* remove context warning
2024-05-10 10:17:12 -07:00
Daniel Hiltgen 7e2bceceee
Merge pull request #4316 from dhiltgen/more_buffer
Bump VRAM buffer back up
2024-05-10 10:02:34 -07:00
Daniel Hiltgen 30a7d7096c Bump VRAM buffer back up
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Michael Yang 200a18820e
Merge pull request #4306 from ollama/mxyng/fix-routes 2024-05-10 08:58:16 -07:00
Michael Yang e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00
Bruce MacDonald c02db93243 omit empty done reason 2024-05-09 16:45:29 -07:00
Michael Yang ffa4d5134a
Merge pull request #4305 from ollama/mxyng/typo
fix typo
2024-05-09 16:42:09 -07:00
Jeffrey Morgan 302d7fdbf3
prune partial downloads (#4272) 2024-05-09 16:35:20 -07:00
Michael Yang cf442cd57e fix typo 2024-05-09 16:23:37 -07:00
Michael Yang 0e1ba65855
Merge pull request #4302 from ollama/mxyng/forward-env
only forward some env vars
2024-05-09 16:21:05 -07:00
Michael Yang 6aad333c63
Merge pull request #4298 from ollama/mxyng/log-cleanup
log clean up
2024-05-09 16:20:57 -07:00
Daniel Hiltgen 4fcc84e67a
Merge pull request #4304 from dhiltgen/signals
Fix race in shutdown logic
2024-05-09 15:58:44 -07:00
Daniel Hiltgen 3ae2f441e0 Fix race in shutdown logic
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Zander Lewis 2abb3f6424
Update README.md (#4300) 2024-05-09 15:30:49 -07:00
Michael Yang ce3b212d12 only forward some env vars 2024-05-09 15:16:09 -07:00
Daniel Hiltgen 83d6d46e29
Merge pull request #4299 from dhiltgen/handle_vram_reporting_lag
Wait for GPU free memory reporting to converge
2024-05-09 15:08:56 -07:00
Daniel Hiltgen 354ad9254e Wait for GPU free memory reporting to converge
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
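The idea in the commit above, polling until the reported free memory matches the expected value, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `wait_for_free_memory`, its parameters, the tolerance, and the timeout values are all hypothetical.

```python
import time


def wait_for_free_memory(read_free, expected_free, tolerance,
                         timeout=5.0, interval=0.25,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll read_free() until it is within tolerance of expected_free.

    Returns True once the reading converges, or False if the timeout
    elapses first. All names and defaults here are assumptions made
    for illustration.
    """
    deadline = clock() + timeout
    while True:
        free = read_free()
        if abs(free - expected_free) <= tolerance:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated driver readings that lag before settling at the expected value.
readings = iter([1000, 3000, 5000])
calls = []


def fake_read_free():
    value = next(readings)
    calls.append(value)
    return value


converged = wait_for_free_memory(
    fake_read_free, expected_free=5000, tolerance=100,
    sleep=lambda _seconds: None,  # skip real sleeping in the demo
)

# A reading that never converges gives up once the deadline has passed.
timed_out = wait_for_free_memory(
    lambda: 1000, expected_free=5000, tolerance=100,
    timeout=0.0, sleep=lambda _seconds: None,
)
```

Bounding the wait with a timeout keeps a slow or stuck driver from blocking the next runner indefinitely; the caller decides what to do when the values never converge.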
Michael Yang 58876091f7 log clean up 2024-05-09 14:55:36 -07:00
Daniel Hiltgen dc18eee39d
Merge pull request #4238 from dhiltgen/gpu_info
Record more GPU information
2024-05-09 14:26:58 -07:00
Daniel Hiltgen 8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen d0425f26cf
Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
Harden subprocess reaping
2024-05-09 14:02:16 -07:00