Commit graph

2614 commits

Author SHA1 Message Date
Michael Yang 548a7df014 update list handler to use model.Name 2024-05-07 09:38:45 -07:00
Michael Yang b2f00aa977 close zip files 2024-05-06 15:27:19 -07:00
Michael Yang 6694be5e50 convert/llama: use WriteSeeker 2024-05-06 15:24:01 -07:00
Michael Yang f5e8b207fb s/DisplayLongest/String/ 2024-05-06 15:24:01 -07:00
Michael Yang d245460362 only quantize language models 2024-05-06 15:24:01 -07:00
Michael Yang 4d0d0fa383 no iterator 2024-05-06 15:24:01 -07:00
Michael Yang 7ffe45734d rebase 2024-05-06 15:24:01 -07:00
Michael Yang 01811c176a comments 2024-05-06 15:24:01 -07:00
Michael Yang a7248f6ea8 update tests 2024-05-06 15:24:01 -07:00
Michael Yang 9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Jeffrey Chen d091fe3c21
Windows automatically recognizes username (#3214) 2024-05-06 15:03:14 -07:00
Mohamed A. Fouad ee02f548c8
Update linux.md (#3847)
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka 3ecae420ac
Update api.md (#3945)
* Update api.md

Changed the calculation of tps (token/s) in the documentation

* Update docs/api.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 14:39:58 -07:00
Daniel Hiltgen 4cbbf0e13b
Merge pull request #4090 from dhiltgen/rocm_paths
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen 380378cc80 Use our libraries first
Trying to live off the land for cuda libraries was not the right strategy.  We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen 0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan ed740a2504
Fix no slots available error with concurrent requests (#4160) 2024-05-06 14:22:53 -07:00
Jeffrey Morgan c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners (#4189) 2024-05-06 14:22:24 -07:00
Daniel Hiltgen 0a954e5066 Fix stale test logic
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault aa93423fbf
docs: pbcopy on mac (#3129) 2024-05-06 13:47:00 -07:00
Nurgo 01c9386267
Add BrainSoup to compatible clients list (#3473) 2024-05-06 13:42:16 -07:00
Daniel Hiltgen af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen 06093fd396
Merge pull request #4067 from dhiltgen/cudart
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr 86b7fcac32
Update README.md with StreamDeploy (#3621)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu fb8ddc564e
chore: delete HEAD (#4194) 2024-05-06 10:32:30 -07:00
Saif 242efe6611
👌 IMPROVE: add portkey library for production tools (#4119) 2024-05-06 10:25:23 -07:00
Jeffrey Morgan 1b0e6c9c0e
Fix llava models not working after first request (#4164)
* fix llava models not working after first request

* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan dfa2f32ca0
unload in critical section (#4187) 2024-05-05 17:18:27 -07:00
Daniel Hiltgen 840424a2c4
Merge pull request #4154 from dhiltgen/central_config
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx 6707768ebd
chore: format go code (#4149) 2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING c78bb76a12
update libraries for langchain_community + llama3 changed from llama2 (#4174) 2024-05-05 16:07:04 -07:00
Jeffrey Morgan 942c979232
allocate a large enough kv cache for all parallel requests (#4162) 2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning 06164911dd
Update README.md (#4111)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine 2a21363bb7
validate the format of the digest when getting the model path (#4175) 2024-05-05 11:46:12 -07:00
Daniel Hiltgen 026869915f
Merge pull request #4144 from dhiltgen/max_queue
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen 45d61aaaa3 Add integration test to push max queue limits 2024-05-05 10:46:25 -07:00
Daniel Hiltgen 20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen 371f5e52aa
Merge pull request #4141 from dhiltgen/win_docs
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen e006480e49 Explain the 2 different windows download options 2024-05-04 12:50:05 -07:00
Michael Yang aed545872d
Merge pull request #4143 from ollama/mxyng/final-response
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang 44869c59d6 omit prompt and generate settings from final response 2024-05-03 17:00:02 -07:00
Daniel Hiltgen 52663284cf
Merge pull request #4145 from dhiltgen/fix_lint
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen 42fa9d7f0a Fix lint warnings 2024-05-03 16:44:19 -07:00
Michael Yang b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams e8aaea030e
Update 'llama2' -> 'llama3' in most places (#4116)
* Update 'llama2' -> 'llama3' in most places

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Daniel Hiltgen b1ad3a43cb Skip PhysX cudart library
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen 267e25a750
Merge pull request #4129 from dhiltgen/unit_tests
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen 9a32c514cb Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00