ollama

Author	SHA1	Message	Date
Michael Yang	548a7df014	update list handler to use model.Name	2024-05-07 09:38:45 -07:00
Michael Yang	b2f00aa977	close zip files	2024-05-06 15:27:19 -07:00
Michael Yang	6694be5e50	convert/llama: use WriteSeeker	2024-05-06 15:24:01 -07:00
Michael Yang	f5e8b207fb	s/DisplayLongest/String/	2024-05-06 15:24:01 -07:00
Michael Yang	d245460362	only quantize language models	2024-05-06 15:24:01 -07:00
Michael Yang	4d0d0fa383	no iterator	2024-05-06 15:24:01 -07:00
Michael Yang	7ffe45734d	rebase	2024-05-06 15:24:01 -07:00
Michael Yang	01811c176a	comments	2024-05-06 15:24:01 -07:00
Michael Yang	a7248f6ea8	update tests	2024-05-06 15:24:01 -07:00
Michael Yang	9685c34509	quantize any fp16/fp32 model - FROM /path/to/{safetensors,pytorch} - FROM /path/to/fp{16,32}.bin - FROM model:fp{16,32}	2024-05-06 15:24:01 -07:00
Jeffrey Chen	d091fe3c21	Windows automatically recognizes username (#3214)	2024-05-06 15:03:14 -07:00
Mohamed A. Fouad	ee02f548c8	Update linux.md (#3847) Add -e to viewing logs in order to show end of ollama logs	2024-05-06 15:02:25 -07:00
Daniel Hiltgen	b08870aff3	Merge pull request #4188 from dhiltgen/use_our_lib User our bundled libraries (cuda) instead of the host library	2024-05-06 14:41:05 -07:00
Darinka	3ecae420ac	Update api.md (#3945) * Update api.md Changed the calculation of tps (token/s) in the documentation * Update docs/api.md --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-06 14:39:58 -07:00
Daniel Hiltgen	4cbbf0e13b	Merge pull request #4090 from dhiltgen/rocm_paths Support Fedoras standard ROCm location	2024-05-06 14:33:41 -07:00
Daniel Hiltgen	380378cc80	Use our libraries first Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly	2024-05-06 14:23:29 -07:00
Daniel Hiltgen	0963c65027	Merge pull request #4208 from dhiltgen/fix_sched_test Fix stale test logic	2024-05-06 14:23:12 -07:00
Jeffrey Morgan	ed740a2504	Fix `no slots available` error with concurrent requests (#4160)	2024-05-06 14:22:53 -07:00
Jeffrey Morgan	c9f98622b1	Skip scheduling cancelled requests, always reload unloaded runners (#4189)	2024-05-06 14:22:24 -07:00
Daniel Hiltgen	0a954e5066	Fix stale test logic The model processing was recently changed to be deferred but this test scenario hadn't been adjusted for that change in behavior.	2024-05-06 14:15:37 -07:00
Adrien Brault	aa93423fbf	docs: pbcopy on mac (#3129)	2024-05-06 13:47:00 -07:00
Nurgo	01c9386267	Add BrainSoup to compatible clients list (#3473)	2024-05-06 13:42:16 -07:00
Daniel Hiltgen	af9eb36f9f	Merge pull request #4135 from dhiltgen/no_physx Skip PhysX cudart library	2024-05-06 13:34:00 -07:00
Daniel Hiltgen	06093fd396	Merge pull request #4067 from dhiltgen/cudart Add CUDA Driver API for GPU discovery	2024-05-06 13:30:27 -07:00
Tony Loehr	86b7fcac32	Update README.md with StreamDeploy (#3621) Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-05-06 11:14:41 -07:00
Hyden Liu	fb8ddc564e	chore: delete `HEAD` (#4194)	2024-05-06 10:32:30 -07:00
Saif	242efe6611	👌 IMPROVE: add portkey library for production tools (#4119)	2024-05-06 10:25:23 -07:00
Jeffrey Morgan	1b0e6c9c0e	Fix llava models not working after first request (#4164) * fix llava models not working after first request * individual requests only for llava models	2024-05-05 20:50:31 -07:00
Jeffrey Morgan	dfa2f32ca0	unload in critical section (#4187)	2024-05-05 17:18:27 -07:00
Daniel Hiltgen	840424a2c4	Merge pull request #4154 from dhiltgen/central_config Centralize server config handling	2024-05-05 17:08:26 -07:00
Daniel Hiltgen	f56aa20014	Centralize server config handling This moves all the env var reading into one central module and logs the loaded config once at startup which should help in troubleshooting user server logs	2024-05-05 16:49:50 -07:00
alwqx	6707768ebd	chore: format go code (#4149)	2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING	c78bb76a12	update libraries for langchain_community + llama3 changed from llama2 (#4174)	2024-05-05 16:07:04 -07:00
Jeffrey Morgan	942c979232	allocate a large enough kv cache for all parallel requests (#4162)	2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning	06164911dd	Update README.md (#4111) --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	2024-05-05 14:45:32 -07:00
Patrick Devine	2a21363bb7	validate the format of the digest when getting the model path (#4175)	2024-05-05 11:46:12 -07:00
Daniel Hiltgen	026869915f	Merge pull request #4144 from dhiltgen/max_queue Make maximum pending request configurable	2024-05-05 10:53:44 -07:00
Daniel Hiltgen	45d61aaaa3	Add integration test to push max queue limits	2024-05-05 10:46:25 -07:00
Daniel Hiltgen	20f6c06569	Make maximum pending request configurable This also bumps up the default to be 50 queued requests instead of 10.	2024-05-04 21:00:52 -07:00
Daniel Hiltgen	371f5e52aa	Merge pull request #4141 from dhiltgen/win_docs Explain the 2 different windows download options	2024-05-04 12:50:16 -07:00
Daniel Hiltgen	e006480e49	Explain the 2 different windows download options	2024-05-04 12:50:05 -07:00
Michael Yang	aed545872d	Merge pull request #4143 from ollama/mxyng/final-response omit prompt and generate settings from final response	2024-05-03 17:39:49 -07:00
Michael Yang	44869c59d6	omit prompt and generate settings from final response	2024-05-03 17:00:02 -07:00
Daniel Hiltgen	52663284cf	Merge pull request #4145 from dhiltgen/fix_lint Fix lint warnings	2024-05-03 16:53:17 -07:00
Daniel Hiltgen	42fa9d7f0a	Fix lint warnings	2024-05-03 16:44:19 -07:00
Michael Yang	b7a87a22b6	Merge pull request #4059 from ollama/mxyng/parser-2 rename parser to model/file	2024-05-03 13:01:22 -07:00
Dr Nic Williams	e8aaea030e	Update 'llama2' -> 'llama3' in most places (#4116) * Update 'llama2' -> 'llama3' in most places --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	2024-05-03 15:25:04 -04:00
Daniel Hiltgen	b1ad3a43cb	Skip PhysX cudart library For some reason this library gives incorrect GPU information, so skip it	2024-05-03 11:55:32 -07:00
Daniel Hiltgen	267e25a750	Merge pull request #4129 from dhiltgen/unit_tests Soften timeouts on sched unit tests	2024-05-03 11:10:26 -07:00
Daniel Hiltgen	9a32c514cb	Soften timeouts on sched unit tests This gives us more headroom on the scheduler tests to tamp down some flakes.	2024-05-03 09:08:33 -07:00

2614 commits