Commit graph

193 commits

Author SHA1 Message Date
Patrick Devine 910e9401d0
Multimodal support (#1216)
---------

Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Michael Yang 56ffc3023a remove per-model types
mostly replaced by decoding tensors except ggml models which only
support llama
2023-12-11 09:40:21 -08:00
Jeffrey Morgan fa2f095bd9 fix model name returned by /api/generate being different than the model name provided 2023-12-10 11:42:15 -05:00
Jeffrey Morgan d9a250e9b5 seek to end of file when decoding older model formats 2023-12-09 21:14:35 -05:00
Jeffrey Morgan 944519ed16 seek to eof for older model binaries 2023-12-09 20:48:57 -05:00
Jeffrey Morgan 2dd040d04c do not use --parallel 2 for old runners 2023-12-09 20:17:33 -05:00
Bruce MacDonald bbe41ce41a
fix: parallel queueing race condition caused silent failure (#1445)
* fix: queued request failures

- increase parallel requests to 2 to complete queued request, queueing is managed in ollama

* log stream errors
2023-12-09 14:14:02 -05:00
Michael Yang f1b049fed8
Merge pull request #1377 from jmorganca/mxyng/qwen
update for qwen
2023-12-06 12:31:51 -08:00
Michael Yang b9495ea162 load projectors 2023-12-05 14:36:12 -08:00
Michael Yang 409bb9674e
Merge pull request #1308 from jmorganca/mxyng/split-from
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang d3479c07a1
Merge pull request #1250 from jmorganca/mxyng/create-layer
refactor layer creation
2023-12-05 14:32:52 -08:00
Bruce MacDonald 195e3d9dbd
chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
Jeffrey Morgan 00d06619a1 Revert "chat api (#991)" while context variable is fixed
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Michael Yang 5a5dca13b2 comments 2023-12-04 16:59:23 -08:00
Michael Yang 72e7a49aa9 seek instead of copyn 2023-12-04 16:59:23 -08:00
Michael Yang 2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Michael Yang b2816bca67 unnecessary ReadSeeker for DecodeGGML 2023-12-04 16:59:23 -08:00
Bruce MacDonald 7a0899d62d
chat api (#991)
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang 6deebf2489 update for qwen 2023-12-04 11:38:05 -08:00
Jeffrey Morgan 16a9006306 add back f16c instructions on intel mac 2023-11-26 15:59:49 -05:00
Jeffrey Morgan 9e4a316405 update submodule commit 2023-11-26 14:52:00 -05:00
Jing Zhang 82b9b329ff
windows CUDA support (#1262)
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi 12e8c12d2b
Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.

See #961.
2023-11-24 14:05:57 -05:00
Jeffrey Morgan d77dde126b consistent cpu instructions on macos and linux 2023-11-22 16:26:46 -05:00
Michael Yang 199941cd15 fix: gguf int type 2023-11-22 11:40:30 -08:00
Michael Yang a00fac4ec8 update llama.cpp 2023-11-21 09:50:02 -08:00
Jeffrey Morgan a3fcecf943 only set main_gpu if value > 0 is provided 2023-11-20 19:54:04 -05:00
Michael Yang 19b7a4d715 recent llama.cpp update added kernels for fp32, q5_0, and q5_1 2023-11-20 13:44:31 -08:00
Purinda Gunasekara be61a81758
main-gpu argument is not getting passed to llamacpp, fixed. (#1192) 2023-11-20 10:52:52 -05:00
Jeffrey Morgan 13ba6df5ab enable cpu instructions on intel macs 2023-11-19 23:20:26 -05:00
Jeffrey Morgan 36a3bbf65f Update llm/llama.go 2023-11-18 21:25:07 -05:00
Bruce MacDonald 43a726149d fix potentially inaccurate error message 2023-11-18 21:25:07 -05:00
Jeffrey Morgan 41434a7cdc build intel mac with correct binary and compile flags 2023-11-16 22:14:51 -05:00
Jeffrey Morgan 5cba29b9d6
JSON mode: add `"format"` as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Bruce MacDonald 1ae84bc2a2
skip gpu if less than 2GB VRAM are available (#1059) 2023-11-09 13:16:16 -08:00
Michael Yang c5e1bbabda
instead of static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info

* refactor decoder

* return actual parameter count

* explicit rounding

* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
Jeffrey Morgan c44b619428 remove unused fmt.Println 2023-11-03 17:24:58 -07:00
Jeffrey Morgan 17678b7225 Restore system prompt on requests and default num_keep to 0 2023-11-03 13:25:25 -07:00
Jeffrey Morgan 2e53704685
default rope params to 0 for new models (#968) 2023-11-02 08:41:30 -07:00
Michael Yang 642128b75a append LD_LIBRARY_PATH 2023-10-31 15:54:49 -07:00
Jeffrey Morgan 3a1ed9ff70
restore building runner with AVX on by default (#900) 2023-10-27 12:13:44 -07:00
Bruce MacDonald 6d283882b1
catch insufficient permissions nvidia err (#934) 2023-10-27 12:42:40 -04:00
Bruce MacDonald 2665f3c28e
offload 75% of available vram to improve stability (#921) 2023-10-26 20:49:55 -04:00
Jeffrey Morgan b0c9cd0f3b fix metal assertion errors 2023-10-24 00:32:36 -07:00
Jeffrey Morgan 77f61c6301 update submodule commit 2023-10-24 00:30:27 -07:00
Jeffrey Morgan f3604534e5 update submodule commit 2023-10-23 23:59:12 -07:00
Michael Yang 0c7a00a264 bump submodules
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang 36c160f1c3
Merge pull request #881 from jmorganca/mxyng/ggufv3
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang c9167494cb update default log target 2023-10-23 10:44:50 -07:00
Michael Yang 125d0a013a ggufv3
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Jeffrey Morgan 7ed5a39bc7 simpler check for model loading compatibility errors 2023-10-19 14:50:49 -04:00
Jeffrey Morgan a7dad24d92
add error for falcon and starcoder vocab compatibility (#844)
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Michael Yang 235e43d7f6
Merge pull request #833 from discovertomorrow/leadingspace
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller 730996e530 use TrimPrefix instead of TrimLeft 2023-10-18 22:51:30 +02:00
Arne Müller ce6197a8e0 removed redundant strings.CutPrefix from Decode 2023-10-18 22:47:20 +02:00
Arne Müller 46b9953f32 use strings.TrimLeft to remove spaces 2023-10-18 22:41:19 +02:00
Bruce MacDonald 565648f3f7
relay CUDA errors to the client (#825) 2023-10-18 15:36:56 -04:00
Arne Müller 90c49bed57 moved removal of leading space into Predict 2023-10-18 20:08:26 +02:00
Arne Müller 5dc0cff459 fix whitespace removal 2023-10-18 08:15:27 +02:00
Michael Yang 08b0e04f40
Merge pull request #813 from jmorganca/mxyng/llama
refactor llm/llama.go
2023-10-17 14:05:58 -07:00
Michael Yang b36b0b71f8 use cut prefix 2023-10-17 14:01:39 -07:00
Michael Yang 094df37563 remove unused struct 2023-10-17 14:01:38 -07:00
Bruce MacDonald f3648fd206
Update llama.cpp gguf to latest (#710) 2023-10-17 16:55:16 -04:00
Bruce MacDonald bd93a94abd
fix MB VRAM log output (#824) 2023-10-17 15:35:16 -04:00
Michael Yang f55bdb6f10
Merge pull request #799 from deichbewohner/jsonmarshaling
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang 2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Arne Müller 8fa3f366ad Removed newline trimming and used buffer directly in POST request. 2023-10-17 08:17:35 +02:00
Michael Yang fddb303f23 fix: format string wrong type 2023-10-16 16:14:28 -07:00
Michael Yang cb4a80b693 fix: regression unsupported metal types
omitting `--n-gpu-layers` means use metal on macos which isn't correct
since ollama uses `num_gpu=0` to explicitly disable gpu for file types
that are not implemented in metal
2023-10-16 14:37:20 -07:00
Arne Müller ee94693b1a handling unescaped json marshaling 2023-10-16 11:15:55 +02:00
Michael Yang 11d82d7b9b update checkvram 2023-10-13 14:47:29 -07:00
Michael Yang 36fe2deebf only check system memory on macos 2023-10-13 14:47:29 -07:00
Michael Yang 4a8931f634 check total (system + video) memory 2023-10-13 14:47:29 -07:00
Michael Yang bd6e38fb1a refactor memory check 2023-10-13 14:47:29 -07:00
Michael Yang 92189a5855 fix memory check 2023-10-13 14:47:29 -07:00
Michael Yang d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang 35afac099a do not use gpu binary when num_gpu == 0 2023-10-13 14:32:12 -07:00
Michael Yang 811c3d1900 no gpu if vram < 2GB 2023-10-13 14:32:12 -07:00
Bruce MacDonald 6fe178134d
improve api error handling (#781)
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Bruce MacDonald 56497663c8
relay model runner error message to client (#720)
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Michael Yang b599946b74 add format bytes 2023-10-11 14:08:23 -07:00
Bruce MacDonald 77295f716e
prevent waiting on exited command (#752)
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Bruce MacDonald f2ba1311aa
improve vram safety with 5% vram memory buffer (#724)
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Jeffrey Morgan ab0668293c llm: fix build on amd64 2023-10-06 14:39:54 -07:00
Bruce MacDonald 5d22319a2c
rename server subprocess (#700)
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald d06bc0cb6e
enable q8, q5, q5_1, and f32 for linux gpu (#699) 2023-10-05 12:53:47 -04:00
Bruce MacDonald 9e2de1bd2c
increase streaming buffer size (#692) 2023-10-04 14:09:00 -04:00
Michael Yang c02c0cd483 starcoder 2023-10-02 19:56:51 -07:00
Bruce MacDonald b1f7123301
clean up num_gpu calculation code (#673) 2023-10-02 14:53:42 -04:00
Bruce MacDonald 1fbf3585d6
Relay default values to llama runner (#672)
* include seed in params for llama.cpp server and remove empty filter for temp

* relay default predict options to llama.cpp

- reorganize options to match predict request for readability

* omit empty stop

---------

Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Bruce MacDonald 9771b1ec51
windows runner fixes (#637) 2023-09-29 11:47:55 -04:00
Michael Yang f40b3de758 use int64 consistently 2023-09-28 11:07:24 -07:00
Bruce MacDonald 86279f4ae3
unbound max num gpu layers (#591)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Michael Yang 058d0cd04b silence warm up log 2023-09-21 14:53:33 -07:00
Michael Yang ee1c994d15
update submodule (#567) 2023-09-21 16:22:23 -04:00
Bruce MacDonald 4cba75efc5
remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang a9ed7cc6aa rename generate.go 2023-09-20 14:42:17 -07:00
Michael Yang 6c6a31a1e8 embed libraries using cmake 2023-09-20 14:41:57 -07:00
Bruce MacDonald fc6ec356fc remove libcuda.so 2023-09-20 20:36:14 +01:00
Bruce MacDonald 1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00