Commit graph

118 commits

Author SHA1 Message Date
Michael Yang a00fac4ec8 update llama.cpp 2023-11-21 09:50:02 -08:00
Jeffrey Morgan a3fcecf943 only set main_gpu if value > 0 is provided 2023-11-20 19:54:04 -05:00
Michael Yang 19b7a4d715 recent llama.cpp update added kernels for fp32, q5_0, and q5_1 2023-11-20 13:44:31 -08:00
Purinda Gunasekara be61a81758
main-gpu argument was not getting passed to llama.cpp; fixed (#1192) 2023-11-20 10:52:52 -05:00
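
Commit be61a81758, with a3fcecf943 above it, describes forwarding the main_gpu option to the llama.cpp server and only doing so for positive values. A minimal sketch of that plumbing, assuming a hypothetical argument builder (`--main-gpu` is the llama.cpp server flag; everything else here is illustrative):

```go
package main

import (
	"fmt"
	"strconv"
)

// appendMainGPU forwards the main_gpu option to the llama.cpp server
// as --main-gpu, but only when a value > 0 is provided, since 0 is
// already llama.cpp's default device.
func appendMainGPU(args []string, mainGPU int) []string {
	if mainGPU > 0 {
		args = append(args, "--main-gpu", strconv.Itoa(mainGPU))
	}
	return args
}

func main() {
	fmt.Println(appendMainGPU([]string{"--model", "model.gguf"}, 1))
	// [--model model.gguf --main-gpu 1]
}
```
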
Jeffrey Morgan 13ba6df5ab enable cpu instructions on intel macs 2023-11-19 23:20:26 -05:00
Jeffrey Morgan 36a3bbf65f Update llm/llama.go 2023-11-18 21:25:07 -05:00
Bruce MacDonald 43a726149d fix potentially inaccurate error message 2023-11-18 21:25:07 -05:00
Jeffrey Morgan 41434a7cdc build intel mac with correct binary and compile flags 2023-11-16 22:14:51 -05:00
Jeffrey Morgan 5cba29b9d6
JSON mode: add `"format": "json"` as an API parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
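
A minimal client-side sketch of the new parameter against a default local server; the endpoint and the `"format": "json"` field follow the ollama API, while the model name and prompt are illustrative:

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
	"os"
)

func main() {
	// "format": "json" constrains the model to emit valid JSON.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama2", // illustrative model name
		"prompt": "List three colors as a JSON array.",
		"format": "json",
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // stream of JSON response objects
}
```
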
Bruce MacDonald 1ae84bc2a2
skip gpu if less than 2GB VRAM is available (#1059) 2023-11-09 13:16:16 -08:00
Michael Yang c5e1bbabda
instead of static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info

* refactor decoder

* return actual parameter count

* explicit rounding

* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
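
The idea in c5e1bbabda, sketched: sum the element counts of every tensor recorded in the GGUF metadata instead of keeping a static per-family table. The `Tensor` type below is a hypothetical stand-in for the decoded tensor info:

```go
package main

import "fmt"

// Tensor is a hypothetical stand-in for a decoded GGUF tensor record.
type Tensor struct {
	Name  string
	Shape []uint64
}

// parameterCount multiplies out each tensor's shape and sums the
// results, yielding the real parameter total for the model.
func parameterCount(tensors []Tensor) uint64 {
	var total uint64
	for _, t := range tensors {
		n := uint64(1)
		for _, dim := range t.Shape {
			n *= dim
		}
		total += n
	}
	return total
}

func main() {
	tensors := []Tensor{
		{Name: "token_embd.weight", Shape: []uint64{4096, 32000}},
		{Name: "output.weight", Shape: []uint64{4096, 32000}},
	}
	fmt.Println(parameterCount(tensors)) // 262144000
}
```

The `HumanNumber` rename and explicit rounding in the commit body presumably concern how this raw count is then formatted for display (for example, 262144000 as "262M").
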
Jeffrey Morgan c44b619428 remove unused fmt.Println 2023-11-03 17:24:58 -07:00
Jeffrey Morgan 17678b7225 Restore system prompt on requests and default num_keep to 0 2023-11-03 13:25:25 -07:00
Jeffrey Morgan 2e53704685
default rope params to 0 for new models (#968) 2023-11-02 08:41:30 -07:00
Michael Yang 642128b75a append LD_LIBRARY_PATH 2023-10-31 15:54:49 -07:00
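
A sketch of what "append" likely means here: extend any existing `LD_LIBRARY_PATH` rather than replacing it, so libraries already on the path still resolve (the library directory is hypothetical):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	libDir := "/opt/ollama/lib" // hypothetical runner library directory

	// Append to LD_LIBRARY_PATH instead of overwriting it, so paths
	// set by the user or the system keep working.
	path := os.Getenv("LD_LIBRARY_PATH")
	if path != "" {
		path += ":"
	}
	path += libDir
	os.Setenv("LD_LIBRARY_PATH", path)
	fmt.Println(os.Getenv("LD_LIBRARY_PATH"))
}
```
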
Jeffrey Morgan 3a1ed9ff70
restore building runner with AVX on by default (#900) 2023-10-27 12:13:44 -07:00
Bruce MacDonald 6d283882b1
catch insufficient permissions nvidia err (#934) 2023-10-27 12:42:40 -04:00
Bruce MacDonald 2665f3c28e
offload 75% of available vram to improve stability (#921) 2023-10-26 20:49:55 -04:00
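
Taken together with 1ae84bc2a2 above (skip the gpu under 2GB of VRAM), the heuristic might look like this sketch; the two thresholds come from the commit messages, the surrounding code is assumed:

```go
package main

import "fmt"

const minVRAM = 2 * 1024 * 1024 * 1024 // skip the gpu below 2 GiB

// usableVRAM returns how many bytes of VRAM to plan against, or 0 to
// fall back to the cpu. Reserving 25% headroom improves stability,
// since the runtime and other processes also consume VRAM.
func usableVRAM(freeVRAM uint64) uint64 {
	if freeVRAM < minVRAM {
		return 0 // not enough to be worth offloading
	}
	return freeVRAM * 3 / 4 // offload 75% of available vram
}

func main() {
	fmt.Println(usableVRAM(8 * 1024 * 1024 * 1024)) // 6442450944
}
```
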
Jeffrey Morgan b0c9cd0f3b fix metal assertion errors 2023-10-24 00:32:36 -07:00
Jeffrey Morgan 77f61c6301 update submodule commit 2023-10-24 00:30:27 -07:00
Jeffrey Morgan f3604534e5 update submodule commit 2023-10-23 23:59:12 -07:00
Michael Yang 0c7a00a264 bump submodules
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang 36c160f1c3
Merge pull request #881 from jmorganca/mxyng/ggufv3
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang c9167494cb update default log target 2023-10-23 10:44:50 -07:00
Michael Yang 125d0a013a ggufv3
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded as v2.
2023-10-23 09:35:49 -07:00
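
A sketch of the loosened check described in the commit body: v1 keeps its own decoder, and any other version takes the v2 path, which is what makes the reader forward compatible (the function shape is hypothetical):

```go
package main

import "fmt"

// decodeByVersion sketches the loosened gguf version check: v1 has its
// own layout; every other version is decoded with the v2 path, keeping
// the reader forward compatible with v3 and later.
func decodeByVersion(version uint32) string {
	switch version {
	case 1:
		return "decode as gguf v1"
	default:
		return "decode as gguf v2" // v2, v3, and future versions
	}
}

func main() {
	for _, v := range []uint32{1, 2, 3} {
		fmt.Println(v, "->", decodeByVersion(v))
	}
}
```
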
Jeffrey Morgan 7ed5a39bc7 simpler check for model loading compatibility errors 2023-10-19 14:50:49 -04:00
Jeffrey Morgan a7dad24d92
add error for falcon and starcoder vocab compatibility (#844)
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Michael Yang 235e43d7f6
Merge pull request #833 from discovertomorrow/leadingspace
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller 730996e530 use TrimPrefix instead of TrimLeft 2023-10-18 22:51:30 +02:00
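
The distinction behind this sequence of commits: `strings.TrimLeft` strips every leading character in its cutset, while `strings.TrimPrefix` removes at most one occurrence, presumably so only the tokenizer's single prepended space is dropped and any further intentional whitespace survives:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "  two leading spaces"
	fmt.Printf("%q\n", strings.TrimLeft(s, " "))   // "two leading spaces": all spaces gone
	fmt.Printf("%q\n", strings.TrimPrefix(s, " ")) // " two leading spaces": exactly one removed
}
```
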
Arne Müller ce6197a8e0 removed redundant strings.CutPrefix from Decode 2023-10-18 22:47:20 +02:00
Arne Müller 46b9953f32 use strings.TrimLeft to remove spaces 2023-10-18 22:41:19 +02:00
Bruce MacDonald 565648f3f7
relay CUDA errors to the client (#825) 2023-10-18 15:36:56 -04:00
Arne Müller 90c49bed57 moved removal of leading space into Predict 2023-10-18 20:08:26 +02:00
Arne Müller 5dc0cff459 fix whitespace removal 2023-10-18 08:15:27 +02:00
Michael Yang 08b0e04f40
Merge pull request #813 from jmorganca/mxyng/llama
refactor llm/llama.go
2023-10-17 14:05:58 -07:00
Michael Yang b36b0b71f8 use cut prefix 2023-10-17 14:01:39 -07:00
Michael Yang 094df37563 remove unused struct 2023-10-17 14:01:38 -07:00
Bruce MacDonald f3648fd206
Update llama.cpp gguf to latest (#710) 2023-10-17 16:55:16 -04:00
Bruce MacDonald bd93a94abd
fix MB VRAM log output (#824) 2023-10-17 15:35:16 -04:00
Michael Yang f55bdb6f10
Merge pull request #799 from deichbewohner/jsonmarshaling
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang 2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Arne Müller 8fa3f366ad Removed newline trimming and used buffer directly in POST request. 2023-10-17 08:17:35 +02:00
Michael Yang fddb303f23 fix: format string wrong type 2023-10-16 16:14:28 -07:00
Michael Yang cb4a80b693 fix: regression unsupported metal types
omitting `--n-gpu-layers` means use metal on macos, which isn't correct
since ollama uses `num_gpu=0` to explicitly disable the gpu for file
types that are not implemented in metal
2023-10-16 14:37:20 -07:00
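
A hedged sketch of the fix described in the commit body: always pass `--n-gpu-layers` explicitly so the llama.cpp server never falls back to its macOS Metal default (the argument builder is hypothetical):

```go
package main

import (
	"fmt"
	"strconv"
)

// serverArgs always includes --n-gpu-layers, even when numGPU is 0,
// because omitting the flag makes the llama.cpp server default to
// Metal on macOS, re-enabling the gpu for file types that are not
// implemented in metal.
func serverArgs(modelPath string, numGPU int) []string {
	return []string{
		"--model", modelPath,
		"--n-gpu-layers", strconv.Itoa(numGPU), // 0 explicitly disables the gpu
	}
}

func main() {
	fmt.Println(serverArgs("model.gguf", 0))
}
```
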
Arne Müller ee94693b1a handling unescaped json marshaling 2023-10-16 11:15:55 +02:00
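
The escaping these commits deal with is Go's default HTML-escaping of `<`, `>`, and `&` in `json.Marshal`, which mangles prompt text. A minimal sketch of the workaround, assuming the fix uses a `json.Encoder` with `SetEscapeHTML(false)` and posts the buffer directly (consistent with 8fa3f366ad above):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func main() {
	payload := map[string]string{"prompt": "if a < b && b > c { ... }"}

	// json.Marshal HTML-escapes <, >, and & by default.
	escaped, _ := json.Marshal(payload)
	fmt.Println(string(escaped)) // {"prompt":"if a \u003c b \u0026\u0026 b \u003e c { ... }"}

	// An Encoder with SetEscapeHTML(false) leaves the characters intact;
	// the buffer can then be used directly as a POST request body.
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	enc.Encode(payload)
	fmt.Print(buf.String()) // {"prompt":"if a < b && b > c { ... }"}
}
```

`Encoder.Encode` also appends a trailing newline, which would explain the removed newline trimming in 8fa3f366ad: the buffer is sent as-is.
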
Michael Yang 11d82d7b9b update checkvram 2023-10-13 14:47:29 -07:00
Michael Yang 36fe2deebf only check system memory on macos 2023-10-13 14:47:29 -07:00
Michael Yang 4a8931f634 check total (system + video) memory 2023-10-13 14:47:29 -07:00
Michael Yang bd6e38fb1a refactor memory check 2023-10-13 14:47:29 -07:00
Michael Yang 92189a5855 fix memory check 2023-10-13 14:47:29 -07:00