Commit graph

260 commits

Author SHA1 Message Date
Daniel Hiltgen a1dfab43b9 Ensure the libraries are present
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Daniel Hiltgen de76b95dd4 Bump llama.cpp to b2081 2024-02-06 12:06:43 -08:00
Daniel Hiltgen 27aa2d4a19
Merge pull request #1849 from mraiser/main
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Daniel Hiltgen e1f50377f4 Harden generate patching model
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan f11bf0740b use llm.ImageData 2024-01-31 19:13:48 -08:00
Michael Yang 8450bf66e6 trim images 2024-01-31 19:13:47 -08:00
Daniel Hiltgen 72b12c3be7 Bump llama.cpp to b1999
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Jeffrey Morgan 2e06ed01d5 remove unknown CPPFLAGS option 2024-01-28 17:51:23 -08:00
mraiser 4c4c730a0a
Merge branch 'ollama:main' into main 2024-01-27 21:56:11 -05:00
Daniel Hiltgen e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Jeffrey Morgan 3ebd6a83fc update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def 2024-01-25 13:54:11 -08:00
Jeffrey Morgan a64570dcae
Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
mraiser a4564232a4
Update gen_linux.sh to find libcudart in separate directory 2024-01-25 09:49:35 -05:00
Michael Yang cd22855ef8 refactor tensor read 2024-01-24 10:48:31 -08:00
Jeffrey Morgan 4458efb73a
Load all layers on arm64 macOS if model is small enough (#2149) 2024-01-22 17:40:06 -08:00
Daniel Hiltgen 0f5b843319 Refine Accelerate usage on mac
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan ffaf52e1e9 update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c 2024-01-22 15:16:54 -08:00
Daniel Hiltgen 3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Daniel Hiltgen 730dcfcc7a Refine debug logging for llm
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen 27a2d5af54 Debug logging on init failure 2024-01-22 12:08:22 -08:00
Jeffrey Morgan 5f81a33f43
update submodule to 6f9939d (#2115) 2024-01-22 11:56:40 -08:00
Daniel Hiltgen 5576bb2348
Merge pull request #2130 from dhiltgen/more_faster
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen ec3764538d Probe GPUs before backend init
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen df54c723ae Make CPU builds parallel and customizable AMD GPUs
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Jeffrey Morgan 89c4aee29e
Unlock mutex when failing to load model (#2117) 2024-01-20 20:54:46 -05:00
Daniel Hiltgen a447a083f2 Add compute capability 5.0, 7.5, and 8.0 2024-01-20 14:24:05 -08:00
Daniel Hiltgen 681a914990 Add support for CUDA 5.2 cards 2024-01-20 10:48:43 -08:00
Jeffrey Morgan 4c54f0ddeb
sign dylibs on macOS (#2101) 2024-01-19 19:24:11 -05:00
Daniel Hiltgen 6a042438af Switch to local dlopen symbols 2024-01-19 11:37:02 -08:00
Jeffrey Morgan dc88cc3981
use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen abec7f06e5
Merge pull request #2056 from dhiltgen/slog
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Daniel Hiltgen fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Daniel Hiltgen fccdf4c635
Merge pull request #1987 from xyproto/archlinux
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen 1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth cbe2adc78a
Merge branch 'main' into archlinux 2024-01-17 12:50:11 +01:00
Daniel Hiltgen 795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it.  This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Bruce MacDonald a897e833b8
do not cache prompt (#2018)
- prompt cache causes inferance to hang after some time
2024-01-16 13:48:05 -05:00
Daniel Hiltgen 8795447dad
Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen 95ad9a9fc8
Merge pull request #1988 from dhiltgen/fix_intel_mac
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen 3ca5f69ce8 Fix typo in arm mac arch script 2024-01-14 08:32:57 -08:00
Daniel Hiltgen cfa6337960
Merge pull request #1982 from dhiltgen/fix_intel_mac
Fix intel mac build
2024-01-14 08:26:46 -08:00
Alexander F. Rødseth f4bf1d514f Let gpu.go and gen_linux.sh also find CUDA on Arch Linux 2024-01-14 13:40:36 +01:00
Jeffrey Morgan 557110d0ba
Disable mmap with lora layers (#1985) 2024-01-13 23:36:31 -05:00
Daniel Hiltgen 2ecb247276 Fix intel mac build
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan 288ef8ff95
add gcc -lstdc++ flag for linux cpu (#1974) 2024-01-13 03:53:00 -05:00
Jeffrey Morgan 4cf17990f7
use g++ to build libext_server.so on linux (#1972) 2024-01-13 03:12:42 -05:00
Michael Yang eaed6f8c45 add max context length check 2024-01-12 14:54:07 -08:00
Fabian Preiss 905862e17b improve cuda detection (rel. issue #1704) 2024-01-12 21:59:19 +01:00
Daniel Hiltgen 3773fb6465
Merge pull request #1935 from dhiltgen/cpu_fallback
Fix up the CPU fallback selection
2024-01-11 15:52:32 -08:00
Daniel Hiltgen 7427fa1387 Fix up the CPU fallback selection
The memory changes and multi-variant change had some merge
glitches I missed.  This fixes them so we actually get the cpu llm lib
and best variant for the given system.
2024-01-11 15:27:06 -08:00