Commit graph

25 commits

Author SHA1 Message Date
Jeffrey Morgan a64570dcae Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
Daniel Hiltgen 0f5b843319 Refine Accelerate usage on mac
On older Macs, Accelerate seems to cause crashes, but on
AVX2-capable Macs it does not.
2024-01-22 16:25:56 -08:00
Daniel Hiltgen df54c723ae Make CPU builds parallel and AMD GPU targets customizable
The Linux build now supports parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advanced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Jeffrey Morgan 4c54f0ddeb sign dylibs on macOS (#2101) 2024-01-19 19:24:11 -05:00
Jeffrey Morgan dc88cc3981 use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen fccdf4c635 Merge pull request #1987 from xyproto/archlinux
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen 1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth cbe2adc78a Merge branch 'main' into archlinux 2024-01-17 12:50:11 +01:00
Daniel Hiltgen 795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we cannot statically link it. This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen 8795447dad Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen 3ca5f69ce8 Fix typo in arm mac arch script 2024-01-14 08:32:57 -08:00
Alexander F. Rødseth f4bf1d514f Let gpu.go and gen_linux.sh also find CUDA on Arch Linux 2024-01-14 13:40:36 +01:00
Daniel Hiltgen 2ecb247276 Fix intel mac build
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan 288ef8ff95 add gcc -lstdc++ flag for linux cpu (#1974) 2024-01-13 03:53:00 -05:00
Jeffrey Morgan 4cf17990f7 use g++ to build libext_server.so on linux (#1972) 2024-01-13 03:12:42 -05:00
Fabian Preiss 905862e17b improve cuda detection (rel. issue #1704) 2024-01-12 21:59:19 +01:00
Daniel Hiltgen 39928a42e8 Always dynamically load the llm server library
This switches Darwin to dynamic loading and refactors the code, now that no
platform statically links the library.
2024-01-11 08:42:47 -08:00
Daniel Hiltgen d88c527be3 Build multiple CPU variants and pick the best
This builds the built-in Linux version without any vector extensions,
which enables the resulting builds to run under Rosetta on macOS in
Docker. At runtime it then checks for the actual CPU vector
extensions and loads the best CPU library available.
2024-01-11 08:42:47 -08:00
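The approach of building several CPU variants and picking the best one at runtime can be modeled as a best-first match against detected CPU features. This Go sketch is hypothetical. The variant names `cpu`, `cpu_avx` and `cpu_avx2` and the `CPUFeatures` struct are assumptions for illustration. Real feature detection (for example `X86.HasAVX2` from `golang.org/x/sys/cpu`) is left out so that the example needs only the standard library.

```go
package main

import "fmt"

// CPUFeatures holds the vector extensions detected at runtime.
// In a real program these would come from a CPU-feature package;
// here they are passed in directly (an assumption for the sketch).
type CPUFeatures struct {
	AVX  bool
	AVX2 bool
}

// variant pairs a hypothetical library name with the features it needs.
type variant struct {
	name     string
	requires func(CPUFeatures) bool
}

// variantsByPreference is ordered best-first. The baseline build uses no
// vector extensions, so it runs anywhere (including under Rosetta).
var variantsByPreference = []variant{
	{"cpu_avx2", func(f CPUFeatures) bool { return f.AVX2 }},
	{"cpu_avx", func(f CPUFeatures) bool { return f.AVX }},
	{"cpu", func(CPUFeatures) bool { return true }},
}

// bestCPUVariant returns the first (most capable) variant the CPU supports.
func bestCPUVariant(f CPUFeatures) string {
	for _, v := range variantsByPreference {
		if v.requires(f) {
			return v.name
		}
	}
	return "cpu" // not reached: the baseline always matches
}

func main() {
	fmt.Println(bestCPUVariant(CPUFeatures{AVX: true, AVX2: true})) // cpu_avx2
	fmt.Println(bestCPUVariant(CPUFeatures{AVX: true}))             // cpu_avx
	fmt.Println(bestCPUVariant(CPUFeatures{}))                      // cpu
}
```

Because the baseline entry always matches, every machine can load some variant. Machines with wider vector units get a faster variant.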
Daniel Hiltgen 8da7bef05f Support multiple variants for a given llm lib type
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
2024-01-10 17:27:51 -08:00
Jeffrey Morgan 34344d801c clean up cmake build directory when cross compiling macOS builds 2024-01-09 17:13:56 -05:00
Jeffrey Morgan 8a8c7e7f8d only build for metal on arm64 2024-01-09 13:51:08 -05:00
Jeffrey Morgan dbdd50b283 add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832) 2024-01-07 00:46:17 -05:00
Bruce MacDonald 3367b5f3df remove unused generate patches (#1810) 2024-01-05 11:25:45 -05:00
Daniel Hiltgen 9983fa5f4e Clean up stale submodule
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen 77d96da94b Code shuffle to clean up the llm dir 2024-01-04 12:12:05 -08:00