23 commits

Author SHA1 Message Date
Daniel Hiltgen c7bcb00319 Wire up ccache and pigz in the docker based build
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen 74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary

Darwin retains the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan 2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f3526a.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan 4fd5f3526a
fix cmake build (#5505) 2024-07-05 19:07:01 -04:00
Roy Yang 5f73c08729
Remove trailing spaces (#3889) 2024-04-25 14:32:26 -04:00
Daniel Hiltgen 58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Jeremy dfc6721b20 add support for libcudart.so for CUDA devices (adds Jetson support) 2024-03-25 11:07:44 -04:00
Daniel Hiltgen 85129d3a32 Adapt our build for imported server.cpp 2024-03-12 14:57:15 -07:00
John 23ebe8fe11
fix some typos (#2973)
Signed-off-by: hishope <csqiye@126.com>
2024-03-06 22:50:11 -08:00
Bernhard M. Wiedemann 76e5d9ec88 Omit build date from gzip headers
See https://reproducible-builds.org/ for why this is good.

This patch was done while working on reproducible builds for openSUSE.
2024-02-29 16:48:19 +01:00
Daniel Hiltgen e1f50377f4 Harden generate patching model
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Daniel Hiltgen e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Jeffrey Morgan a64570dcae
Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
Daniel Hiltgen a447a083f2 Add compute capability 5.0, 7.5, and 8.0 2024-01-20 14:24:05 -08:00
Jeffrey Morgan 4c54f0ddeb
sign dylibs on macOS (#2101) 2024-01-19 19:24:11 -05:00
Jeffrey Morgan dc88cc3981
use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen 1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Jeffrey Morgan 288ef8ff95
add gcc -lstdc++ flag for linux cpu (#1974) 2024-01-13 03:53:00 -05:00
Jeffrey Morgan 4cf17990f7
use g++ to build libext_server.so on linux (#1972) 2024-01-13 03:12:42 -05:00
Daniel Hiltgen d88c527be3 Build multiple CPU variants and pick the best
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker.  Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
2024-01-11 08:42:47 -08:00
Bruce MacDonald 3367b5f3df
remove unused generate patches (#1810) 2024-01-05 11:25:45 -05:00
Daniel Hiltgen 9983fa5f4e Cleanup stale submodule
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen 77d96da94b Code shuffle to clean up the llm dir 2024-01-04 12:12:05 -08:00
Renamed from llm/llama.cpp/gen_common.sh