Commit graph

15 commits

Author SHA1 Message Date
Daniel Hiltgen d4cd695759 Add cgo implementation for llama.cpp
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald 811b1f03c8 deprecate ggml
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Jongwook Choi 12e8c12d2b Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.

See #961.
2023-11-24 14:05:57 -05:00
Jeffrey Morgan 3a1ed9ff70 restore building runner with AVX on by default (#900) 2023-10-27 12:13:44 -07:00
Michael Yang c9167494cb update default log target 2023-10-23 10:44:50 -07:00
Bruce MacDonald 5d22319a2c rename server subprocess (#700)
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Michael Yang 058d0cd04b silence warm up log 2023-09-21 14:53:33 -07:00
Michael Yang 6c6a31a1e8 embed libraries using cmake 2023-09-20 14:41:57 -07:00
Bruce MacDonald fc6ec356fc remove libcuda.so 2023-09-20 20:36:14 +01:00
Bruce MacDonald 1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00
Bruce MacDonald b9bb5ca288 use cuda_version 2023-09-20 17:58:16 +01:00
Bruce MacDonald 4e8be787c7 pack in cuda libs 2023-09-20 17:40:42 +01:00
Bruce MacDonald 2540c9181c support for packaging in multiple cuda runners (#509)
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
Bruce MacDonald f59c4d03f7 fix ggml arm64 cuda build (#520) 2023-09-12 17:06:48 -04:00
Bruce MacDonald f221637053 first pass at linux gpu support (#454)
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00