ollama/llm
Commit 12e8c12d2b by Jongwook Choi:
Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-GPU inference produces garbage
output. This is a known bug in llama.cpp (or the NVIDIA driver). Until the
upstream bug is fixed, we disable CUDA peer access temporarily to ensure
correct output.

See #961.
Committed 2023-11-24 14:05:57 -05:00
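
As context for the workaround, here is a minimal sketch of what leaving CUDA peer access disabled looks like at the runtime-API level. This is not the actual llama.cpp patch; the loop and messages are illustrative only. The point is simply that cudaDeviceEnablePeerAccess is never called, so cross-GPU transfers fall back to the non-peer copy path.

```cpp
// Minimal standalone sketch (not the actual llama.cpp patch): with the
// workaround, cudaDeviceEnablePeerAccess is simply never called, so
// cross-GPU transfers take the non-peer path instead of direct P2P copies.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count < 2) {
        std::printf("fewer than two CUDA devices; peer access is not relevant\n");
        return 0;
    }
    for (int src = 0; src < device_count; ++src) {
        for (int dst = 0; dst < device_count; ++dst) {
            if (src == dst) continue;
            int can_access = 0;
            cudaDeviceCanAccessPeer(&can_access, src, dst);
            // Report whether P2P would be possible, but leave it disabled:
            // no cudaSetDevice(src) + cudaDeviceEnablePeerAccess(dst, 0) here.
            std::printf("GPU %d -> GPU %d: peer access %s, left disabled\n",
                        src, dst, can_access ? "possible" : "unsupported");
        }
    }
    return 0;
}
```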
File          Committed                   Last commit message
llama.cpp     2023-11-24 14:05:57 -05:00  Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
falcon.go     2023-10-02 19:56:51 -07:00  starcoder
ggml.go       2023-10-23 09:35:49 -07:00  ggufv3
gguf.go       2023-11-22 11:40:30 -08:00  fix: gguf int type
llama.go      2023-11-20 19:54:04 -05:00  only set main_gpu if value > 0 is provided
llm.go        2023-11-20 13:44:31 -08:00  recent llama.cpp update added kernels for fp32, q5_0, and q5_1
starcoder.go  2023-10-02 19:56:51 -07:00  starcoder
utils.go      2023-08-10 09:23:10 -07:00  partial decode ggml bin for more info