Commit graph

2332 commits

Author SHA1 Message Date
Michael Yang 61e6502449
Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt
fix(cmd): history in alt prompt
2024-01-08 13:48:34 -08:00
Jeffrey Morgan 08f1e18965
Offload layers to GPU based on new model size estimates (#1850)
* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estmation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
Bruce MacDonald 7e8f7c8358
remove ggml automatic re-pull (#1856) 2024-01-08 14:41:01 -05:00
Bruce MacDonald 3f3eb19a3b
document response in modelfile template variables (#1428) 2024-01-08 14:38:51 -05:00
Daniel Hiltgen 059ae4585e
Merge pull request #1834 from dhiltgen/old_cuda
Detect very old CUDA GPUs and fall back to CPU
2024-01-07 10:39:49 -08:00
Daniel Hiltgen 6347f501ca
Merge pull request #1828 from dhiltgen/fix_llava
Accept windows paths for image processing
2024-01-07 09:05:46 -08:00
Jeffrey Morgan 5feec959ad
dont use -Wall in static build (#1833) 2024-01-07 10:39:19 -05:00
Jeffrey Morgan dbdd50b283
add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832) 2024-01-07 00:46:17 -05:00
Daniel Hiltgen d74ce6bd4f Detect very old CUDA GPUs and fall back to CPU
If we try to load the CUDA library on an old GPU, it panics and crashes
the server.  This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
2024-01-06 21:40:29 -08:00
Guilherme Baptista 57942b4676
Update README.md - Community Integrations - Ollama for Ruby (#1830) 2024-01-06 22:31:39 -05:00
Daniel Hiltgen e0d05b0f1e Accept windows paths for image processing
This enhances our regex to support windows style paths.  The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
2024-01-06 10:50:27 -08:00
Daniel Hiltgen 2d9dd14f27
Merge pull request #1697 from dhiltgen/win_docs
Add windows native build instructions
2024-01-05 19:34:20 -08:00
Jeffrey Morgan 1caa56128f add cuda lib path for nvidia container toolkit 2024-01-05 21:10:37 -05:00
Michael Yang 0101e76dbe
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Michael Yang 2ef9352b94 fix(cmd): history in alt mode 2024-01-05 16:20:02 -08:00
Michael Yang 5580ae2472 fix: set template without triple quotes 2024-01-05 15:51:33 -08:00
Bruce MacDonald 3a9f447141
only pull gguf model if already exists (#1817) 2024-01-05 18:50:00 -05:00
Patrick Devine 9c2941e61b
switch api for ShowRequest to use the name field (#1816) 2024-01-05 15:06:43 -08:00
Patrick Devine 238ac5e765
Add unit tests for Parser (#1815) 2024-01-05 14:04:31 -08:00
Bruce MacDonald 4f4980b66b
simplify ggml update logic (#1814)
- additional information is now available in show response, use this to pull gguf before running
- make gguf updates cancellable
2024-01-05 15:22:32 -05:00
Patrick Devine 22e93efa41 add show info command and fix the modelfile 2024-01-05 12:20:05 -08:00
Patrick Devine 2909dce894 split up interactive generation 2024-01-05 12:20:05 -08:00
Jeffrey Morgan df32537312
gpu: read memory info from all cuda devices (#1802)
* gpu: read memory info from all cuda devices

* add `LOOKUP_SIZE` constant

* better constant name

* address comments
2024-01-05 11:25:58 -05:00
Bruce MacDonald 3367b5f3df
remove unused generate patches (#1810) 2024-01-05 11:25:45 -05:00
Matt Williams 46edbbc518
Merge pull request #1801 from jmorganca/mattw/correctdockerlink 2024-01-04 19:20:45 -08:00
Michael Yang d2ff18cd6b
Merge pull request #1791 from jmorganca/mxyng/update-build
update Dockerfile.build
2024-01-04 19:13:44 -08:00
Matt Williams df086d3c8c fix docker doc to point to hub
Signed-off-by: Matt Williams <m@technovangelist.com>
2024-01-04 18:42:23 -08:00
Nicholas Dudfield 8baaaa39c0 Allow extension origins (still needs explicit listing), fixes #1686 2024-01-05 09:06:47 +07:00
Michael Yang f9961c70ae update build 2024-01-04 17:34:38 -08:00
Daniel Hiltgen cd8fad3398
Merge pull request #1790 from dhiltgen/llm_code_shuffle
Cleaup stale submodule
2024-01-04 13:47:25 -08:00
Daniel Hiltgen 9983fa5f4e Cleaup stale submodule
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen dfda91c2ee
Merge pull request #1788 from dhiltgen/llm_code_shuffle
Revamp code layout for the llm directory and llama.cpp submodule
2024-01-04 13:14:28 -08:00
Daniel Hiltgen fac9060da5 Init submodule with new path 2024-01-04 13:00:13 -08:00
Daniel Hiltgen a554616f8e remove old llama.cpp submodule path 2024-01-04 12:12:21 -08:00
Daniel Hiltgen 77d96da94b Code shuffle to clean up the llm dir 2024-01-04 12:12:05 -08:00
Brian Murray 0d6e3565ae
Add embeddings to API (#1773) 2024-01-04 15:00:52 -05:00
Daniel Hiltgen b5939008a1
Merge pull request #1785 from dhiltgen/win_native_cli
Load dynamic cpu lib on windows
2024-01-04 08:55:01 -08:00
Daniel Hiltgen e9ce91e9a6 Load dynamic cpu lib on windows
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI.  This fixes
the logic so we correctly fallback to the dynamic CPU library
on windows.
2024-01-04 08:41:41 -08:00
Bruce MacDonald 4ad6c9b11f
fix: pull either original model or from model on create (#1774) 2024-01-04 01:34:38 -05:00
Jeffrey Morgan c0285158a9 tweak memory requirements error text 2024-01-03 19:47:18 -05:00
Jeffrey Morgan 77a66df72c add macOS memory check for 47B models 2024-01-03 19:46:16 -05:00
Jeffrey Morgan 5b4837f881 remove unused filetype check 2024-01-03 19:45:39 -05:00
Jeffrey Morgan 29340c2e62
update cmake flags for amd64 macOS (#1780)
* update cmake flags for intel macOS

* remove `LLAMA_K_QUANTS`

* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
2024-01-03 19:22:15 -05:00
Daniel Hiltgen d5ec730354
Merge pull request #1779 from dhiltgen/refined_amd_gpu_list
Improve maintainability of Radeon card list
2024-01-03 16:18:57 -08:00
Daniel Hiltgen 8bed487aba
Merge pull request #1778 from dhiltgen/wsl1
Fail fast on WSL1 while allowing on WSL2
2024-01-03 16:18:41 -08:00
Daniel Hiltgen c1a10a6e9b
Merge pull request #1781 from dhiltgen/cpu_only_build
Fix CPU only builds
2024-01-03 16:18:25 -08:00
Daniel Hiltgen ddbfa6fe31 Fix CPU only builds
Go embed doesn't like when there's no matching files, so put
a dummy placeholder in to allow building without any GPU support
If no "server" library is found, it's safely ignored at runtime.
2024-01-03 16:08:34 -08:00
Daniel Hiltgen 2fcd41ef81 Fail fast on WSL1 while allowing on WSL2
This prevents users from accidentally installing on WSL1 with instructions
guiding how to upgrade their WSL instance to version 2.  Once running WSL2
if you have an NVIDIA card, you can follow their instructions to set up
GPU passthrough and run models on the GPU.  This is not possible on WSL1.
2024-01-03 16:02:32 -08:00
Daniel Hiltgen 16f4603b67 Improve maintainability of Radeon card list
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
2024-01-03 15:16:56 -08:00
Daniel Hiltgen 1184686649
Merge pull request #1776 from dhiltgen/render_group
Add ollama user to render group for Radeon support
2024-01-03 13:07:54 -08:00