Commit graph

3370 commits

Author SHA1 Message Date
Michael Yang ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang a250c2cb13 display messages 2024-07-26 13:39:57 -07:00
Michael Yang 3d9de805b7 fix: model save
The stop parameter is saved as a slice, which is incompatible with Modelfile parsing.
2024-07-26 13:23:06 -07:00
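The fix above concerns how a slice-valued parameter gets rendered back into Modelfile syntax. A minimal sketch of the idea, using a hypothetical writeParameters helper (an assumption, not the actual ollama code): each element of a slice such as stop becomes its own PARAMETER line, which is the form the Modelfile parser accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// writeParameters renders model parameters as Modelfile PARAMETER lines.
// Slice values (e.g. multiple stop sequences) are emitted one per line,
// since the Modelfile parser has no literal slice syntax.
func writeParameters(params map[string]any) string {
	var b strings.Builder
	for name, value := range params {
		switch v := value.(type) {
		case []string:
			for _, item := range v {
				fmt.Fprintf(&b, "PARAMETER %s %q\n", name, item)
			}
		default:
			fmt.Fprintf(&b, "PARAMETER %s %v\n", name, v)
		}
	}
	return b.String()
}

func main() {
	fmt.Print(writeParameters(map[string]any{
		"temperature": 0.7,
		"stop":        []string{"<|im_start|>", "<|im_end|>"},
	}))
}
```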
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Jeffrey Morgan f5e3939220
Update api.md (#5968) 2024-07-25 23:10:18 -04:00
Jeffrey Morgan ae27d9dcfd
Update openai.md 2024-07-25 20:27:33 -04:00
Michael Yang 37096790a7
Merge pull request #5552 from ollama/mxyng/messages-docs
docs
2024-07-25 16:26:19 -07:00
Michael Yang 997c903884
Update docs/template.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-25 16:23:40 -07:00
Blake Mizerany c8af3c2d96
server: reuse original download URL for images (#5962)
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
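The thundering-herd fix above hinges on capturing the redirect target once and reusing it for all subsequent chunk requests. A rough sketch of that pattern (hypothetical names and URL, not the actual registry client):

```go
package main

import (
	"fmt"
	"net/http"
)

// resolveOnce follows the registry's first redirect and returns the
// redirected (e.g. CDN) URL so later range requests can go there directly,
// instead of hitting the registry again for every chunk.
func resolveOnce(blobURL string) (string, error) {
	client := &http.Client{
		// Stop at the first redirect and hand back that response.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(blobURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if loc := resp.Header.Get("Location"); loc != "" {
		return loc, nil
	}
	return blobURL, nil // no redirect; keep using the original URL
}

func main() {
	u, err := resolveOnce("https://registry.example.com/v2/library/model/blobs/sha256:abc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("download chunks from:", u)
}
```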
Jeffrey Morgan 455e61170d
Update openai.md 2024-07-25 18:34:47 -04:00
royjhan 4de1370a9d
openai tools doc (#5617) 2024-07-25 18:34:06 -04:00
Jeffrey Morgan bbf8f102ee
Revert "llm(llama): pass rope factors (#5924)" (#5963)
This reverts commit bb46bbcf5e.
2024-07-25 18:24:55 -04:00
Daniel Hiltgen ce3c93b08f Report better error on CUDA unsupported OS/arch
If we detect an NVIDIA GPU but NVIDIA doesn't support the OS/arch, this reports
a clearer error to the user and points them to docs for self-installing the
drivers where possible.
2024-07-24 17:09:20 -07:00
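A sketch of the kind of platform check described above, with hypothetical names and an illustrative allowlist (the real GPU discovery code is more involved):

```go
package discover

import (
	"fmt"
	"runtime"
)

// cudaSupported is an illustrative allowlist of OS/architecture pairs with
// NVIDIA driver support; the real list lives in the GPU discovery code.
var cudaSupported = map[string]bool{
	"linux/amd64":   true,
	"linux/arm64":   true,
	"windows/amd64": true,
}

// checkCUDAPlatform returns a user-facing error when an NVIDIA GPU was
// detected but the platform has no supported driver, pointing at the docs
// instead of failing later with an opaque load error.
func checkCUDAPlatform(gpuDetected bool) error {
	key := runtime.GOOS + "/" + runtime.GOARCH
	if gpuDetected && !cudaSupported[key] {
		return fmt.Errorf("NVIDIA GPU detected, but %s is not supported by the NVIDIA driver; see the GPU documentation for manual driver installation", key)
	}
	return nil
}
```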
Daniel Hiltgen 6c2129d5d0 Explain font problems on windows 10 2024-07-24 15:22:00 -07:00
Daniel Hiltgen 7c2a157ca4 Ensure AMD GPU nodes are numerically sorted
For systems that enumerate more than 10 CPUs, the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
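The sorting issue above is the classic lexicographic-vs-numeric problem ("10" sorts before "2"). A small sketch of a numeric sort over node directory names, assuming names that end in a decimal index (hypothetical helper, not the actual discovery code):

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sortNodesNumerically orders names like "node0", "node2", "node10" by their
// trailing integer rather than lexicographically, so entries past the tenth
// node are not interleaved with earlier ones.
func sortNodesNumerically(nodes []string, prefix string) {
	sort.Slice(nodes, func(i, j int) bool {
		a, _ := strconv.Atoi(nodes[i][len(prefix):])
		b, _ := strconv.Atoi(nodes[j][len(prefix):])
		return a < b
	})
}

func main() {
	nodes := []string{"node10", "node1", "node2", "node0", "node11"}
	sortNodesNumerically(nodes, "node")
	fmt.Println(nodes) // [node0 node1 node2 node10 node11]
}
```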
Michael Yang bb46bbcf5e
llm(llama): pass rope factors (#5924) 2024-07-24 16:05:59 -04:00
royjhan ac33aa7d37
Fix Embed Test Flakes (#5893)
* float cmp

* increase tolerance
2024-07-24 11:15:46 -07:00
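The flake fix above compares embedding floats with a tolerance instead of exact equality. A minimal sketch of that pattern (hypothetical helper, not the test code itself):

```go
package main

import (
	"fmt"
	"math"
)

// floatsAlmostEqual reports whether two embedding vectors match element-wise
// within tol, avoiding flakes from tiny platform-dependent differences.
func floatsAlmostEqual(a, b []float32, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(float64(a[i])-float64(b[i])) > tol {
			return false
		}
	}
	return true
}

func main() {
	x := []float32{0.1234567, 0.7654321}
	y := []float32{0.1234568, 0.7654320}
	fmt.Println(floatsAlmostEqual(x, y, 1e-5)) // true
}
```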
Daniel Hiltgen 830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Ajay Chintala a6cd8f6169
Update README.md to add LLMStack integration (#5799) 2024-07-23 14:40:23 -04:00
Daniel Hiltgen c78089263a
Merge pull request #5864 from dhiltgen/bump_go
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen 3e5ea035d5
Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
Bump Go version to 1.22.5 to fix security vulnerabilities in Docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen 5d604eec5b Bump Go patch version 2024-07-22 16:16:28 -07:00
Josh db0968f30c
fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Daniel Hiltgen e12fff8810 Enable Windows error dialog for subprocess startup
Make sure that if something goes wrong spawning the process, the user gets
enough information to try to self-correct, or at least to file a bug with
details so we can fix it. Once the process starts, we immediately switch back
to the recommended setting to prevent the blocking dialog. This ensures that
if the model fails to load (OOM, unsupported model type, etc.), the process
exits quickly and we can scan the subprocess stdout/stderr for the reason to
report via the API.
2024-07-22 14:07:27 -07:00
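One plausible shape for the behavior described above, assuming the parent toggles the Win32 SetErrorMode state around process startup; the helper names are hypothetical and this is a sketch of the idea, not the actual implementation.

```go
//go:build windows

package main

import (
	"os/exec"
	"syscall"
)

var (
	kernel32         = syscall.NewLazyDLL("kernel32.dll")
	procSetErrorMode = kernel32.NewProc("SetErrorMode")
)

// setErrorMode wraps the Win32 SetErrorMode call and returns the previous mode.
func setErrorMode(mode uintptr) uintptr {
	prev, _, _ := procSetErrorMode.Call(mode)
	return prev
}

// startRunner enables the system error dialog while spawning the subprocess,
// so a failure to even start it (missing DLLs, bad binary) is visible to the
// user, then restores the previous quiet mode so a later crash (OOM,
// unsupported model, ...) exits quickly and can be diagnosed from captured
// stdout/stderr instead of blocking on a dialog.
func startRunner(path string, args ...string) (*exec.Cmd, error) {
	prev := setErrorMode(0) // 0 = show default error dialogs
	defer setErrorMode(prev)

	cmd := exec.Command(path, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}
```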
Michael Yang 9b60a038e5 update api.md 2024-07-22 13:49:51 -07:00
Michael Yang 83a0cb8d88 docs 2024-07-22 13:38:09 -07:00
royjhan c0648233f2
api embed docs (#5282) 2024-07-22 13:37:08 -07:00
Jeffrey Morgan d835368eb8
convert: capture head_dim for mistral (#5818) 2024-07-22 16:16:22 -04:00
Michael Yang 85d9d73a72 comments 2024-07-22 11:49:03 -07:00
Michael Yang 78140a712c cleanup tests 2024-07-22 11:49:03 -07:00
Michael Yang 1954ec5917 uint64 2024-07-22 11:49:02 -07:00
Michael Yang 0f1910129f int 2024-07-22 11:30:07 -07:00
Michael Yang e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
Michael Yang 8570c1c0ef keepalive 2024-07-22 11:27:22 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang 66fe77f084 models 2024-07-22 11:26:12 -07:00
Michael Yang d1a5227cad origins 2024-07-22 11:25:30 -07:00
Michael Yang 4f1afd575d host 2024-07-22 11:25:30 -07:00
Michael Yang 35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
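The run of short commits just above (host, origins, models, bool, keepalive, string, int, uint64) applies this "dynamic environ lookup" idea across the config surface. A minimal sketch of the pattern, with an illustrative subset of variables (an assumption, not the actual envconfig package):

```go
package envconfig

import (
	"os"
	"strconv"
)

// Bool returns a function that re-reads the variable on every call, so
// callers and tests always observe the current environment instead of a
// value cached at package init.
func Bool(key string) func() bool {
	return func() bool {
		v, err := strconv.ParseBool(os.Getenv(key))
		return err == nil && v
	}
}

// String returns a lookup function with a fallback default.
func String(key, fallback string) func() string {
	return func() string {
		if v := os.Getenv(key); v != "" {
			return v
		}
		return fallback
	}
}

// Illustrative wiring only; the real variable set lives in the commits above.
var (
	Debug = Bool("OLLAMA_DEBUG")
	Host  = String("OLLAMA_HOST", "127.0.0.1:11434")
)
```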
Daniel Hiltgen 5784c05397
Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen f14aa5435d
Merge pull request #5855 from dhiltgen/remove_max_vram
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan f8fedbda20
Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
Jeffrey Morgan b3e5491e41
server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Daniel Hiltgen cc269ba094 Remove no longer supported max VRAM var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM scenarios.
With concurrency support it was no longer wired up, and a single simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu` to limit
memory usage and avoid OOM if our predictions are wrong.
2024-07-22 09:08:11 -07:00
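As noted above, `num_gpu` remains the supported knob for capping GPU memory use. A hedged example of setting it per request through the REST API (the endpoint and option name are documented; the model name and prompt are illustrative):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Offload only 20 layers to the GPU for this request; the rest run on CPU.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Why is the sky blue?",
		"stream": false,
		"options": map[string]any{
			"num_gpu": 20,
		},
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```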
Daniel Hiltgen a3c20e3f18 Refine error reporting for subprocess crash
On Windows, the exit status winds up being the term many users search for,
and they end up piling onto unrelated issues. This refines the reporting so
that, when we have a more detailed message, we suppress the exit-status
portion of the message.
2024-07-22 08:52:16 -07:00
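A small sketch of the reporting rule described above (hypothetical helper and messages):

```go
package main

import (
	"errors"
	"fmt"
)

// crashError prefers a specific message scraped from the runner's output over
// the raw exit status, so users are not left searching for an opaque
// "exit status ..." string that matches many unrelated issues.
func crashError(exitErr error, detail string) error {
	if detail != "" {
		return errors.New(detail)
	}
	return fmt.Errorf("runner terminated unexpectedly: %w", exitErr)
}

func main() {
	err := crashError(errors.New("exit status 0xc0000409"), "CUDA error: out of memory")
	fmt.Println(err) // CUDA error: out of memory
}
```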
Jeffrey Morgan 80ee9b5e47
Remove out of space test temporarily (#5825) 2024-07-21 00:22:11 -04:00
Jeffrey Morgan 5534f2cc6a
llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
Daniel Hiltgen d321297d8a
Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen 06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen 5d707e6fd5
Merge pull request #5583 from dhiltgen/integration_improvements
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00