Commit graph

268 commits

Author SHA1 Message Date
Patrick Devine abed273de3
add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Michael Yang ecab6f1cc5 refactor show ouput
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Daniel Hiltgen 6719097649
llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
2024-09-05 14:00:08 -07:00
Daniel Hiltgen b05c9e83d9
Introduce GPU Overhead env var (#5922)
Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or workaround
memory prediction bugs
2024-09-05 13:46:35 -07:00
Vimal Kumar 5f7b4a5e30
fix(cmd): show info may have nil ModelInfo (#6579) 2024-08-31 21:12:17 -07:00
Patrick Devine 0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Michael Yang beb49eef65 create bert models from cli 2024-08-20 17:27:34 -07:00
longtao 0a8d6ea86d
Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Josh f7e3b9190f
cmd: spinner progress for transfer model data (#6100) 2024-08-12 11:46:32 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Hiltgen 1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Michael Yang 38d9036b59
Merge pull request #5992 from ollama/mxyng/save
fix: model save
2024-07-29 09:53:19 -07:00
Tibor Schmidt f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
Michael Yang a250c2cb13 display messages 2024-07-26 13:39:57 -07:00
Michael Yang 3d9de805b7 fix: model save
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Daniel Hiltgen 830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang 4f1afd575d host 2024-07-22 11:25:30 -07:00
Daniel Hiltgen cc269ba094 Remove no longer supported max vram var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios.  With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups.  Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Patrick Devine 057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
Patrick Devine 23ebbaa46e Revert "remove template from tests"
This reverts commit 9ac0a7a50b.
2024-07-12 15:47:17 -07:00
Patrick Devine 9ac0a7a50b remove template from tests 2024-07-12 15:41:31 -07:00
royjhan 5f034f5b63
Include Show Info in Interactive (#5342) 2024-06-28 13:15:52 -07:00
royjhan b910fa9010
Ollama Show: Check for Projector Type (#5307)
* Check exists projtype

* Maintain Ordering
2024-06-28 11:30:16 -07:00
Michael Yang 123a722a6f
zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
Blake Mizerany 2aa91a937b
cmd: defer stating model info until necessary (#5248)
This commit changes the 'ollama run' command to defer fetching model
information until it really needs it. That is, when in interactive mode.

It also removes one such case where the model information is fetch in
duplicate, just before calling generateInteractive and then again, first
thing, in generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
royjhan fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
Patrick Devine c69bc19e46
move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
Michael Yang 201d853fdf nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang 8ffb51749f nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang 04f3c12bb7 replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
Josh Yan 914f68f021 replaced duplicate call with variable 2024-05-30 10:38:07 -07:00
Josh Yan bd1d119ba9 fixed japanese characters deleted at end of line 2024-05-30 10:24:21 -07:00
Lei Jitang a03be18189
Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663)
* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY

Signed-off-by: Lei Jitang <leijitang@outlook.com>

* serve: Add more env to help message of ollama serve

Add more enviroment variables to `ollama serve --help`
to let users know what can be configurated.

Signed-off-by: Lei Jitang <leijitang@outlook.com>

---------

Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-30 09:36:51 -07:00
Patrick Devine 4cc3be3035
Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
Josh 9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan 353f83a9c7 add Ctrl + W shortcut 2024-05-21 16:55:09 -07:00
Patrick Devine d355d2020f add fixes for llama 2024-05-20 16:13:57 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Patrick Devine 105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508) 2024-05-18 11:51:57 -07:00
Josh Yan 3d90156e99 removed comment 2024-05-16 14:12:03 -07:00
Josh Yan 26bfc1c443 go fmt'd cmd.go 2024-05-15 17:26:39 -07:00
Josh Yan 799aa9883c go fmt'd cmd.go 2024-05-15 17:24:17 -07:00
Josh Yan c9e584fb90 updated double-width display 2024-05-15 16:45:24 -07:00
Josh Yan 17b1e81ca1 fixed width and word count for double spacing 2024-05-15 16:29:33 -07:00
Patrick Devine c344da4c5a
fix keepalive for non-interactive mode (#4438) 2024-05-14 15:17:04 -07:00