Commit graph

3464 commits

Author SHA1 Message Date
Daniel Hiltgen 652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen 88e7705079
Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen f9e31da946 Review comments 2024-08-19 10:36:15 -07:00
Daniel Hiltgen 88bb9e3328 Adjust layout to bin+lib/ollama 2024-08-19 09:38:53 -07:00
Daniel Hiltgen 3b19cdba2a Remove Jetpack 2024-08-19 09:38:53 -07:00
Daniel Hiltgen 927d98a6cd Add windows cuda v12 + v11 support 2024-08-19 09:38:53 -07:00
Daniel Hiltgen f6c811b320 Enable cuda v12 flags 2024-08-19 09:38:53 -07:00
Daniel Hiltgen 4fe3a556fa Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
Daniel Hiltgen fc3b4cda89 Report GPU variant in log 2024-08-19 09:38:53 -07:00
Daniel Hiltgen d470ebe78b Add Jetson cuda variants for arm
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen c7bcb00319 Wire up ccache and pigz in the docker based build
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen 74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to cary the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary

Darwin retain the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan 9fddef3731
server: limit upload parts to 16 (#6411) 2024-08-19 09:20:52 -07:00
Richard Lyons 885cf45087 Fix white space. 2024-08-18 03:07:16 +02:00
Richard Lyons 9352eeb752 Reset NumCtx. 2024-08-18 02:55:01 +02:00
Richard Lyons 0ad0e738cd Override numParallel only if unset. 2024-08-18 01:43:26 +02:00
zwwhdls bdc4308afb fix: chmod new layer to 0o644 when creating it
Signed-off-by: zwwhdls <zww@hdls.me>
2024-08-16 11:43:19 +08:00
Daniel Hiltgen d29cd4c2ed
Merge pull request #6381 from eust-w/main
fix: Add tooltip to system tray icon
2024-08-15 15:31:15 -07:00
eust-w a84c05cf91 fix: Add tooltip to system tray icon
- Updated setIcon method to include tooltip text for the system tray icon.
- Added NIF_TIP flag and set the tooltip text using UTF16 encoding.

Resolves: #6372
2024-08-16 06:00:12 +08:00
Michael Yang e3d7f32af7
Merge pull request #6363 from ollama/mxyng/fix-noprune
fix: noprune on pull
2024-08-15 12:20:38 -07:00
Michael Yang 3a75e74e34 only skip invalid json manifests 2024-08-15 10:29:14 -07:00
Michael Yang 237dccba1e skip invalid manifest files 2024-08-14 16:55:45 -07:00
Michael Yang b3f75fc812 fix noprune 2024-08-14 15:48:51 -07:00
Jeffrey Morgan 8200c371ae
add CONTRIBUTING.md (#6349) 2024-08-14 15:19:50 -07:00
longtao 0a8d6ea86d
Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Blake Mizerany 8e1050f366
server: reduce max connections used in download (#6347)
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Bruce MacDonald eda8a32a09
update chatml template format to latest in docs (#6344) 2024-08-13 16:39:18 -07:00
Michael Yang a0a40aa20c
Merge pull request #6346 from ollama/mxyng/lint 2024-08-13 14:58:35 -07:00
Michael Yang 2697d7f5aa lint
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
2024-08-13 14:36:33 -07:00
Pamela Fox 1f32276178
Update openai.md to remove extra checkbox (#6345) 2024-08-13 13:36:05 -07:00
Daniel Hiltgen 4c4fe3f87f
Merge pull request #6343 from dhiltgen/revert_win_go_version
Go back to a pinned Go version
2024-08-13 11:53:49 -07:00
Daniel Hiltgen feedf49c71 Go back to a pinned Go version
Go version 1.22.6 is triggering AV false positives, so go back to 1.22.5
2024-08-13 11:45:44 -07:00
royjhan 8b00a415ab
Load Embedding Model on Empty Input (#6325)
* load on empty input

* no load on invalid input
2024-08-13 10:19:56 -07:00
Michael Yang 01b80e9ffc
Merge pull request #5443 from ollama/mxyng/convert-phi3
add conversion for microsoft phi 3 mini/medium 4k, 128k
2024-08-12 15:47:58 -07:00
Michael Yang bd5e432630 update import.md 2024-08-12 15:13:29 -07:00
Bruce MacDonald aec77d6a05 support new "longrope" attention factor 2024-08-12 15:13:29 -07:00
Michael Yang 6ffb5cb017 add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
Josh f7e3b9190f
cmd: spinner progress for transfer model data (#6100) 2024-08-12 11:46:32 -07:00
Josh 980dd15f81
cmd: speed up gguf creates (#6324) 2024-08-12 11:46:09 -07:00
royjhan 01d544d373
OpenAI: Simplify input output in testing (#5858)
* simplify input output

* direct comp

* in line image

* rm error pointer type

* update response testing

* lint
2024-08-12 10:33:34 -07:00
Josh 1dc3ef3aa9
Revert "server: speed up single gguf creates (#5898)" (#6323)
This reverts commit 8aac22438e.
2024-08-12 09:57:51 -07:00
Josh 8aac22438e
server: speed up single gguf creates (#5898) 2024-08-12 09:28:55 -07:00
Jeffrey Morgan 15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Daniel Hiltgen 25906d72d1
llm: prevent loading too large models on windows (#5926)
Don't allow loading models that would lead to memory exhaustion (across vram, system memory and disk paging). This check was already applied on Linux but should also be applied on Windows as well.
2024-08-11 11:30:20 -07:00
CognitiveTech 023451ce47
add integration obook-summary (#6305) 2024-08-10 18:43:08 -07:00
Jesse Gross 9b53e39d8e
Merge pull request #6258 from coolljt0725/fix_typo
server/download.go: Fix a typo in log
2024-08-09 17:19:48 -07:00
Michael Yang 97fae2df95
Merge pull request #6235 from Nicholas42/fix_line_endings
Set *.png and *.ico to be treated as binary files.
2024-08-09 17:06:30 -07:00
Michael Yang 160d9d4900
Merge pull request #6171 from ollama/mxyng/remove-temp
removeall to remove non-empty temp dirs
2024-08-09 15:47:13 -07:00
Nicholas Schwab d4e6407464 Restrict text files with explicit line feeds to *.go.
This partially reverts b732beba6a. It
seems like explicitly setting all files to use line feeds was done due
to issues with the go linter, hence it can be restricted to those files
(https://github.com/ollama/ollama/pull/6235#issuecomment-2278745953).
2024-08-09 23:14:13 +02:00
Daniel Hiltgen b7f7d8cd15
Merge pull request #6291 from dhiltgen/no_sparse_fail
Don't hard fail on sparse setup error
2024-08-09 12:30:25 -07:00