Commit graph

3289 commits

Author SHA1 Message Date
Daniel Hiltgen fc85f50a2b Ensure sparse files on windows during download
The file.Truncate call on windows will write the whole file
unless you set the sparse flag, leading to heavy I/O at the
beginning of download.  This should improve our
I/O behavior on windows and put less stress on the users disk.
2024-08-06 10:58:08 -07:00
royjhan 86b907f82a
sort batch results (#6189) 2024-08-05 16:55:34 -07:00
Michael Yang 10d49bce70
Merge pull request #6190 from ollama/mxyng/fix-integration
fix concurrency test
2024-08-05 16:45:49 -07:00
Michael Yang 7ed367419e fix concurrency test 2024-08-05 16:36:16 -07:00
Daniel Hiltgen 50ee8b5f56
Merge pull request #6186 from dhiltgen/numa
Implement linux NUMA detection
2024-08-05 15:20:06 -07:00
Michael Yang 03bdac0595
Merge pull request #6146 from ollama/mxyng/testing
use testing tempdirs
2024-08-05 13:00:05 -07:00
Daniel Hiltgen f457d63400 Implement linux NUMA detection
If the system has multiple numa nodes, enable numa support in llama.cpp
If we detect numactl in the path, use that, else use the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
Michael Yang 39f2bc6bfc
Merge pull request #6167 from ollama/mxyng/line-feed
line feed
2024-08-05 00:06:28 -07:00
frob b73b0940ef
Disable paging for journalctl (#6154)
Users using `journalctl` to get logs for issue logging sometimes don't realize that paging is causing information to be missed.
2024-08-05 00:10:53 -04:00
Michael Yang 6a07344786 line feed 2024-08-04 17:25:41 -07:00
sryu1 8b920f35a4
Add Gemma 2 2b (#6151) 2024-08-04 10:58:39 -04:00
Ivan Charapanau 4221e39867
Reference ollama integration with Harbor (#6147) 2024-08-02 17:03:46 -07:00
Michael Yang a091fadfda use testing tempdirs 2024-08-02 16:04:06 -07:00
Michael Yang 77ccbf04dc
Merge pull request #6128 from ollama/mxyng/lint
enable gofmt/gofumpt/goimports/tenv
2024-08-02 14:58:40 -07:00
royjhan 4addf6b587
Update OpenAI Compatibility Docs with /v1/completions (#5311)
* Update docs

* token bug corrected

* Update docs/openai.md

* Update docs/openai.md

* add suffix

* merge conflicts

* merge conflicts
2024-08-02 13:16:23 -07:00
royjhan 85c7f11170
Update docs (#5310) 2024-08-02 13:05:57 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Kim Hallberg ce1fb4447e
Fix models/{model} URL (#6132) 2024-08-01 16:31:47 -07:00
royjhan 558a54b098
Update OpenAI Compatibility Docs with /v1/embeddings (#5470)
* docs without usage

* no usage

* rm metric note
2024-08-01 16:00:29 -07:00
royjhan ed52833bb1
Add to docs (#5309) 2024-08-01 15:58:13 -07:00
royjhan 6f133a0bdd
OpenAI: Add Usage to v1/embeddings (#5886)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* add tokens to v1/embeddings

* separate usage
2024-08-01 15:49:37 -07:00
royjhan f561eecfb8
Update OpenAI Compatibility Docs with /v1/models (#5151)
* OpenAI Docs

* Update docs/openai.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Remove newline

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-01 15:48:44 -07:00
Michael Yang ff7c9060ec
Merge pull request #6115 from slouffka/fix-context
Fix context in /api/generate grows too much (#5980).
2024-08-01 15:13:59 -07:00
Michael Yang 0ff42e84b0
Merge pull request #4756 from ollama/mxyng/convert2
refactor convert
2024-08-01 14:16:30 -07:00
Vyacheslav Moskalev 8a9f946ca7 Refactor and format code. 2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev 3b5210548e Refactor code. Remove extra variable. 2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev b0c216584c Better types and naming closer to style. 2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev 49a5483139 Change the order of context and prompt. 2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev 6bc5c13758 Fix extra context concatenation in generate handler (#5980). 2024-08-01 15:45:58 +07:00
Michael Yang 3e614260af
Merge pull request #6109 from ollama/mxyng/fix-modelfile
fix modelfile message quotes
2024-07-31 17:05:43 -07:00
Michael Yang d87b4a488e fix modelfile message quotes 2024-07-31 16:52:09 -07:00
Michael Yang 4c14855ad7
Merge pull request #6106 from ollama/mxyng/default-sliding-window-attention
patches: phi3 optional sliding window attention
2024-07-31 16:12:06 -07:00
Blake Mizerany dc77bbcfa4
server: fix json marshalling of downloadBlobPart (#6108) 2024-07-31 16:01:24 -07:00
Michael Yang d8e2664c33 convert: fix parse functions 2024-07-31 15:58:55 -07:00
Michael Yang eafc607abb convert: only extract large files 2024-07-31 15:58:55 -07:00
Michael Yang 781fc2d576 Update convert/reader_safetensors.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang 5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang 0f3271db88 patches: phi3 default sliding window attention 2024-07-31 14:58:34 -07:00
Michael Yang 6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Nguyen 71399aa682
Added BoltAI as a desktop UI for Ollama (#6096) 2024-07-31 08:44:58 -07:00
Jeffrey Morgan 463a8aa273
Create SECURITY.md 2024-07-30 21:01:12 -07:00
Michael 3579b4966a
Update README to include Firebase Genkit (#6083)
Firebase Genkit
2024-07-30 18:40:09 -07:00
Jeffrey Morgan 5d66578356
Update README.md
Better example for multi-modal input
2024-07-30 18:08:34 -07:00
jmorganca afa8d6e9d5 patch gemma support 2024-07-30 18:07:29 -07:00
royjhan 1b44d873e7
Add Metrics to api\embed response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen cef2c6054d
Merge pull request #5859 from dhiltgen/homogeneous_gpus
Prevent partial loading on mixed GPU brands
2024-07-30 11:06:42 -07:00
Daniel Hiltgen 345420998e Prevent partial loading on mixed GPU brands
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands.  This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00