Commit graph

11 commits

Author SHA1 Message Date
Michael Yang 7ed367419e fix concurrency test 2024-08-05 16:36:16 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Daniel Hiltgen 68dfc6236a refined test timing
adjust timing on some tests so they don't timeout on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Daniel Hiltgen 074dc3b9d8 Integration fixes 2024-05-10 14:20:10 -07:00
Michael Yang a7248f6ea8 update tests 2024-05-06 15:24:01 -07:00
Daniel Hiltgen 34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Daniel Hiltgen 4fec5816d6 Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Patrick Devine 1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Daniel Hiltgen 7b6cbc10ec Integration tests conditionally pull
If images aren't present, pull them.
Also fixes the expected responses
2024-03-25 08:57:45 -07:00
Daniel Hiltgen 949b6c01e0 Revamp go based integration tests
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00