This changes our default settings to enable multiple loaded models and parallel requests to a single model. Users can still override these defaults with the same env var settings as before. The parallel setting has a direct impact on num_ctx, which in turn can have a significant impact on GPUs with little VRAM, so this change also refines the algorithm: when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models load concurrently only if they fully fit in VRAM. |
||
---|---|---|
auth.go | ||
download.go | ||
fixblobs.go | ||
fixblobs_test.go | ||
images.go | ||
layer.go | ||
manifest.go | ||
manifest_test.go | ||
model.go | ||
modelpath.go | ||
modelpath_test.go | ||
prompt.go | ||
prompt_test.go | ||
routes.go | ||
routes_create_test.go | ||
routes_delete_test.go | ||
routes_list_test.go | ||
routes_test.go | ||
sched.go | ||
sched_test.go | ||
upload.go |